[GIS] Python: Break linestring based on condition

geopandas python shapely

I have a geopandas dataframe of a bunch of linestrings that have some data associated with each vertex/point:

Point_x = (Lat, Lon, Time, ID, Data1, Data2, Data3)

The points are converted to linestrings based on ID and ordered by Time.

I want to break the linestrings at the point where some condition is met. Right now that's when the distance between consecutive points is greater than some value. In the future it could be where a function of the Data fields takes some value. For instance, split a linestring when Speed crosses 5 kph.

The current problem is that some of the tracks are formed from points that have duplicate IDs, so the linestring jumps back and forth over huge distances, and I want a threshold to break these lines.

Any ideas on the correct way to structure this or libraries/methods that might be of use?

The dataframe has over 150k tracks, each with many points, so efficiency would be nice.

Here's an example of the tracks DF:

ID         geometry                                                  
204235000  LINESTRING (37.62001 -28.99535, 37.62015 -28.9...   
205400000  LINESTRING (3.807816666666666 -18.083181666666...   
207138000  LINESTRING (22.73206 -34.97915833333333, 22.73...   
209016000  LINESTRING (8.447673333333331 -23.522783333333...     

Here's a sample from the points DF. There are 18 columns, including Datetime, Point(Lon, Lat), Speed, Size, etc.:

Index           Heading   Latitude  Longitude       ID  
20              92.8 -35.946802  13.089695  210725000               
21              93.5 -35.946912  13.091808  210725000               
22              95.4 -35.965520  13.497698  210725000               
23              94.7 -35.965803  13.501898  210725000               
24              94.9 -35.965987  13.504573  210725000               

EDIT: Tried to be a little clearer.

Best Answer

I haven't used shapely/geopandas yet, so treat this as a rough, untested sketch rather than finished code:

from math import hypot
from shapely.geometry import LineString

distance_threshold = 50  # Gap length (in coordinate units) at which to cut
new_lines = []  # List to hold the newly created, split lines
for linestring in linestrings:  # `linestrings` is assumed to be an iterable of LineStrings
    coords = list(linestring.coords)
    start = 0  # Index where the current piece starts; reset for every linestring
    for i in range(len(coords) - 1):  # Look at each pair of consecutive coords
        (x1, y1), (x2, y2) = coords[i][:2], coords[i + 1][:2]
        if hypot(x2 - x1, y2 - y1) >= distance_threshold:  # Check if threshold is met
            # Close the piece that ends at coord i; a LineString needs at least 2 points
            if i + 1 - start >= 2:
                new_lines.append(LineString(coords[start:i + 1]))
            start = i + 1  # The next piece starts after the gap
    if len(coords) - start >= 2:  # Don't forget the piece after the last split
        new_lines.append(LineString(coords[start:]))

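The distance function should be something your libraries already offer, or you can implement it yourself (ol' buddy Pythagoras will help you out). Keep in mind that on raw longitude/latitude coordinates a Pythagorean distance comes out in degrees, so either the threshold has to be in degrees too, or you need a great-circle distance instead.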
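If you end up writing the distance function yourself and your coordinates are longitude/latitude in degrees, a great-circle distance is one option. The following is only a minimal sketch using the haversine formula on a spherical Earth; the name `haversine_km` and the radius constant are my own choices, not something from shapely or geopandas.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Example: the gap between rows 21 and 22 of the sample points in the question
gap_km = haversine_km(13.091808, -35.946912, 13.497698, -35.965520)
```

With something like this, the threshold can be expressed in kilometres rather than in degrees.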

Efficiency can be improved as needed from there, but it should be a good starting point.
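As one possible direction for the efficiency part, the split can be decided on the points table before any linestrings are built, using array operations instead of a Python loop. This is a sketch under assumptions: it uses NumPy, expects the points to be sorted by ID and then by time already, measures a plain planar distance in coordinate units, and `label_segments` and the 0.1 threshold are hypothetical names and values chosen for illustration.

```python
import numpy as np


def label_segments(ids, lon, lat, threshold):
    """Return one integer segment label per point.

    Assumes the points are already sorted by ID and then by time.
    """
    ids = np.asarray(ids)
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    if ids.size == 0:
        return np.empty(0, dtype=int)
    # Planar distance from each point to the previous one (coordinate units)
    step = np.hypot(np.diff(lon), np.diff(lat))
    # A new segment starts at the first point, where the ID changes,
    # or where the step to the previous point reaches the threshold
    new_segment = np.r_[True, (ids[1:] != ids[:-1]) | (step >= threshold)]
    return np.cumsum(new_segment) - 1


# The five sample rows from the question, with a made-up threshold of 0.1 degrees
ids = [210725000] * 5
lat = [-35.946802, -35.946912, -35.965520, -35.965803, -35.965987]
lon = [13.089695, 13.091808, 13.497698, 13.501898, 13.504573]
labels = label_segments(ids, lon, lat, threshold=0.1)
print(labels)  # [0 0 1 1 1]
```

The labels could then be stored as an extra column on the points dataframe, and the linestrings built by grouping on that column instead of on ID alone (skipping groups with fewer than two points). Any other boolean condition, such as Speed crossing 5 kph, could be OR-ed into `new_segment` in the same way.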
