[GIS] Pandas dataframe to Shapely LineString using GroupBy & SortBy

Tags: linestring, pandas, polyline-creation, python, shapely

I have a pandas dataframe that contains the information needed to construct (poly)lines, and I want to use shapely and geopandas to write them to a shapefile (SHP).

In the example below, I have 3 lines distinguished by "myid", and the order of each line's vertices is given by "myorder".

"Making shapefile from Pandas dataframe?" is a great explanation for making a point shapefile, but I am looking for a polyline SHP. "Creating Shapely LineString from two Points" lets me know I need to use from shapely.geometry import LineString to make the polylines, but I don't understand from the answer there (nor from the shapely documentation) how to specify groupby("myid") and sortby("myorder").

How would I do this?

Using Windows 10, Python 3.7.6, Conda 4.6.14.

import pandas as pd

myid = [1, 1, 1, 2, 2, 3, 3]
myorder = [1, 2, 3, 1, 2, 1, 2]
lat = [36.42, 36.4, 36.4, 36.49, 36.48, 36.39, 36.39]
long = [-118.11, -118.12, -118.11, -118.09, -118.09, -118.10, -118.11]
df = pd.DataFrame(list(zip(myid, myorder, lat, long)), columns=['myid', 'myorder', 'lat', 'long'])
display(df)


Best Answer

You can do this with geopandas by building a GeoDataFrame of points, sorting it by "myorder", grouping it by "myid", and applying a lambda that builds a LineString from each group's points.

import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString

myid = [1, 1, 1, 2, 2, 3, 3]
myorder = [1, 2, 3, 1, 2, 1, 2]
lat = [36.42, 36.4, 36.4, 36.49, 36.48, 36.39, 36.39]
long = [-118.11, -118.12, -118.11, -118.09, -118.09, -118.10, -118.11]
df = pd.DataFrame(list(zip(myid, myorder, lat, long)), columns=['myid', 'myorder', 'lat', 'long'])

# Convert to GeoDataFrame
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df['long'], df['lat']))

display(gdf)

# Sort points by vertex order, then group by line id to make lines
# (groupby keeps the sorted row order within each group)
line_gdf = gdf.sort_values(by=['myorder']).groupby(['myid'])['geometry'].apply(lambda x: LineString(x.tolist()))
line_gdf = gpd.GeoDataFrame(line_gdf, geometry='geometry')

display(line_gdf)

# Write out
line_gdf.to_file("lines.shp")
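
As a rough sketch of what the sort-and-group step is doing, the same grouping can be written with plain pandas and shapely, without geopandas. The names `shuffled`, `ordered`, and `lines` are illustrative and not part of the answer above. The rows are reversed on purpose to show that the sort, not the input order, decides the vertex sequence. Note that shapely expects (x, y) pairs, so longitude goes first.

```python
import pandas as pd
from shapely.geometry import LineString

df = pd.DataFrame({
    "myid": [1, 1, 1, 2, 2, 3, 3],
    "myorder": [1, 2, 3, 1, 2, 1, 2],
    "lat": [36.42, 36.4, 36.4, 36.49, 36.48, 36.39, 36.39],
    "long": [-118.11, -118.12, -118.11, -118.09, -118.09, -118.10, -118.11],
})

# Reverse the rows so the input is no longer in vertex order (illustration only)
shuffled = df.iloc[::-1]

# Sort by line id, then by vertex order within each line
ordered = shuffled.sort_values(["myid", "myorder"])

# Build one LineString per myid from (long, lat) pairs, i.e. (x, y)
lines = {
    int(line_id): LineString(list(zip(group["long"], group["lat"])))
    for line_id, group in ordered.groupby("myid")
}

for line_id, line in lines.items():
    print(line_id, line.wkt)
```

Each value in `lines` is a shapely LineString, so the dict could be turned into a GeoDataFrame (for example by passing `geometry=list(lines.values())`) before writing it out with `to_file`.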
