[GIS] How to find where a line intersects itself

geopandas, line, python, self-intersection, shapely

I am using Python 3.7 with Shapely and GeoPandas.

I have a big line of 181,000 points, and would like to find all the points where the line intersects itself. It does so a lot.
I don't need a new point at the precise intersection, just one of the existing points which is closest.

I have been writing code that loops through the points and finds the other points close by:

for i, point in gdf.iterrows():
    # indices of all points within 10 units of the current point
    nearby = gdf[gdf.geometry.intersects(point.geometry.buffer(10))].index.tolist()

where gdf is a GeoPandas GeoDataFrame in which each row is one point of the line, for example:

   geometry
0  POINT (-47.91000 -15.78000)
1  POINT (-47.92000 -15.78000)

But surely there is a way to do this using existing functions?

My approach is very slow and records many duplicates at each intersection, so it would need more code to reduce each intersection to a single point.
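The answers below work on line geometries, while the question's gdf holds one Point per row. A minimal sketch, assuming the rows are already in drawing order, of turning the points into one LineString and checking whether it crosses itself (is_simple is False when it does); the toy coordinates are hypothetical:

```python
from shapely.geometry import LineString, Point

# Hypothetical ordered points; with the question's data this would be gdf.geometry
pts = [Point(0, 0), Point(4, 4), Point(4, 0), Point(1, 3)]

# Build one LineString from the points, in row order
line = LineString([(p.x, p.y) for p in pts])

# is_simple is False when the line intersects itself
print(line.is_simple)  # False
```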

Best Answer

Update 2021:

A more elegant way uses unary_union and linemerge. You can download the notebook here.

Read the file:

import geopandas as gpd

# before
gdf = gpd.read_file('selfintersects.geojson')
gdf.plot()

Let's check the endpoints:

def get_endpoints(gdf):
    from shapely.geometry import Point
    startpoint = gdf.geometry.apply(lambda x: x.coords[0])
    endpoint = gdf.geometry.apply(lambda x: x.coords[-1])

    startpoints = [Point(i) for i in startpoint]
    endpoints = [Point(i) for i in endpoint]

    return startpoints, endpoints

def create_endpoints(startpoints, endpoints):
    # interleave the points: start0, end0, start1, end1, ...
    geom = []
    for a, b in zip(startpoints, endpoints):
        geom.append(a)
        geom.append(b)

    # note: the CRS is taken from the global gdf
    endpoints = gpd.GeoDataFrame({'id': range(0, len(geom))}, crs=gdf.crs, geometry=geom)
    return endpoints

startpoints, endpoints = get_endpoints(gdf)
endpoints = create_endpoints(startpoints, endpoints)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
gdf.plot(ax=ax)
endpoints.plot(ax=ax)
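get_endpoints reads coords[0] and coords[-1]. For comparison, a small sketch of shapely's boundary property on hypothetical toy lines: it gives the same two endpoints for an open line, but it is empty for a closed line, whose first and last coordinates coincide.

```python
from shapely.geometry import LineString

open_line = LineString([(0, 0), (1, 1), (2, 0)])
closed_line = LineString([(0, 0), (1, 1), (2, 0), (0, 0)])

# Open line: the boundary holds its two endpoints, i.e. coords[0] and coords[-1]
ends = [(p.x, p.y) for p in open_line.boundary.geoms]
print(sorted(ends))  # [(0.0, 0.0), (2.0, 0.0)]

# Closed line: start and end coincide, so the boundary is empty
print(closed_line.boundary.is_empty)  # True
```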

Union it to merge all lines into one geometry. unary_union also nodes the lines, so the result is split at every point where they cross. Note: unary_union will take time if your data is large!

union_geom = gdf.unary_union
# reuse union_geom instead of computing the union a second time
union = gpd.GeoDataFrame({'id': [0]}, crs=gdf.crs, geometry=[union_geom])
union.plot()
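To see why the union helps, here is a small sketch on a hypothetical toy line whose first and last segments cross at (2, 2): after unary_union, the crossing point shows up as an end of the resulting pieces.

```python
from shapely.geometry import LineString, Point
from shapely.ops import unary_union

# Hypothetical self-crossing line: segments (0,0)-(4,4) and (4,0)-(1,3) cross at (2, 2)
line = LineString([(0, 0), (4, 4), (4, 0), (1, 3)])

# unary_union nodes the linework, so the result is split at the crossing
noded = unary_union([line])
print(noded.geom_type)  # MultiLineString
for piece in noded.geoms:
    print(list(piece.coords))

# The crossing point (2, 2) is now an end of some of the pieces
crossing = Point(2, 2)
print(any(Point(piece.coords[0]).distance(crossing) < 1e-9
          or Point(piece.coords[-1]).distance(crossing) < 1e-9
          for piece in noded.geoms))  # True
```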

Then merge the pieces with linemerge and explode the result into single lines!

from shapely.ops import linemerge

lm = gpd.GeoDataFrame({'id':[0]}, crs=gdf.crs, geometry=[linemerge(union_geom)]).explode().reset_index(drop=True)
lm.plot()
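linemerge joins lines that continue each other end to end, i.e. through nodes where exactly two lines meet, and stops at nodes where three or more meet, such as the self-intersections created by the union. A small sketch on hypothetical pieces:

```python
from shapely.geometry import MultiLineString
from shapely.ops import linemerge

# Hypothetical pieces that simply continue each other at (1, 0)
chain = MultiLineString([[(0, 0), (1, 0)], [(1, 0), (2, 0)]])
merged_chain = linemerge(chain)
print(merged_chain.geom_type)  # LineString

# Hypothetical pieces whose four arms meet at a crossing node (1, 1)
cross = MultiLineString([[(0, 0), (1, 1)], [(1, 1), (2, 2)],
                         [(0, 2), (1, 1)], [(1, 1), (2, 0)]])
merged_cross = linemerge(cross)
print(merged_cross.geom_type, len(merged_cross.geoms))  # MultiLineString 4
```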

Let's check the endpoints of the exploded union:

startpoints, endpoints = get_endpoints(lm)
endpoints = create_endpoints(startpoints, endpoints)

# cleansing with snap
from shapely.ops import snap
endpoints['geometry'] = endpoints.geometry.apply(lambda x: snap(x, union_geom, 0.00001))

fig, ax = plt.subplots()
gdf.plot(ax=ax)
endpoints.plot(ax=ax)
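The snap call moves each endpoint onto a vertex of union_geom when one lies within the tolerance (0.00001 here, in the CRS units) and leaves it unchanged otherwise. A minimal sketch with a hypothetical line:

```python
from shapely.geometry import LineString, Point
from shapely.ops import snap

line = LineString([(0, 0), (1, 0), (2, 0)])

# Hypothetical point that misses the vertex (1, 0) by a tiny error
near = snap(Point(1.000001, 0), line, 0.00001)  # same tolerance as above
# Hypothetical point with no vertex within the tolerance
far = snap(Point(1.1, 0), line, 0.00001)

print(near.equals(Point(1, 0)))   # True: moved exactly onto the vertex
print(far.equals(Point(1.1, 0)))  # True: left where it was
```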

Filter out the dangles:

sjoin = endpoints.sjoin(gdf, how='left')

fig, ax = plt.subplots()
gdf.plot(ax=ax)
sjoin[sjoin['index_right'].isna()].plot(ax=ax)

There you go! Now we have the points.
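The sjoin step above keeps the endpoints that have no match in the original lines. A different way to separate crossings from the line's own ends, swapped in here rather than taken from this answer, is to count how many pieces of the noded line meet at each node: a crossing has four piece ends, a plain end only one. The sketch below (the helper name self_intersection_vertices is hypothetical) also returns, as the question asked, the index of the closest existing vertex for each crossing. Its nearest-vertex search is brute force, O(crossings * vertices); for 181,000 points a spatial index would likely be needed.

```python
import math

from shapely.geometry import LineString
from shapely.ops import unary_union


def _nearest_vertex(coords, x, y):
    # Index of the vertex in coords closest to (x, y)
    return min(range(len(coords)),
               key=lambda k: math.hypot(coords[k][0] - x, coords[k][1] - y))


def self_intersection_vertices(line):
    """Hypothetical helper: for each self-intersection of `line`, return the
    index of the closest existing vertex of `line`."""
    # Node the line: unary_union splits it wherever it crosses or touches itself
    noded = unary_union([line])
    pieces = list(noded.geoms) if noded.geom_type == "MultiLineString" else [noded]

    # Count how many piece ends meet at each node
    ends = {}
    for piece in pieces:
        for xy in (piece.coords[0], piece.coords[-1]):
            ends[xy] = ends.get(xy, 0) + 1

    # A crossing has 4 piece ends and a touch 3; fewer means an ordinary end
    nodes = sorted(xy for xy, count in ends.items() if count >= 3)

    coords = list(line.coords)
    return [_nearest_vertex(coords, x, y) for x, y in nodes]


line = LineString([(0, 0), (4, 4), (4, 0), (1, 3)])
print(self_intersection_vertices(line))  # [3]: vertex (1, 3) is closest to the crossing at (2, 2)
```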

DEPRECATED answer from 2020:

Here's how I did it:

1. Take the first feature.
2. Make a unary_union of the remaining features.
3. Intersect the two with shapely; you'll get the point(s) where the first feature crosses the others.
4. Repeat for the second, third, fourth feature, and so on.

Here's an example.

Suppose a GeoDataFrame (gdf) of 6 lines, like this GeoJSON.

Then apply this code to the gdf; it returns the geometry of the intersections:

# the points of intersection will be appended here
points = []
for i in gdf.id:
    # the current feature (selected by id, not by index label)
    feature = gdf.loc[gdf['id'] == i, 'geometry'].iloc[0]
    # union of all the other features
    overlap_feature = gdf.loc[gdf['id'] != i, 'geometry'].unary_union
    # where the current feature crosses any of the others
    intersects = feature.intersection(overlap_feature)
    points.append(intersects)
points

Now make a GeoDataFrame out of the points:

intersections = gpd.GeoDataFrame(
    {"id": range(len(points))},
    crs='EPSG:4326',  # replaces the deprecated {'init': 'epsg:4326'}
    geometry=points
)

Here's the plot of the result:

import matplotlib.pyplot as plt
fig,ax = plt.subplots()
intersections.plot(color="r", ax=ax,zorder=2)
gdf.plot(ax=ax,zorder=1)


The intersections GeoDataFrame contains both Point and MultiPoint geometries. But there's a problem: the points overlap, because every crossing is found twice, once from each of the two features that meet there. Here's how to delete the overlapping points:

import pandas as pd
from shapely.geometry import Point

# flag the rows that are already single Points
intersections['ispoint'] = intersections['geometry'].apply(lambda x: isinstance(x, Point))
is_point = intersections[intersections.ispoint]
# convert the MultiPoints into single Points
was_multipoint = intersections[~intersections.ispoint].explode(index_parts=False)

# now combine both data frames (DataFrame.append was removed in pandas 2.0)
now_point = pd.concat([is_point, was_multipoint], ignore_index=True)
now_point = now_point[['id', 'geometry']].copy()
now_point['id'] = now_point.index
# ok, now_point contains all intersections, but the points are still overlapping each other

# delete overlapping points
intersections2 = now_point.copy()
# iterate over a snapshot of the ids, because rows are dropped inside the loop
for n, i in enumerate(list(intersections2.id)):
    feature = intersections2.loc[i, 'geometry']
    overlap_feature = intersections2.loc[intersections2['id'] != i, 'geometry'].unary_union

    # IF the point still intersects another remaining point, it is a duplicate: delete it
    # (the last copy of each duplicate survives, since earlier copies are already gone)
    overlaps = feature.intersects(overlap_feature)
    if overlaps:
        intersections2.drop(i, inplace=True)
    print(n, overlaps)
intersections2

The plot looks the same, but the intersection points no longer overlap each other; I checked, and there are 6 rows in the GeoDataFrame.

Edit: note that using `unary_union` on a large dataset may consume a lot of RAM.
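The loop above calls unary_union once per point, which gets slow on large inputs. A lighter-weight alternative, swapped in here and not the answer's method, is to flatten the Points and MultiPoints and keep one point per rounded coordinate pair. dedupe_points is a hypothetical helper, and ndigits=9 is an assumed precision: points that round to the same pair are treated as one crossing, while two copies that happen to round differently would both survive.

```python
from shapely.geometry import GeometryCollection, MultiPoint, Point


def dedupe_points(geoms, ndigits=9):
    """Hypothetical helper: flatten Points/MultiPoints, keep one Point per location."""
    seen = set()
    unique = []
    for geom in geoms:
        # skip empty results (features that cross nothing) and non-point results
        if geom.is_empty or geom.geom_type not in ("Point", "MultiPoint"):
            continue
        parts = geom.geoms if geom.geom_type == "MultiPoint" else [geom]
        for p in parts:
            key = (round(p.x, ndigits), round(p.y, ndigits))
            if key not in seen:
                seen.add(key)
                unique.append(p)
    return unique


# Hypothetical intersection results: each crossing appears twice, plus one empty result
points = [Point(1, 1), MultiPoint([(1, 1), (3, 0)]), Point(3, 0), GeometryCollection()]
print(len(dedupe_points(points)))  # 2
```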


Related Solutions

[GIS] Geopandas Intersects Speed

For intersections in GeoPandas, it is better to use a spatial join (see "More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc" or "How to efficiently determine which of thousands of polygons intersect with a linestring [duplicate]"):

import geopandas as gpd
parcels = gpd.read_file('parcels.shp')
roads = gpd.read_file('roads.shp')
intersections = gpd.sjoin(parcels, roads, how="left", predicate='intersects')  # 'predicate' replaces the older 'op' keyword
intersections.head()
    parcel                  geometry                         index_right  road
0  Parcel 1  POLYGON ((-0.6824583866837387 0.78233034571062...  0.0       Road 1  
1  Parcel 2  POLYGON ((-0.09859154929577452 0.3239436619718...  NaN       NaN
2  Parcel 3  POLYGON ((-0.9103713188220229 -0.1062740076824...  1.0       Road 2  
3  Parcel 4  POLYGON ((0.2266325224071704 0.620998719590268...  0.0       Road 1

With your solution

road = roads.unary_union
parcels['road_intersection'] = parcels.intersects(road)
parcels
    parcel                  geometry                         index_left  road_intersection  
0  Parcel 1  POLYGON ((-0.6824583866837387 0.78233034571062...   0         True 
1  Parcel 2  POLYGON ((-0.09859154929577452 0.3239436619718...   1         False
2  Parcel 3  POLYGON ((-0.9103713188220229 -0.1062740076824...   2         True
3  Parcel 4  POLYGON ((0.2266325224071704 0.620998719590268...   3         True
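With the unary_union approach, every parcel is tested against the same road geometry. If that route is kept, shapely's prepared geometries are one way to make the repeated intersects calls cheaper; this is a swapped-in technique, not something timed in the answer, and the parcels and road below are hypothetical:

```python
from shapely.geometry import LineString, Polygon
from shapely.prepared import prep

# Hypothetical data: two square parcels and one road crossing only the first
parcels = [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
           Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])]
road = LineString([(-1, 0.5), (2, 0.5)])

# Prepare the road once; the prepared geometry is reused by every intersects() call
prepared_road = prep(road)
flags = [prepared_road.intersects(p) for p in parcels]
print(flags)  # [True, False]
```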

GeoPandas – How to Split Line by Nearest Points

First, union your GeoDataFrames into a MultiLineString and a MultiPoint:

line = gdf_line.geometry.unary_union
coords = gdf_point.geometry.unary_union

Using shapely.ops.snap and shapely.ops.split, it is possible to snap the points to the line (within a given tolerance) and use them to split the line. The result is a GeometryCollection:

split(line, snap(coords, line, tolerance=1))
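A toy run of the snap-then-split one-liner, with hypothetical coordinates: two points lying near vertices of a straight line are snapped onto those vertices and then split the line into three pieces.

```python
from shapely.geometry import LineString, MultiPoint
from shapely.ops import snap, split

line = LineString([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)])
points = MultiPoint([(1, 0.2), (3, -0.1)])  # hypothetical points near two vertices

snapped = snap(points, line, tolerance=0.5)  # moves each point onto a nearby vertex of line
pieces = split(line, snapped)                # GeometryCollection of LineStrings
print(sorted(piece.length for piece in pieces.geoms))  # [1.0, 1.0, 2.0]
```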

To combine this and return a GeoDataFrame, use the following function:

import geopandas as gpd
from shapely.ops import split, snap


def split_line_by_nearest_points(gdf_line, gdf_points, tolerance):
    """
    Split the union of lines at the union of points snapped onto it.

    Parameters
    ----------
    gdf_line : geoDataFrame
        geodataframe with multiple rows of connecting line segments
    gdf_points : geoDataFrame
        geodataframe with multiple rows of single points
    tolerance : float
        snapping distance, in the units of the CRS

    Returns
    -------
    gdf_segments : geoDataFrame
        geodataframe of segments
    """

    # union all geometries
    line = gdf_line.geometry.unary_union
    coords = gdf_points.geometry.unary_union

    # snap and split coords on line
    # returns GeometryCollection
    split_line = split(line, snap(coords, line, tolerance))

    # transform the GeometryCollection into a GeoDataFrame
    # (.geoms is required to iterate a collection in Shapely 2.0)
    segments = list(split_line.geoms)

    gdf_segments = gpd.GeoDataFrame(
        {'index': range(len(segments))}, geometry=segments, crs=gdf_line.crs)

    return gdf_segments

The result can be plotted as follows (I still find the right tolerance by trial and error):

import matplotlib.pyplot as plt

# tolerance=1 as in the one-liner above; tune it for your data
gdf_segments = split_line_by_nearest_points(gdf_line, gdf_points, tolerance=1)

fig, ax = plt.subplots()
gdf_line.plot(ax=ax, lw=6, color='gray')
gdf_segments.plot(ax=ax, column='index', lw=3, cmap='Paired')
gdf_points.plot(ax=ax, zorder=3)

EDIT:

The snap function is not equivalent to a nearest-point query: it moves the points onto the line's existing vertices, not onto the closest position along a segment. I ended up using the function https://github.com/ojdo/python-tools/blob/master/shapelytools.py#L144 from https://github.com/ojdo/python-tools, which provides many useful functions for shapely.
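To illustrate the difference described in the edit, a small sketch on a hypothetical two-vertex line: snap leaves a point alone when no vertex is within the tolerance, while shapely.ops.nearest_points returns the closest position on the segment itself.

```python
from shapely.geometry import LineString, Point
from shapely.ops import nearest_points, snap

line = LineString([(0, 0), (4, 0)])  # only two vertices
p = Point(1.5, 0.1)                  # hypothetical point near the middle of the segment

# snap() only moves p onto a vertex of line within the tolerance; none is that close
snapped = snap(p, line, 0.2)
print(snapped.equals(p))  # True: unchanged

# nearest_points() returns the closest point on the line itself, here (1.5, 0)
on_line, _ = nearest_points(line, p)
print(on_line.x, on_line.y)
```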
