I'm trying to find where multiple lines intersect a polygon for two different geodataframes:
from shapely.geometry import Polygon, LineString
import geopandas as gpd
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
line1 = LineString([(0.5, 0.5), (0.7, 0.7)])
line2 = LineString([(0.9, 0.9), (0.2, 0.6)])
poly_gdf = gpd.GeoDataFrame(geometry=[polygon])
line_gdf = gpd.GeoDataFrame(geometry=[line1, line2])
This is what the above geodataframes look like (one has a polygon and the other has two lines). It looks to me as if both lines intersect the polygon:
However, the intersect output is very confusing:
print(line_gdf.intersects(poly_gdf))
0 True
1 False
print(line1.intersects(polygon))
print(line2.intersects(polygon))
True
True
Why does the geopandas
intersect
method give a different output to the standard shapely
one?
I am using Python 3.5.3 and Geopandas 0.2.1 all on Anaconda.
Best Answer
When comparing geodataframes with geometry operations in Geopandas, the geometries are first matched by index. In the case where there is no matching index (because you only have a single polygon for instance) then the result will be
False
.If it were to compare each object in the
GeoSeries
you would instead need to get back a full rectangular dataframe of boolean values, and this would likely be very inefficient.If you do want to compare all geometries then you have two options. The first (and probably easiest) is to use the geopandas
sjoin
method:This returns a new
GeoDataFrame
with the geometries for each object on the left dataframe repeated for each geometry they intersect in the right, with the index of the object in the right, i.e.:The second method is to us the pandas
apply
method on theGeoSeries
to return the rectangular dataframe:Which in turn returns (with increasing inefficiency as the dataframes grow):
In general, unless you needed the square matrix, my advice would be to stick to the
sjoin
method.