GeoPandas Polygon Points – Why GeoPandas Does Not Identify Points Inside Polygon and Solutions

geopandaspythonshapely

I am trying to intersect some points with a polygon in GeoPandas.

First I create a Point GeoDataFrame from a text file.

import geopandas as gp
import numpy as np
import pandas as pd
from shapely.geometry import Point, Polygon

tile_index = pd.read_csv('../tile_index.dat', names=['tile', 'x', 'y'], sep=' ')
geometry = [Point(r.x, r.y) for ix, r in tile_index.iterrows()]
tile_index = gp.GeoDataFrame(tile_index, geometry=geometry)
tile_index = tile_index[(tile_index.x.between(-10, 30)) & (tile_index.y.between(-10, 30))]
print tile_index.head()

Which produces:

     tile      x       y                                        geometry
202   226 -9.226  -0.724  POINT (-9.226000000000001 -0.7240000000000001)
203   227 -9.226   9.276                POINT (-9.226000000000001 9.276)
204   228 -9.226  19.276               POINT (-9.226000000000001 19.276)
205   229 -9.226  29.276               POINT (-9.226000000000001 29.276)
219   243  0.774  -0.724               POINT (0.774 -0.7240000000000001)

Then I create a Polygon to intersect:

poly = Polygon([(0, 0), (0, 20), (20, 20), (20, 0)])
P = gp.GeoDataFrame(data=[1, poly]).T
P.columns = ['ID', 'geometry']
print np.any(tile_index.within(P))

Which returns False but at least 4 points should be within.

enter image description here

If I create a point using the coordinates of point 237 it does intersect.

pt = Point(10.774, 9.276)
p = gp.GeoDataFrame([[1, pt]])
p.columns = ['ID', 'geometry']
print p.within(P)

Which returns True.

If I do P.contains(tile_index.geometry) then all are False but if I do np.any([P.contains(row.geometry)[0] for ix, row in tile_index.iterrows()]) then this returns True.

Best Answer

geopandas aligns frames on the index, and then performs within or contains element-wise. There's a fuller description in this issue, the most relevant part of which is the example excerpted here:

>>> p1 = shapely.geometry.Polygon([(0,0), (1,0), (1,1), (0,1)])

>>> p2 = shapely.geometry.Polygon([(1,1), (2,1), (2,2), (1,2)])  
>>> p1.intersects(p2)
True
>>> s1 = geopandas.GeoSeries([p1])
>>> s2 = geopandas.GeoSeries([p2])
>>> s1.intersects(s2)
0    True
dtype: bool

# giving it another index, s1 and s2 will now first align (and in this case return False
# for each index, as there are no polygons that intersect and have matching index)
>>> s2 = geopandas.GeoSeries([p2], index=[1])
>>> s1.intersects(s2)
0    False
1    False
dtype: bool

This behavior is definitely not evident, and would be great to document more clearly in geopandas. That said, depending on your application, a spatial join might be the operation you're looking for.