[GIS] Find csv lat and long points in a shapefile polygon with geopandas spatial index

geopandaspythonrtreespatial-index

I have a csv with two sets of latitude and longitude points, and a shapefile with espg 4326. I want to find which points are inside the shapefile. I've been following the guide here which uses a spatial index to speed up the search however I'm running into an error.

The csv looks like:

e6ed541a288f13745d2755a79372c0c4,fc707fee61baa13b40278a70c679b9a7,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T02:03:18.000Z,2,2016-11-01T03:01:13.000Z,2,51.456,0.146,51.506,-0.125,false,false,EE,2,3,2,,,,
36e6a9d201be3957fb78be4dfaacb932,d3cbd057ca3bf69f6805d054b4a228c5,52720e003547c70561bf5e03b95aa99f,1,2016-11-01T16:21:45.467Z,2,2016-11-01T16:43:01.627Z,2,51.499,-0.008,51.506,-0.101,false,false,EE,1,1,1,,,,
f15a4077cc79217e620270a24c127ed4,581c63e4232ae5208f3ae23a09fff3ea,13f9896df61279c928f19721878fac41,1,2016-11-01T20:31:36.000Z,2,2016-11-01T21:21:10.000Z,2,51.529,-0.124,51.512,-0.092,false,false,EI,2,2,1,,,,
bcd71b06e56e2e023fa055a40ee03f20,0ed657e42b4d820eed2affd146067623,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T22:58:42.000Z,2,2016-11-01T23:27:24.000Z,2,51.503,-0.113,51.505,-0.086,false,false,EE,2,3,2,,,,

My code so far is:

trip = pd.read_csv(r'test\TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
starts = GeoDataFrame(trip, crs=crs, geometry=geometry) #turn dataframe into geodataframe
ends = GeoDataFrame(trip, crs=crs, geometry=geometry2)
FRC1  = geopandas.GeoDataFrame.from_file('FRC1Shapefile.shp') # import FRC1 polygon

spatial_index = starts.sindex
possible_matches_index = list(spatial_index.intersection(FRC1.bounds))
possible_matches = starts.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(FRC1)]

the error I'm getting is from RTree:

RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes

EDIT:
Here's an image of the points sjoin is selecting.
enter image description here

Best Answer

There are easier ways, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc or Check if a point falls within a multipolygon with Python.

1) Use directly a GeoPandas spatial-join with the command sjoin which incorporates a spatial index (rtree, line 29 of sjoin.py)

import pandas as pd
trip = pd.read_csv('TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
crs = {'init' :'epsg:4326'}
import geopandas as gp
starts = gp.GeoDataFrame(trip, crs=crs, geometry=geometry)
ends  = gp.GeoDataFrame(trip, crs=crs, geometry=geometry2)
starts.head()

enter image description here

 FRC1  = gp.read_file('polygons.shp')

enter image description here

2) Now use sjoin

from geopandas.tools import sjoin
pointInPolys = sjoin(starts, FRC1, how='left',op="within")

enter image description here

The point 0 is within polygon 2, the points 2, 3 are within polygon 1, the point 1 is not in a polygon

Eliminate the Nan values: point(s) is/are not in a polygon

pointInPolys = pointInPolys.dropna()
pointInPolys 

enter image description here

3) Conclusion

These 3 points are inside the shapefile