I have a csv with two sets of latitude and longitude points, and a shapefile with espg 4326. I want to find which points are inside the shapefile. I've been following the guide here which uses a spatial index to speed up the search however I'm running into an error.
The csv looks like:
e6ed541a288f13745d2755a79372c0c4,fc707fee61baa13b40278a70c679b9a7,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T02:03:18.000Z,2,2016-11-01T03:01:13.000Z,2,51.456,0.146,51.506,-0.125,false,false,EE,2,3,2,,,,
36e6a9d201be3957fb78be4dfaacb932,d3cbd057ca3bf69f6805d054b4a228c5,52720e003547c70561bf5e03b95aa99f,1,2016-11-01T16:21:45.467Z,2,2016-11-01T16:43:01.627Z,2,51.499,-0.008,51.506,-0.101,false,false,EE,1,1,1,,,,
f15a4077cc79217e620270a24c127ed4,581c63e4232ae5208f3ae23a09fff3ea,13f9896df61279c928f19721878fac41,1,2016-11-01T20:31:36.000Z,2,2016-11-01T21:21:10.000Z,2,51.529,-0.124,51.512,-0.092,false,false,EI,2,2,1,,,,
bcd71b06e56e2e023fa055a40ee03f20,0ed657e42b4d820eed2affd146067623,3fe94a002317b5f9259f82690aeea4cd,1,2016-11-01T22:58:42.000Z,2,2016-11-01T23:27:24.000Z,2,51.503,-0.113,51.505,-0.086,false,false,EE,2,3,2,,,,
My code so far is:
trip = pd.read_csv(r'test\TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon'])
geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])]
geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])]
trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1)
starts = GeoDataFrame(trip, crs=crs, geometry=geometry) #turn dataframe into geodataframe
ends = GeoDataFrame(trip, crs=crs, geometry=geometry2)
FRC1 = geopandas.GeoDataFrame.from_file('FRC1Shapefile.shp') # import FRC1 polygon
spatial_index = starts.sindex
possible_matches_index = list(spatial_index.intersection(FRC1.bounds))
possible_matches = starts.iloc[possible_matches_index]
precise_matches = possible_matches[possible_matches.intersects(FRC1)]
the error I'm getting is from RTree:
RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes
Best Answer
There are easier ways, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc or Check if a point falls within a multipolygon with Python.
1) Use directly a GeoPandas spatial-join with the command
sjoin
which incorporates a spatial index (rtree, line 29 of sjoin.py)2) Now use
sjoin
The point 0 is within polygon 2, the points 2, 3 are within polygon 1, the point 1 is not in a polygon
Eliminate the Nan values: point(s) is/are not in a polygon
3) Conclusion
These 3 points are inside the shapefile