[GIS] Using spatial index to intersect points with polygon, when points and polygon have same minimum bounding box

polygonrtreeshapelyspatial-index

I have a MultiPolygon representing the city boundary of Houston and it has an extremely complicated boundary. I also have a set of ~900,000 points (that has the same minimum bounding box as that Houston polygon). About ~400,000 of these points are within the polygon but the others lie outside it. Using python, geopandas, and shapely I tried intersecting this polygon with my points using r-tree. But because the points and polygon have the same minimum bounding box, r-tree offers no speed-up. The process currently takes 30+ minutes.

Which (if any) type of spatial index can I use to accelerate my intersection query when the polygon and points have the same minimum bounding box?

Edit to add code snippet here:

sindex = gdf['geometry'].sindex
possible_matches_index = list(sindex.intersection(polygon.bounds))
possible_matches = gdf.iloc[possible_matches_index]
points_in_polygon = possible_matches[possible_matches.intersects(polygon)]

Best Answer

I try to explain with an image. If you do not agree that this strategy works I will delete my answer. However, I will continue to use it in my own work.

I took Texas as an example and I split it into 1 by 1 degree pieces. The grid that shows behind corresponds with the R-Tree index despite at the boundaries where the real R-Tree BBOXes are smaller.

In the Texas case a point never hits more than one R-Tree box. In case of adjacent polygons there may be more hits, but never more than a handful. Next thing to do is to make an Intersects test with a true geometry of the corresponding piece. If point is inside that piece it is inside Texas as well, and in the USA, North America etc.

enter image description here