I have a list of Shapely polygons which I need to find the intersection of each with a bounding Shapely polygon.
Not all the polygons in the list cross the boundary of the bounding polygon so I only need to calculate the intersection for those that do but my method is rather slow as I'm currently iterating over each polygon in the list.
My current code is below:
def intersect_polygons(poly, boundary):
if poly.overlaps(boundary):
poly = poly.intersection(boundary)
return poly
# polygons is a list of shapely Polygon objects
# boundary is a shapely Polygon object
new_polygons = [intersect_polygons(p, boundary) for p in polygons]
The limiting step is checking to see if the polygons overlap as the intersection only happens for a small percentage of the polygons. Is there anyway to do this step without checking every polygon to speed this up? I've seen an rtree suggested as a solution for similar problems but I'm not sure whether one could be used here.
Here is what my input looks like before with the shaded hexagon as the bounding polygon I want to intersect all overlapping polygons with:
And here is what I want the output to look like, once the overlapping polygons have been swapped for their intersections with the boundary.
Best Answer
To speed up this kind of queries you have to use a spatial index. Use the STRtree class to build an rtree and exponentially speed up this operation. Here is the documentation with an example:
https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree
EDIT:
After seen the images, the STRtree will not be usefull at all. In this case, all the polygons intersect the bounding polygon so you need a different kind of optimization. I'll give you some ideas on how you could optimize this:
Maybe the most optimized way I know is to test if the centroid of the polygons are within a prepared negative buffer of the boundary polygon. If the centroid is not within, seems that you don't have polygons on the outside, but im not 100% sure so I'll give you two options: