Shapely – Fast Intersection of Polygons with a Bounding Polygon

I have a list of Shapely polygons which I need to find the intersection of each with a bounding Shapely polygon.

Not all the polygons in the list cross the boundary of the bounding polygon so I only need to calculate the intersection for those that do but my method is rather slow as I'm currently iterating over each polygon in the list.

My current code is below:

def intersect_polygons(poly, boundary):
    if poly.overlaps(boundary):
        poly = poly.intersection(boundary)
    return poly

# polygons is a list of shapely Polygon objects
# boundary is a shapely Polygon object
new_polygons = [intersect_polygons(p, boundary) for p in polygons]

The limiting step is checking to see if the polygons overlap as the intersection only happens for a small percentage of the polygons. Is there anyway to do this step without checking every polygon to speed this up? I've seen an rtree suggested as a solution for similar problems but I'm not sure whether one could be used here.

Here is what my input looks like before with the shaded hexagon as the bounding polygon I want to intersect all overlapping polygons with:

And here is what I want the output to look like, once the overlapping polygons have been swapped for their intersections with the boundary.

Best Answer

To speed up this kind of queries you have to use a spatial index. Use the STRtree class to build an rtree and exponentially speed up this operation. Here is the documentation with an example:

https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree

EDIT:

After seen the images, the STRtree will not be usefull at all. In this case, all the polygons intersect the bounding polygon so you need a different kind of optimization. I'll give you some ideas on how you could optimize this:

When you know a geometry must be tested against all (or almost all) the geometries in a set, you can 'prepare' this geometry to speed up testing operations. It don't supports 'overlaps' but you could use 'contains' to test the polygons that falls entirely within and 'intersects' for the boundary polygons.
Knowing that the polygons you are using are convex, instead of testing if the polygon is within the boundary polygon, you could test if the centroid is within, and if not test the intersection of the full polygon. If you don't know if they whould be convex, you could also use a representative point.
I'm thinking that, in some cases, the polygons that intersect the boundary could have the centroid within. You could avoid this by using a negative buffer of the boundary polygon.
You have a 'boundary polygon' but you could also have the boundary of this 'boundary polygon', wich is a linestring that only intersects the polygons that cross that line.
Also, you can extract the segments of the boundary polygon and use the STRtree to index it. Then iterate the polygons to see if they intersect any segment.

Maybe the most optimized way I know is to test if the centroid of the polygons are within a prepared negative buffer of the boundary polygon. If the centroid is not within, seems that you don't have polygons on the outside, but im not 100% sure so I'll give you two options:

If there are no polygons on the outside, then you know that if the centroid is not within it will intersect the boundary.
Else, you could test the intersection with the 'line boundary', or if it's not fast enougth, use the STRtree of the segments.

Best Answer

Related Solutions

Spatial Indexing – Intersect Points with Polygon When Having Same Minimum Bounding Box Using Shapely

Python – Calculating Intersection Percentage in Polygons

Related Question