RTree Spatial Index – Why It Doesn’t Speed Up Intersection Computation

fionaintersectionpython-2.7rtreeshapely

I have some code that I am using to determine which Shapely Polygon/MultiPolygons intersect with a number of Shapely LineStrings. Through the answers to this question the code has gone from this:

import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')

# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for poly_id, poly in the_polygons:
    for line in the_lines:
        if poly.intersects(line):
            covered_polygons[poly_id] = covered_polygons.get(poly_id, 0) + 1

where every possible intersection is checked, to this:

import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape
import rtree

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')

# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Create spatial index
spatial_index = rtree.index.Index()
for idx, poly_tuple in enumerate(the_polygons):
    _, poly = poly_tuple
    spatial_index.insert(idx, poly.bounds)

# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for line in the_lines:
    for idx in list(spatial_index.intersection(line.bounds)):
        if the_polygons[idx][1].intersects(line):
            covered_polygons[idx] = covered_polygons.get(idx, 0) + 1

where the spatial index is used to reduce the number of intersection checks.

With the shapefiles I have (approximately 4000 polygons, and 4 lines), the original code performs 12936 .intersection() checks and takes about 114 seconds to run. The second piece of code that uses the spatial index performs only 1816 .intersection() checks but it also takes approximately 114 seconds to run.

The code to build the spatial index only takes 1-2 seconds to run, so the 1816 checks in the second piece of code are taking just about the same amount of time to perform as the 12936 checks in the original code (since the loading of shapefiles and converting to Shapely geometries is the same in both pieces of code).

I cannot see any reason that the spatial index would make the .intersects() check take longer, so I am at a loss for why this is happening.

I can only think that I am using the RTree spatial index incorrectly. Thoughts?

Best Answer

My answer is essentially based on another answer by @gene here:

More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc

He proposed the same solution using two differents methods, with or without a spatial index.

He (correctly) stated:

What is the difference ?

  • Without the index, you must iterate through all the geometries (polygons and points).
  • With a bounding spatial index (Spatial Index RTree), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...)
  • but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

These sentences are self-explanatory, but I preferred quoting @gene instead of proposing the same conclusions as mine (so, all the credits go to its brilliant work!).

For a better understanding of the Rtree spatial index, you may retrieve some useful informations following these links:

Another great introduction about the using of spatial indexes may be this article by @Nathan Woodrow.