RTree Spatial Index – Why It Doesn’t Speed Up Intersection Computation

fionaintersectionpython-2.7rtreeshapely

I have some code that I am using to determine which Shapely Polygon/MultiPolygons intersect with a number of Shapely LineStrings. Through the answers to this question the code has gone from this:

import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')

# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for poly_id, poly in the_polygons:
    for line in the_lines:
        if poly.intersects(line):
            covered_polygons[poly_id] = covered_polygons.get(poly_id, 0) + 1

where every possible intersection is checked, to this:

import fiona
from shapely.geometry import LineString, Polygon, MultiPolygon, shape
import rtree

# Open each layer
poly_layer = fiona.open('polygon_layer.shp')
line_layer = fiona.open('line_layer.shp')

# Convert to lists of shapely geometries
the_lines = [shape(line['geometry']) for line in line_layer]
the_polygons = [(poly['properties']['GEOID'], shape(poly['geometry'])) for poly in poly_layer]

# Create spatial index
spatial_index = rtree.index.Index()
for idx, poly_tuple in enumerate(the_polygons):
    _, poly = poly_tuple
    spatial_index.insert(idx, poly.bounds)

# Check for Polygons/MultiPolygons that the LineString intersects with
covered_polygons = {}
for line in the_lines:
    for idx in list(spatial_index.intersection(line.bounds)):
        if the_polygons[idx][1].intersects(line):
            covered_polygons[idx] = covered_polygons.get(idx, 0) + 1

where the spatial index is used to reduce the number of intersection checks.

With the shapefiles I have (approximately 4000 polygons, and 4 lines), the original code performs 12936 .intersection() checks and takes about 114 seconds to run. The second piece of code that uses the spatial index performs only 1816 .intersection() checks but it also takes approximately 114 seconds to run.

The code to build the spatial index only takes 1-2 seconds to run, so the 1816 checks in the second piece of code are taking just about the same amount of time to perform as the 12936 checks in the original code (since the loading of shapefiles and converting to Shapely geometries is the same in both pieces of code).

I cannot see any reason that the spatial index would make the .intersects() check take longer, so I am at a loss for why this is happening.

I can only think that I am using the RTree spatial index incorrectly. Thoughts?

Best Answer

My answer is essentially based on another answer by @gene here:

More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc

He proposed the same solution using two differents methods, with or without a spatial index.

He (correctly) stated:

What is the difference ?

Without the index, you must iterate through all the geometries (polygons and points).

With a bounding spatial index (Spatial Index RTree), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...)

but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

These sentences are self-explanatory, but I preferred quoting @gene instead of proposing the same conclusions as mine (so, all the credits go to its brilliant work!).

For a better understanding of the Rtree spatial index, you may retrieve some useful informations following these links:

Another great introduction about the using of spatial indexes may be this article by @Nathan Woodrow.

Related Solutions

Python RTree – Understanding Use of Spatial Indexes with RTree

That's the gist of it. The R-tree allows you to make a very fast first pass and gives you a set of results that will have "false positives" (bounding boxes may intersect when the geometries precisely do not). Then you go over the set of candidates (fetching them from the shapefile by their index) and do a mathematically precise intersection test using, e.g., Shapely. This is the very same strategy that's employed in spatial databases like PostGIS.

[GIS] a OGRGeometryShadow and why can’t I use it in Intersects()-method

In your solution you mix geometries and predicates of ogr and Shapely

With Shapely only (intersection(), intersects())

from shapely.geometry import Point, LineString, mapping
geom1  = LineString([(0, 0), (0, 1)])
geom2  = LineString([(1, 1), (-1,-1)])
intersection = geom1.intersection(geom2)
# geo_interface -> GeoJSON
mapping(intersection)
{'type': 'Point', 'coordinates': (0.0, 0.0)}
# Shapely geometry
pt1 = [Point(pt) for pt in list(geom1.coords)]
point_start_1 = pt1[0]
.....
# Shapely predicate
intersection.intersects(point_start_1)
True

With ogr only (Intersection(), Intersects()):

from osgeo import ogr
geom1 = ogr.CreateGeometryFromWkt('LINESTRING (0 0, 0 1)')
geom2 = ogr.CreateGeometryFromWkt('LINESTRING (1 1, -1 -1)')
# intersection with ogr
intersection_ogr = geom1.Intersection(geom2)
# geo_interface -> GeoJSON
intersection_ogr.ExportToJson()
'{ "type": "Point", "coordinates": [ 0.0, 0.0 ] }'
# ogr geometry
point_start_1 = ogr.Geometry(ogr.wkbPoint)
point_start_1.AddPoint(geom1.GetPoint_2D(0)[0],geom1.GetPoint_2D(0)[1])
....
# ogr predicate
intersection_ogr.Intersects(point_start_1)
True

In your solution, you use Shapely geometries:

point_start_1 = Point(geom_ogr1.GetPoint_2D(0))
....

with a binary predicate of ogr -> error

intersection_ogr.Intersects(point_start_1) 
TypeError: in method 'Geometry_Equal', argument 2 of type 'OGRGeometryShadow *'

Best Answer

Related Solutions

Python RTree – Understanding Use of Spatial Indexes with RTree

[GIS] a OGRGeometryShadow and why can’t I use it in Intersects()-method

Related Question