[GIS] Shapely with Rtree versus STRtree

performancertreeshapelyspatial relationship

I have a huge interrogation regarding the performance difference between Rtree and STRtree included in Shapely. Below is a short code where we create randomly 250 000 triangles (defined by 4 vertice) in a plane of 10000 by 10000. We also create 10000 square lines (5 vertice) and for each one, we try to find the triangles each one intersects. we test both Rtree and STRtree idependently. What surprise me is that Rtree is almost 100 times faster to find the intersection (not counting of course the time to load the features into RTree but you do it only one- around 20 seconds). See the result below. One good point both trees are finding the same number of potential intersections.

My question: Is it normal that RTree is more than 100 times faster than STRtree? Did I code something wrong somewhere? Is it the way STRtree is implemented with Python overhead? Something else?

Here's the code:

from shapely.geometry import LineString
from shapely.strtree import STRtree
import random, time
from rtree import index

lst_lines = []
lst_intersects = []

# Create the triangles
for i in range(25000):
    x = random.random() * 10000.
    y = random.random() * 10000.
    coords = [(x,y),(x+5, y+5),(x,y+10),(x,y)]
    lst_lines.append(LineString(coords))

# Create the bounding boxes
for i in range(10000):
    x = random.random() * 10000.
    y = random.random() * 10000.
    coords = [(x,y),(x+15,y),(x+15,y+15),(x,y+15),(x,y)]
    lst_intersects.append(LineString(coords))

# Create shapely STRtree
tree = STRtree(lst_lines)

# Create RTree
idx = index.Index()
for i, line in enumerate(lst_lines):
    idx.insert(i, line.bounds)
print (time.time())

sec1 = time.time()

# Find the intersection with STRtree
str_tree_nbr = 0
for intersect in lst_intersects:
    str_tree = tree.query(intersect)
    str_tree_nbr += len(str_tree)

sec2 = time.time()
print("Seconds since epoch =", sec2-sec1)
print ("STRtree number: ", str_tree_nbr)

# Find the intersections with RTree
rtree_nbr = 0
for intersect in lst_intersects:
    rtree = idx.intersection(intersect.bounds)
    rtree_nbr += len(list(rtree))

sec3 = time.time()
print("Seconds since epoch =", sec3-sec2)
print ("Rtree number: ", rtree_nbr)

Results:

    Seconds for STRtree = 120.86289858818054
    STRtree number:  12622
    Seconds for RTree = 0.8911409378051758
    Rtree number:  12622

The result for the intersection time is also not linear it grows exponentially when I add more triangles.

System used: Windows 10, conda: 4.7.12, Shapely: 1.7.0 and Rtree: 0.9.4

Best Answer

The poor performance was the result of a bug in Shapely. A fix has been committed and will be available in the next version.