[GIS] Python rtree and parallelized code

parallel processing, python, rtree

I'm trying to use multiprocessing.Pool() to parallelize some processes that query R-trees. The non-parallel version works as expected.

But when I run it in parallel (even with a single process on a single core), the same query returns empty results.

I'm really puzzled by this; I can't see any logic behind it.

EDIT

Here is the simplest example with which I could reproduce the problem:

from multiprocess.pool import Pool
from rtree import index
class Test(object):
    def __init__(self):
        self.idx = index.Index()
        left, bottom, right, top = (0.0, 0.0, 10.0, 10.0)
        self.idx.insert(0, (left, bottom, right, top))

    def test(self, a):
        print(list(self.idx.intersection((1.0, 1.0, 2.0, 2.0))))

b = Test()
p = Pool()
res = p.map(b.test, range(3))
b.test(1)

output:

[]
[]
[]
[0]

Best Answer

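The problem is that rtree is a ctypes wrapper around libspatialindex, and pickle has no way of storing the pointer to the R-tree in memory. Pool.map pickles b.test, and with it b and its index, to send the work to the worker processes, so the workers never receive the populated tree and the query there comes back empty.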
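
To make the limitation concrete, here is a minimal standard-library sketch. It is not rtree code: it only shows that pickle can copy a plain ctypes value but refuses a raw ctypes pointer, because a memory address means nothing in another process. The helper name can_pickle is made up for this illustration, and how rtree's own Index object reacts to pickling may depend on the rtree version (in the question it reached the workers empty instead of raising an error).

```python
import ctypes
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if pickle refuses it."""
    try:
        pickle.dumps(obj)
        return True
    except (ValueError, TypeError, pickle.PicklingError):
        return False


# A plain C value: pickle can copy its bytes to another process.
value = ctypes.c_int(42)

# A pointer to that value: just an address in this process's memory.
pointer = ctypes.pointer(value)

print("plain value picklable:", can_pickle(value))
print("pointer picklable:", can_pickle(pointer))
```

Anything that lives behind such a pointer, like the tree that libspatialindex keeps in memory, has to be rebuilt or reopened inside each worker rather than sent to it.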

You could instead store the index on disk and open it inside each worker:

from multiprocess.pool import Pool
from rtree import index

class Test(object):
    def test(self, a):
        # Open the on-disk index inside the worker instead of pickling it.
        self.idx = index.Index('./rtree.idx')
        print(list(self.idx.intersection((1.0, 1.0, 2.0, 2.0))))

if __name__ == '__main__':
    # Build the index once, then close it so it is written to disk.
    idx = index.Index('./rtree.idx')
    left, bottom, right, top = (0.0, 0.0, 10.0, 10.0)
    idx.insert(0, (left, bottom, right, top))
    idx.close()

    b = Test()
    p = Pool()
    res = p.map(b.test, range(10))