[GIS] More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc

fionaspatial-jointypeerror

I'm attempting to do a spatial join much like the example here: Is there a python option to "join attributes by location"?. However, that approach seems really inefficient / slow. Even running this with a modest 250 points takes almost 2 minutes and it fails entirely on shapefiles with > 1,000 points. Is there a better approach? I'd like to do this entirely in Python without using ArcGIS, QGIS, etc.

I'd also be interested to know if it's possible to SUM attributes (i.e. population) of all the points that fall within a polygon and join that quantity to the polygon shapefile.

Here is the code I'm trying to convert. I get an error on line 9:

poly['properties']['score'] += point['properties']['score']

which says:

TypeError: unsupported operand type(s) for +=: 'NoneType' and 'float'.

If I replace the "+=" with "=" it runs fine but that doesn't sum the fields. I've also tried making these as integers but that fails as well.

with fiona.open(poly_shp, 'r') as n: 
  with fiona.open(point_shp,'r') as s:
    outSchema = {'geometry': 'Polygon','properties':{'region':'str','score':'float'}}
    with fiona.open (out_shp, 'w', 'ESRI Shapefile', outSchema, crs) as output:
        for point in s:
            for poly in n:
                if shape(point['geometry']).within(shape(poly['geometry'])):  
                    poly['properties']['score']) += point['properties']['score'])
                    output.write({
                        'properties':{
                            'region':poly['properties']['NAME'],
                            'score':poly['properties']['score']},
                        'geometry':poly['geometry']})

Best Answer

Fiona returns Python dictionaries and you can not use poly['properties']['score']) += point['properties']['score']) with a dictionary.

Example of summing attributes using the references given by Mike T:

enter image description here

# read the shapefiles 
import fiona
from shapely.geometry import shape
polygons = [pol for pol in fiona.open('poly.shp')]
points = [pt for pt in fiona.open('point.shp')]
# attributes of the polygons
for poly in polygons:
   print poly['properties'] 
OrderedDict([(u'score', 0)])
OrderedDict([(u'score', 0)])
OrderedDict([(u'score', 0)])

# attributes of the points
for pt in points:
    print i['properties']
 OrderedDict([(u'score', 1)]) 
 .... # (same for the 8 points)

Now, we can use two methods, with or without a spatial index:

1: without

# iterate through points 
 for i, pt in enumerate(points):
     point = shape(pt['geometry'])
     #iterate through polygons
     for j, poly in enumerate(polygons):
        if point.within(shape(poly['geometry'])):
             # sum of attributes values
             polygons[j]['properties']['score'] = polygons[j]['properties']['score'] + points[i]['properties']['score']

2: with a R-tree index (you can use pyrtree or rtree)

# Create the R-tree index and store the features in it (bounding box)
 from rtree import index
 idx = index.Index()
 for pos, poly in enumerate(polygons):
       idx.insert(pos, shape(poly['geometry']).bounds)

#iterate through points
for i,pt in enumerate(points):
  point = shape(pt['geometry'])
  # iterate through spatial index
  for j in idx.intersection(point.coords[0]):
      if point.within(shape(polygons[j]['geometry'])):
            polygons[j]['properties']['score'] = polygons[j]['properties']['score'] + points[i]['properties']['score']

Result with the two solutions:

for poly in polygons:
   print poly['properties']    
 OrderedDict([(u'score', 2)]) # 2 points in the polygon
 OrderedDict([(u'score', 1)]) # 1 point in the polygon
 OrderedDict([(u'score', 1)]) # 1 point in the polygon

What is the difference ?

  • Without the index, you must iterate through all the geometries (polygons and points).
  • With a bounding spatial index (Spatial Index RTree), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...)
  • but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

After:

schema = fiona.open('poly.shp').schema
with fiona.open ('output.shp', 'w', 'ESRI Shapefile', schema) as output:
    for poly in polygons:
        output.write(poly)

To go further, look at Using Rtree Spatial Indexing With OGR, Shapely, Fiona