Python Shapefiles – How to Intersect Using Python or Command Line

fionapythonshapely

I'm new to working with geo data and trying to address a similar question to the one posted as How to calculate the size of a particular area below a buffer in QGIS

Using command line and/or shapely/fiona, in the most efficient and elegant way.

I created two shapefiles with their respective numerical attributes:

  1. small areas with data on population
  2. police districts with crime stats(single number)

They are independent.

I'd like to recalculate those such that both estimates are present in each of the shapefiles, eg. estimate of crime within small area (take data from 2 and look for all areas in 1 that are contained in it, cut those that intersect and do maths).

I have the maths figured out and stuck looking for elegant ways and efficient data formats to do intersections and cut out the overlaps to appropriate boundaries.

Update following the response of @gene

The adapted code does not finding any intersections in the data, which is structured as follows (and surprising since it's from official sources):

  1. polSHP_upd: police precincts across SA
  2. sal_subSHP: subset of small regions with population data in Cape Town

The files live here:

https://github.com/AnnaMag/GIS-analyses

What am I missing?

polSHP_upd = 'crime/GIS-analyses/polPres_updated4crimes/polPres_updated4crimes.shp'
sal_subSHP = 'crime/GIS-analyses/sal_sub/sal_sub.shp'

import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file(polSHP_upd)
g2 = gpd.GeoDataFrame.from_file(sal_subSHP)
data = []
# A: region of the police data with criminal record
# C: small area with population data
# we look for all small areas intersecting a given C_i, calculate the  fraction of inclusion, scale the
# population accordingly: area(A_j, where A_j crosses C_i)/area(A_j)*  popul(A_j)
for index, crim in g1.iterrows():
   for index2, popu in g2.iterrows():      
      if popu['geometry'].crosses(crim['geometry']): # objects overlaps to partial extent, not contained
         area_int = popu['geometry'].intersection(crim['geometry']).area
         area_crim = crim['geometry'].area
         area_popu = popu['geometry'].area # there is a Shape_Area field in properties: to check precision
         popu_count = popu['properties']['PPL_CNT']
         popu_frac = (area_int / area_popu) * popu_count# fraction of the pop area contained inside the crim
         data.append({'id': (index1, index2) ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )
         # data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

      elif popu['geometry'].intersects(crim['geometry']): 
        pass
        #print("intersect")

      elif popu['geometry'].disjoint(crim['geometry']): 
        pass`

Best Answer

The question is about Shapely and Fiona in pure Python without QGIS ("using command line and/or shapely/fiona").

A solution is

from shapely import shape, mapping
import fiona
# schema of the new shapefile
schema =  {'geometry': 'Polygon','properties': {'area': 'float:13.3','id_populat': 'int','id_crime': 'int'}}
# creation of the new shapefile with the intersection
with fiona.open('intersection.shp', 'w',driver='ESRI Shapefile', schema=schema) as output:
    for crim in fiona.open('crime_stat.shp'):
        for popu in fiona.open('population.shp'):
           if shape(crim['geometry']).intersects(shape(popu['geometry'])):     
              area = shape(crim['geometry']).intersection(shape(popu['geometry'])).area
              prop = {'area': area, 'id_populat' : popu['id'],'id_crime': crim['id']} 
              output.write({'geometry':mapping(shape(crim['geometry']).intersection(shape(popu['geometry']))),'properties': prop})

The original two layers and the resulting layer

enter image description hereenter image description here

Part of the resulting layer table

enter image description here

You can use a spatial index (rtree here, look at GSE: Fastest way to join many points to many polygons in python and Using Rtree spatial indexing with OGR)

Another solution is to use GeoPandas (= Pandas + Fiona + Shapely)

import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file("crime_stat.shp")
g2 = gpd.GeoDataFrame.from_file("population.shp")
data = []
for index, crim in g1.iterrows():
    for index2, popu in g2.iterrows():
       if crim['geometry'].intersects(popu['geometry']):
          data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

df = gpd.GeoDataFrame(data,columns=['geometry', 'crime_stat', 'Population','area'])
df.to_file('intersection.shp')
# control of the results in mi case, first values
df.head() # image from a Jupiter/IPython notebook

enter image description here

Update

You need to understand the definition of the spatial predicates. I use here the JTS Topology suite

enter image description here

As you can see there are only intersections and no crosses nor disjoint here. Some definitions from the Shapely manual

object.crosses(other): Returns True if the interior of the object intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.
object.disjoint(other): Returns True if the boundary and interior of the object do not intersect at all with those of the other.
object.intersects(other): Returns True if the boundary and interior of the object intersect in any way with those of the other.

You can control it by a simple script (there are other solution but this one is the simplest)

i = 0
for index, crim in g1.iterrows():
   for index2, popu in g2.iterrows():    
       if popu['geometry'].crosses(crim['geometry']):
           i= i+1 
print i

and the result is 0

Therefore, you only need intersects here.

Your script becomes

data = []
for index1, crim in g1.iterrows():
    for index2, popu in g2.iterrows():      
        if popu['geometry'].intersects(crim['geometry']): # objects overlaps to partial extent, not contained
            area_int = popu['geometry'].intersection(crim['geometry']).area
            area_crim = crim['geometry'].area
            area_popu = popu['geometry'].area # 
            # popu['properties'] is for Fiona, not for Pandas
            popu_count = popu['PPL_CNT']
            popu_frac = (area_int / area_popu) * popu_count#
            # you must include the geometry, if not, it is a simple Pandas DataFrame and not a GeoDataframe
            # Fiona does not accept a tuple as value of a field 'id': (index1, index2)
            data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'id1': index1, 'id2':index2 ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )

 df = gpd.GeoDataFrame(data,columns=['geometry', 'id1','id2','area_crim', 'area_pop','area_inter'])
 df.to_file('intersection.shp')
 df.head()

enter image description here

Result:

enter image description here

Related Question