Python Shapefiles – How to Intersect Using Python or Command Line

fionapythonshapely

I'm new to working with geo data and trying to address a similar question to the one posted as How to calculate the size of a particular area below a buffer in QGIS

Using command line and/or shapely/fiona, in the most efficient and elegant way.

I created two shapefiles with their respective numerical attributes:

small areas with data on population
police districts with crime stats(single number)

They are independent.

I'd like to recalculate those such that both estimates are present in each of the shapefiles, eg. estimate of crime within small area (take data from 2 and look for all areas in 1 that are contained in it, cut those that intersect and do maths).

I have the maths figured out and stuck looking for elegant ways and efficient data formats to do intersections and cut out the overlaps to appropriate boundaries.

Update following the response of @gene

The adapted code does not finding any intersections in the data, which is structured as follows (and surprising since it's from official sources):

polSHP_upd: police precincts across SA
sal_subSHP: subset of small regions with population data in Cape Town

The files live here:

https://github.com/AnnaMag/GIS-analyses

What am I missing?

polSHP_upd = 'crime/GIS-analyses/polPres_updated4crimes/polPres_updated4crimes.shp'
sal_subSHP = 'crime/GIS-analyses/sal_sub/sal_sub.shp'

import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file(polSHP_upd)
g2 = gpd.GeoDataFrame.from_file(sal_subSHP)
data = []
# A: region of the police data with criminal record
# C: small area with population data
# we look for all small areas intersecting a given C_i, calculate the  fraction of inclusion, scale the
# population accordingly: area(A_j, where A_j crosses C_i)/area(A_j)*  popul(A_j)
for index, crim in g1.iterrows():
   for index2, popu in g2.iterrows():      
      if popu['geometry'].crosses(crim['geometry']): # objects overlaps to partial extent, not contained
         area_int = popu['geometry'].intersection(crim['geometry']).area
         area_crim = crim['geometry'].area
         area_popu = popu['geometry'].area # there is a Shape_Area field in properties: to check precision
         popu_count = popu['properties']['PPL_CNT']
         popu_frac = (area_int / area_popu) * popu_count# fraction of the pop area contained inside the crim
         data.append({'id': (index1, index2) ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )
         # data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

      elif popu['geometry'].intersects(crim['geometry']): 
        pass
        #print("intersect")

      elif popu['geometry'].disjoint(crim['geometry']): 
        pass`

Best Answer

The question is about Shapely and Fiona in pure Python without QGIS ("using command line and/or shapely/fiona").

A solution is

from shapely import shape, mapping
import fiona
# schema of the new shapefile
schema =  {'geometry': 'Polygon','properties': {'area': 'float:13.3','id_populat': 'int','id_crime': 'int'}}
# creation of the new shapefile with the intersection
with fiona.open('intersection.shp', 'w',driver='ESRI Shapefile', schema=schema) as output:
    for crim in fiona.open('crime_stat.shp'):
        for popu in fiona.open('population.shp'):
           if shape(crim['geometry']).intersects(shape(popu['geometry'])):     
              area = shape(crim['geometry']).intersection(shape(popu['geometry'])).area
              prop = {'area': area, 'id_populat' : popu['id'],'id_crime': crim['id']} 
              output.write({'geometry':mapping(shape(crim['geometry']).intersection(shape(popu['geometry']))),'properties': prop})

The original two layers and the resulting layer

Part of the resulting layer table

You can use a spatial index (rtree here, look at GSE: Fastest way to join many points to many polygons in python and Using Rtree spatial indexing with OGR)

Another solution is to use GeoPandas (= Pandas + Fiona + Shapely)

import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file("crime_stat.shp")
g2 = gpd.GeoDataFrame.from_file("population.shp")
data = []
for index, crim in g1.iterrows():
    for index2, popu in g2.iterrows():
       if crim['geometry'].intersects(popu['geometry']):
          data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'crime_stat':crim['crime_stat'], 'Population': popu['Population'], 'area':crim['geometry'].intersection(popu['geometry']).area})

df = gpd.GeoDataFrame(data,columns=['geometry', 'crime_stat', 'Population','area'])
df.to_file('intersection.shp')
# control of the results in mi case, first values
df.head() # image from a Jupiter/IPython notebook

Update

You need to understand the definition of the spatial predicates. I use here the JTS Topology suite

As you can see there are only intersections and no crosses nor disjoint here. Some definitions from the Shapely manual

object.crosses(other): Returns True if the interior of the object intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.
object.disjoint(other): Returns True if the boundary and interior of the object do not intersect at all with those of the other.
object.intersects(other): Returns True if the boundary and interior of the object intersect in any way with those of the other.

You can control it by a simple script (there are other solution but this one is the simplest)

i = 0
for index, crim in g1.iterrows():
   for index2, popu in g2.iterrows():    
       if popu['geometry'].crosses(crim['geometry']):
           i= i+1 
print i

and the result is 0

Therefore, you only need intersects here.

Your script becomes

data = []
for index1, crim in g1.iterrows():
    for index2, popu in g2.iterrows():      
        if popu['geometry'].intersects(crim['geometry']): # objects overlaps to partial extent, not contained
            area_int = popu['geometry'].intersection(crim['geometry']).area
            area_crim = crim['geometry'].area
            area_popu = popu['geometry'].area # 
            # popu['properties'] is for Fiona, not for Pandas
            popu_count = popu['PPL_CNT']
            popu_frac = (area_int / area_popu) * popu_count#
            # you must include the geometry, if not, it is a simple Pandas DataFrame and not a GeoDataframe
            # Fiona does not accept a tuple as value of a field 'id': (index1, index2)
            data.append({'geometry': crim['geometry'].intersection(popu['geometry']), 'id1': index1, 'id2':index2 ,'area_crim': area_crim,'area_pop': area_popu, 'area_inter': area_int, 'popu_frac': popu_frac} )

 df = gpd.GeoDataFrame(data,columns=['geometry', 'id1','id2','area_crim', 'area_pop','area_inter'])
 df.to_file('intersection.shp')
 df.head()

Result:

Related Solutions

Python – Problem Intersecting Shapefiles with OGR

I do not understand why beginners try to start with the GDAL/OGR Python bindings (not very "Pythonic" and difficult) when other easier alternatives are available.

With your script, you need to know osgeo.ogr and the SQL dialect of SQLite. The solution proposed by Mike T is powerful but not "classic" and performs only the intersection of shapefiles.

What you are trying to do is a Spatial Join (point in Polygon) and not a simple intersection and you can use :

pyshp (shapefile) (pure Python module) as in Point in Polygon or Point in Polygon 2: Walking the line
Fiona (other Python binding of OGR with Python dictionaries) and Shapely as in More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc (1) and many other examples in GIS SE
GeoPandas as in More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc (2)
If you want to use OGR, consult the Python GDAL/OGR Cookbook and many examples in GIS SE

GeoPandas – Intersecting Two Shape Problem Using GeoPandas

1) The problem is that the intersection of two polygons is not always a polygon

import geopandas as gp
poly1 = gp.read_file("poly_origin.shp")
poly2 = gp.read_file("poly_test.shp")
data = []
for index, orig in poly1.iterrows():
    for index2, ref in poly2.iterrows():      
        if ref['geometry'].intersects(orig['geometry']): 
         owdspd=orig['id']
         data.append({'geometry':ref['geometry'].intersection(orig['geometry']),'wdspd':owdspd})

for geom in data: 
   print geom
{'geometry': <shapely.geometry.polygon.Polygon object at 0x10aab9c10>, 'wdspd': 1}
{'geometry': <shapely.geometry.polygon.Polygon object at 0x10aab9e50>, 'wdspd': 2}
{'geometry': <shapely.geometry.linestring.LineString object at 0x10aacc250>, 'wdspd': 3}

The result is 2 Polygons and a LineString therefore the error

"ValueError: Geometry column cannot contain mutiple geometry types when writing to file"

2) A solution is to convert all the geometries to the same type

from shapely.geometry import LineString
for i in data:
 geom = i['geometry']
 if geom.geom_type=='Polygon':
      i['geometry']=  LineString(list(geom.exterior.coords))

Now, you can convert to a GeoDataFrame

df = gp.GeoDataFrame(data,columns=['geometry','wdspd'])
df                          geometry                   wdspd
0  LINESTRING (96.42123294751703 -44.632201588700...      1
1  LINESTRING (282.773470482277 -127.548992317643...      2
2  LINESTRING (144.3726591760299 -221.86142322097...      3

3) Or to use GeoPandas Overlay

inter = gp.overlay(poly1, poly2, how='intersection')
inter
     id  id_2                geometry
 0   2     2  POLYGON ((282.773470482277 -127.5489923176438,...
 1   1     1  POLYGON ((51.0880537778328 -116.6195085418168,...

Export the result

df.to_file("linestring_intersection.shp")