Python GeoPandas – How to Reduce Execution Time for Intersection

Tags: area, geopandas, intersection, python

I have two GeoDataFrames in GeoPandas (Python). One table contains around 2.2 million rows (geometry type MultiPolygon) and the other contains around 3,800 rows (also MultiPolygon). For each polygon in the bigger table I want to know whether it lies completely 'within' a polygon of the smaller table, or, if the two only intersect, how much of its area overlaps with the smaller table's polygon. This is the code I have written:

import geopandas as gpd
import pandas as pd

# Parcels that are completely inside a coverage polygon
with_in = gpd.sjoin(parcels_gdf, coverage_df, how='inner', predicate='within')
with_in['Full_covered'] = 100

# The remaining parcels overlap the coverage only partially (or not at all);
# intersect them with the dissolved coverage geometry and sum the overlap area
remaining_parcels = parcels_gdf.drop(with_in.index)
intersections = remaining_parcels.intersection(coverage_df.unary_union)
intersection_areas = intersections.area
total_intersection_area = intersection_areas.sum()

parcels_gdf is the table with 2.2 million rows, coverage_df contains 3,800 rows, and remaining_parcels contains around 1.5 million rows. The problem is that the program takes very long (more than 12 hours as I write this, and it is still running) when it executes intersections = remaining_parcels.intersection(coverage_df.unary_union), and I am not sure how much longer it will take to complete. I have a laptop with a Core i7 and 16 GB of RAM. Is there a better way to program this so it runs faster?

Best Answer

This line: intersections = remaining_parcels.intersection(coverage_df.unary_union) is very slow, because each polygon in remaining_parcels is intersected with one huge dissolved/unioned multipolygon.
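
The spatial join and overlay below avoid that, because they can use the R-tree spatial index on the parcels, so each coverage polygon is only compared against parcels whose bounding boxes overlap it. A minimal sketch of that idea, assuming the frames from the question and a recent GeoPandas where sindex.query accepts an array of geometries:

# Query the parcels' R-tree with all coverage geometries at once. The result is a
# pair of positional index arrays (coverage index, parcel index) for every pair
# whose bounding boxes overlap; only these pairs need an exact geometric test.
cov_idx, parcel_idx = remaining_parcels.sindex.query(coverage_df.geometry)
print(len(cov_idx), "candidate pairs instead of",
      len(remaining_parcels) * len(coverage_df), "pairwise comparisons")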

Try this:

import geopandas as gpd

bigdf = gpd.read_file("/path/to/file")
bigdf["bigid"] = range(bigdf.shape[0])
smalldf = gpd.read_file("/path/to/file2")

within = gpd.sjoin(left_df=bigdf, right_df=smalldf, predicate="within")
within["full_coverage"] = 100

# Intersect the polygons in bigdf which are not within, with smalldf.
inter = gpd.overlay(df1=bigdf.loc[~bigdf.index.isin(within.index)],
                    df2=smalldf, how="intersection", keep_geom_type=True)
inter["area"] = inter.geometry.area

inter.groupby("bigid")["area"].sum()  # Intersected area per bigdf polygon
inter.area.sum()  # Or the total intersected area
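
If you want the coverage expressed as a percentage of each parcel's own area (to match the full_coverage = 100 convention for the fully covered parcels), a possible follow-up could look like the sketch below. It assumes bigid uniquely identifies parcels, the data are in a projected CRS so .area is in meaningful units, and the names covered, parcel_area, partial_pct and coverage_pct are only introduced here for illustration:

import pandas as pd

# Sum the intersected area per parcel and divide by the parcel's own area
# to get a percentage for the partially covered parcels.
covered = inter.groupby("bigid")["area"].sum()
parcel_area = bigdf.set_index("bigid").geometry.area
partial_pct = covered / parcel_area.loc[covered.index] * 100

# Parcels from the 'within' join are 100 % covered; combine both groups
# into one Series indexed by bigid.
full_pct = within.drop_duplicates("bigid").set_index("bigid")["full_coverage"]
coverage_pct = pd.concat([full_pct, partial_pct]).rename("coverage_pct")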