Python GeoPandas – How to Reduce Execution Time for Intersection

Tags: area, geopandas, intersection, python

I have two GeoDataFrames in GeoPandas (Python). One table contains around 2.2 million rows (geometry type MultiPolygon) and the other contains around 3,800 rows (also MultiPolygon). For each polygon in the bigger table I want to know whether it lies completely 'within' a polygon of the smaller table, or, if the two only intersect, how much of its area overlaps with the smaller table's polygon. This is the code I have written:

import geopandas as gpd
import pandas as pd

# Parcels that are completely inside a coverage polygon
with_in = gpd.sjoin(parcels_gdf, coverage_df, how='inner', predicate='within')
with_in['Full_covered'] = 100

# The remaining parcels overlap the coverage only partially (or not at all);
# intersect them with the dissolved coverage geometry and sum the overlap area
remaining_parcels = parcels_gdf.drop(with_in.index)
intersections = remaining_parcels.intersection(coverage_df.unary_union)
intersection_areas = intersections.area
total_intersection_area = intersection_areas.sum()

parcels_gdf is the table with 2.2 million rows, coverage_df contains 3,800 rows, and remaining_parcels contains around 1.5 million rows. The problem is that the program takes very long (more than 12 hours as I write this, and it is still running) when it executes intersections = remaining_parcels.intersection(coverage_df.unary_union), and I am not sure how much longer it will take to complete. I have a laptop with a Core i7 and 16 GB of RAM. Is there a better way to program this so it runs faster?

Best Answer

This line: intersections = remaining_parcels.intersection(coverage_df.unary_union) is very slow, because each polygon in remaining_parcels is intersected with one huge dissolved/unioned multipolygon.
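
The spatial join and overlay below avoid that, because they can use the R-tree spatial index on the parcels, so each coverage polygon is only compared against parcels whose bounding boxes overlap it. A minimal sketch of that idea, assuming the frames from the question and a recent GeoPandas where sindex.query accepts an array of geometries:

# Query the parcels' R-tree with all coverage geometries at once. The result is a
# pair of positional index arrays (coverage index, parcel index) for every pair
# whose bounding boxes overlap; only these pairs need an exact geometric test.
cov_idx, parcel_idx = remaining_parcels.sindex.query(coverage_df.geometry)
print(len(cov_idx), "candidate pairs instead of",
      len(remaining_parcels) * len(coverage_df), "pairwise comparisons")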

Try this:

import geopandas as gpd

bigdf = gpd.read_file("/path/to/file")
bigdf["bigid"] = range(bigdf.shape[0])
smalldf = gpd.read_file("/path/to/file2")

within = gpd.sjoin(left_df=bigdf, right_df=smalldf, predicate="within")
within["full_coverage"] = 100

# Intersect the polygons in bigdf which are not within, with smalldf.
inter = gpd.overlay(df1=bigdf.loc[~bigdf.index.isin(within.index)],
                    df2=smalldf, how="intersection", keep_geom_type=True)
inter["area"] = inter.geometry.area

inter.groupby("bigid")["area"].sum()  # Intersected area per bigdf polygon
inter.area.sum()  # Or the total intersected area
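
If you want the coverage expressed as a percentage of each parcel's own area (to match the full_coverage = 100 convention for the fully covered parcels), a possible follow-up could look like the sketch below. It assumes bigid uniquely identifies parcels, the data are in a projected CRS so .area is in meaningful units, and the names covered, parcel_area, partial_pct and coverage_pct are only introduced here for illustration:

import pandas as pd

# Sum the intersected area per parcel and divide by the parcel's own area
# to get a percentage for the partially covered parcels.
covered = inter.groupby("bigid")["area"].sum()
parcel_area = bigdf.set_index("bigid").geometry.area
partial_pct = covered / parcel_area.loc[covered.index] * 100

# Parcels from the 'within' join are 100 % covered; combine both groups
# into one Series indexed by bigid.
full_pct = within.drop_duplicates("bigid").set_index("bigid")["full_coverage"]
coverage_pct = pd.concat([full_pct, partial_pct]).rename("coverage_pct")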