GeoPandas Spatial Join – How to Spatially Join Only Features by Largest Overlap with Sjoin in GeoPandas

Tags: area, geopandas, overlapping-features, python, spatial-join

I am using GeoPandas sjoin to join two GeoDataFrames with the parameters how="inner" and op="intersects" (op is named predicate in newer GeoPandas releases).

Due to the nature of the files, a large percentage of the features are duplicates (with regard to geometry). How can I filter the output so that each feature is restricted to a single match, for example by keeping only the entry with the highest Intersection over Union, Dice coefficient, overlap coefficient or a similar measure of the overlapping geometries?

Best Answer

Intersect, sort by area and drop duplicates:

import geopandas as gpd
import psycopg2

con = psycopg2.connect(database="lmv", user="postgres", password="dsfdas",
    host="localhost")

df1 = gpd.GeoDataFrame.from_postgis("select geom, lan_kod from ok_an_riks", con, geom_col='geom' )
df2 = gpd.GeoDataFrame.from_postgis("select geom, ogc_fid from ok_nd_riks", con, geom_col='geom' )

df3 = gpd.overlay(df1, df2, how='intersection')

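#Store the intersection area in a column (sort_values expects a column label, not a Series)
df3['overlap_area'] = df3.geometry.area

#Sort by area so largest area is last
df3.sort_values(by='overlap_area', inplace=True)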

#Drop duplicates, keep last/largest
df3.drop_duplicates(subset='ogc_fid', keep='last', inplace=True)
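
The code above ranks candidates by raw intersection area. If one of the normalised measures named in the question is preferred, it can be computed from three areas: the two source geometries and their intersection. The following is a minimal sketch in plain Python; the function name overlap_scores and its parameters are illustrative assumptions, not part of GeoPandas.

```python
def overlap_scores(area_a, area_b, area_inter):
    """Return IoU, Dice and overlap coefficient from three areas.

    area_a, area_b: areas of the two source geometries
    area_inter:     area of their intersection
    """
    union = area_a + area_b - area_inter
    total = area_a + area_b
    smaller = min(area_a, area_b)
    return {
        # Intersection over Union: intersection / union
        "iou": area_inter / union if union > 0 else 0.0,
        # Dice coefficient: 2 * intersection / (area_a + area_b)
        "dice": 2 * area_inter / total if total > 0 else 0.0,
        # Overlap coefficient: intersection / smaller of the two areas
        "overlap": area_inter / smaller if smaller > 0 else 0.0,
    }


# Hypothetical areas for one pair of overlapping features
scores = overlap_scores(area_a=4.0, area_b=2.0, area_inter=1.0)
print(scores)
```

To use this with the overlay result, the source areas would have to be stored as ordinary columns on df1 and df2 before calling gpd.overlay, since the intersection output carries attribute columns from both inputs but only the intersected geometry.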

Example:

print(df3.loc[df3['ogc_fid']==105][['lan_kod','ogc_fid']])
   lan_kod  ogc_fid
42      04      105
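
To try the same approach without a PostGIS database, here is a small self-contained sketch with made-up geometries. The frames regions and parcels and the column overlap_area are hypothetical stand-ins for the tables queried above; only the column names lan_kod and ogc_fid are taken from the answer.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the two database tables
regions = gpd.GeoDataFrame(
    {"lan_kod": ["01", "02"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)
parcels = gpd.GeoDataFrame(
    {"ogc_fid": [100, 101]},
    geometry=[box(1.5, 0, 3.5, 1), box(0, 0, 1, 1)],
)

# One row per (region, parcel) pair that overlaps
pieces = gpd.overlay(regions, parcels, how="intersection")

# Rank by intersection area and keep only the largest piece per parcel
pieces["overlap_area"] = pieces.geometry.area
largest = (
    pieces.sort_values("overlap_area")
    .drop_duplicates(subset="ogc_fid", keep="last")
)

# Map each parcel id to the region it overlaps most
result = {int(fid): kod for fid, kod in zip(largest["ogc_fid"], largest["lan_kod"])}
print(result)
```

Parcel 100 straddles both regions but has the larger share in region 02, so it is assigned there; parcel 101 lies entirely inside region 01.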
