GeoPandas Spatial Join – How to Spatially Join Only Features by Largest Overlap with Sjoin in GeoPandas

Tags: area, geopandas, overlapping-features, python, spatial-join

I am using GeoPandas to join two GeoDataFrames with the parameters how="inner", op="intersects" (in GeoPandas 0.10 and later, op is called predicate).

Due to the nature of the files, a large percentage of the output rows are duplicates with regard to geometry. How can I filter these out so that each feature is restricted to a single area, for example by keeping only the output rows with the highest Intersection over Union, Dice coefficient, overlap coefficient or a similar measure of the overlapping geometries?
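The three measures named in the question can all be computed from polygon areas: IoU = |A∩B| / |A∪B|, Dice = 2|A∩B| / (|A| + |B|), and the overlap coefficient = |A∩B| / min(|A|, |B|). A minimal sketch with Shapely; the helper name `overlap_scores` is hypothetical and not part of GeoPandas:

```python
from shapely.geometry import box


def overlap_scores(a, b):
    """Area-based similarity measures for two polygons a and b (hypothetical helper)."""
    inter = a.intersection(b).area
    union = a.union(b).area
    total = a.area + b.area
    smaller = min(a.area, b.area)
    return {
        # Intersection over Union (Jaccard index): |A ∩ B| / |A ∪ B|
        "iou": inter / union if union > 0 else 0.0,
        # Dice coefficient: 2|A ∩ B| / (|A| + |B|)
        "dice": 2 * inter / total if total > 0 else 0.0,
        # Overlap (Szymkiewicz-Simpson) coefficient: |A ∩ B| / min(|A|, |B|)
        "overlap": inter / smaller if smaller > 0 else 0.0,
    }


# Two 2 x 2 squares sharing a 1 x 2 strip
print(overlap_scores(box(0, 0, 2, 2), box(1, 0, 3, 2)))
```

For the two squares the shared strip has area 2 and the union has area 6, so the IoU is 1/3 while the Dice and overlap coefficients are both 0.5. Any of these can be used as the ranking column in place of raw intersection area.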

Best Answer

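Intersect, sort by area and drop duplicates (this uses gpd.overlay instead of sjoin, so each output row carries the geometry of the overlapping piece):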

import geopandas as gpd
import psycopg2

# Connection parameters are placeholders; adjust them to your own database
con = psycopg2.connect(database="lmv", user="postgres", password="dsfdas",
    host="localhost")

df1 = gpd.GeoDataFrame.from_postgis("select geom, lan_kod from ok_an_riks", con, geom_col='geom')
df2 = gpd.GeoDataFrame.from_postgis("select geom, ogc_fid from ok_nd_riks", con, geom_col='geom')

# Intersect the two layers: one row per overlapping piece
df3 = gpd.overlay(df1, df2, how='intersection')

# Sort by area so largest area is last
# (sort_values expects a column name, so store the area in a column first)
df3['area'] = df3.geometry.area
df3.sort_values(by='area', inplace=True)

# Drop duplicates, keep last/largest
df3.drop_duplicates(subset='ogc_fid', keep='last', inplace=True)

Example:

print(df3.loc[df3['ogc_fid']==105][['lan_kod','ogc_fid']])
   lan_kod  ogc_fid
42      04      105
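If you want to keep the sjoin workflow from the question and rank matches by Intersection over Union instead of raw intersection area, the same sort-and-drop pattern still applies; only the ranking column changes. A sketch with made-up layers and hypothetical `left_id` / `right_id` columns, assuming GeoPandas 0.10 or later (where `op` is called `predicate`):

```python
import geopandas as gpd
from shapely.geometry import box

# Made-up layers; left_id and right_id are hypothetical id columns
left = gpd.GeoDataFrame(
    {"left_id": [1, 2]},
    geometry=[box(0, 0, 4, 4), box(10, 10, 12, 12)],
)
right = gpd.GeoDataFrame(
    {"right_id": ["a", "b", "c"]},
    geometry=[box(0, 0, 4, 3), box(-1, -1, 20, 5), box(10, 10, 12, 13)],
)

# One row per intersecting (left, right) pair; index_right holds the right index label
joined = gpd.sjoin(left, right, how="inner", predicate="intersects")
right_geoms = right.geometry.loc[joined["index_right"]].tolist()

# Intersection area and IoU for each joined pair
joined["inter_area"] = [
    a.intersection(b).area for a, b in zip(joined.geometry, right_geoms)
]
joined["iou"] = [
    a.intersection(b).area / a.union(b).area
    for a, b in zip(joined.geometry, right_geoms)
]

# Same sort-and-drop pattern as the answer, once per criterion
by_area = joined.sort_values("inter_area").drop_duplicates(subset="left_id", keep="last")
by_iou = joined.sort_values("iou").drop_duplicates(subset="left_id", keep="last")
print(by_area[["left_id", "right_id", "inter_area"]])
print(by_iou[["left_id", "right_id", "iou"]])
```

Here left feature 1 lies entirely inside the large box "b", so its largest intersection is with "b" (area 16), while "a" (overlap area 12, IoU 0.75) wins on IoU because "b" is much larger than feature 1. Which criterion is right depends on whether you care about coverage of the left feature or similarity of the two shapes.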

Related Solutions

GeoPandas – Merging Overlapping Features

The input GeoDataFrame:

import geopandas as gpd
g1 = gpd.GeoDataFrame.from_file("poly_intersect.shp")
g1.shape
(4, 3)

1) You can use the itertools module

a) If you want to merge the intersections of the overlapping polygons

import itertools
geoms = g1['geometry'].tolist()
# Intersect every pair of polygons that overlap
pair_intersections = [a.intersection(b) for a, b in itertools.combinations(geoms, 2) if a.intersects(b)]
intersection_iter = gpd.GeoDataFrame(geometry=pair_intersections, crs=g1.crs)
intersection_iter.to_file("intersection_iter.shp")

Then merge the intersections into a single geometry:

union_iter = intersection_iter.unary_union  # union_all() in GeoPandas 1.0 and later

b) If you want to merge the overlapping polygons themselves, replace intersection with union (all the polygons overlap in my example).
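For case b), the pairwise loop stays the same and only the geometry operation changes from intersection to union. A self-contained sketch, with three made-up squares standing in for poly_intersect.shp:

```python
import itertools

import geopandas as gpd
from shapely.geometry import box

# Made-up stand-in for poly_intersect.shp: two overlapping squares and one isolated
g1 = gpd.GeoDataFrame(geometry=[box(0, 0, 2, 2), box(1, 0, 3, 2), box(5, 5, 6, 6)])

geoms = g1.geometry.tolist()
# Union of every pair of polygons that intersect; isolated polygons are skipped
pair_unions = [a.union(b) for a, b in itertools.combinations(geoms, 2) if a.intersects(b)]
union_iter = gpd.GeoDataFrame(geometry=pair_unions, crs=g1.crs)
print(union_iter.area)
```

Only the first two squares intersect, so the result has a single row: their merged footprint with area 6.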

2) You can use gpd.overlay, overlaying the layer with itself

auto_inter = gpd.overlay(g1, g1, how='intersection')
auto_inter.shape
(7,4)

The resulting GeoDataFrame

GeoPandas adds the intersection geometries to the existing geometries, so in this example the intersections are the last three rows:

# Positional slice: rows 4 to 6 hold the intersections in this example
intersection = auto_inter.iloc[4:7]
intersection.to_file("intersection.shp")

Then merge them into a single geometry:

union = intersection.unary_union  # union_all() in GeoPandas 1.0 and later
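Selecting the intersections with a positional slice such as rows 4 to 6 depends on the row order of the overlay output. A sketch of an order-independent alternative, assuming the layer has a unique id column (the `id` column and the data below are made up): when a layer is overlaid with itself, GeoPandas suffixes the shared column as `id_1` and `id_2`, so self-intersections and mirrored duplicates can be dropped by comparing the two ids.

```python
import geopandas as gpd
from shapely.geometry import box
from shapely.ops import unary_union

# Made-up layer with a hypothetical unique "id" column
g1 = gpd.GeoDataFrame(
    {"id": [0, 1, 2]},
    geometry=[box(0, 0, 2, 2), box(1, 0, 3, 2), box(5, 5, 6, 6)],
)

auto_inter = gpd.overlay(g1, g1, how="intersection")
# id_1 == id_2: a polygon intersected with itself
# id_1 > id_2: the mirror image of a pair that is already kept
pairs = auto_inter[auto_inter["id_1"] < auto_inter["id_2"]]

# Merge the pairwise intersections into one geometry
merged = unary_union(list(pairs.geometry))
print(len(pairs), merged.area)
```

With these three squares the overlay returns five rows (three self-intersections plus the 0/1 pair in both directions), and the filter keeps the single true intersection of area 2.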

b) To merge the overlapping polygons themselves, use gpd.overlay(g1, g1, how='union') instead.

Performing sjoin on polygons and lines without intersection using GeoPandas

In older GeoPandas versions, geopandas.sjoin only supports the 'intersects', 'within' and 'contains' predicates and has no "nearest" option. GeoPandas 0.10 added a separate geopandas.sjoin_nearest function for this.
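A minimal sketch of `geopandas.sjoin_nearest` (available from GeoPandas 0.10), with made-up polygons and lines; `poly_id` is a hypothetical column. It uses a spatial index internally and, when several lines are equally close, returns all of them:

```python
import geopandas as gpd
from shapely.geometry import LineString, box

df_polygon = gpd.GeoDataFrame(
    {"poly_id": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(10, 0, 11, 1)],
)
df_lines = gpd.GeoDataFrame(
    {"id": ["x", "y"]},
    geometry=[LineString([(2, 0), (2, 5)]), LineString([(13, 0), (13, 5)])],
)

# Attach the attributes of the nearest line to each polygon;
# distance_col stores the distance to that line
nearest = gpd.sjoin_nearest(df_polygon, df_lines, how="inner", distance_col="dist")
print(nearest[["poly_id", "id", "dist"]])
```

Distances are in the units of the CRS, so with a geographic CRS (degrees) it is better to project both layers first.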

You can write a custom function to find the id of the nearest linestring for each polygon, and then merge on that. This could look like:

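import pandas as pd

def nearest_linestring(polygon, df_lines):
    # Distance from this polygon to every line; return the id of the closest one
    idx = df_lines.geometry.distance(polygon).idxmin()
    return df_lines.loc[idx, 'id']

# Store the id of the nearest line on each polygon
df_polygon['id_nearest_line'] = df_polygon.geometry.apply(nearest_linestring, df_lines=df_lines)

# Join the line attributes onto the polygons through that id
pd.merge(df_polygon, df_lines, left_on='id_nearest_line', right_on='id', how='inner')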

However, there is an important caveat with this approach: it only finds a single nearest linestring, so if several linestrings intersect a given polygon, it will not return all of them. It should be possible to update the function for that, though.
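One way to update the function so that it returns every line at the minimum distance (which includes all lines intersecting the polygon, since their distance is 0) is to compare against the minimum instead of taking `idxmin`. A sketch with made-up data; `nearest_linestrings` and `id_nearest_lines` are hypothetical names:

```python
import geopandas as gpd
from shapely.geometry import LineString, box


def nearest_linestrings(polygon, df_lines):
    """Return the ids of all lines at the minimum distance from polygon."""
    dist = df_lines.geometry.distance(polygon)
    # Every line intersecting the polygon has distance 0, so all of them are kept
    return df_lines.loc[dist == dist.min(), "id"].tolist()


df_polygon = gpd.GeoDataFrame({"poly_id": [1]}, geometry=[box(0, 0, 2, 2)])
df_lines = gpd.GeoDataFrame(
    {"id": ["x", "y", "z"]},
    geometry=[
        LineString([(1, -1), (1, 3)]),  # crosses the square
        LineString([(-1, 1), (3, 1)]),  # crosses the square
        LineString([(5, 0), (5, 5)]),   # 3 units away
    ],
)

# Each polygon now gets a list of ids rather than a single id
df_polygon["id_nearest_lines"] = df_polygon.geometry.apply(
    nearest_linestrings, df_lines=df_lines
)
print(df_polygon["id_nearest_lines"].iloc[0])
```

Since each cell now holds a list, `DataFrame.explode("id_nearest_lines")` can turn it into one row per (polygon, line) pair before merging.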
A second remark: if you have a lot of data, computing the distance to every linestring for every polygon, as the function above does, might not be very efficient. You could use a spatial index to improve this, but I would only worry about that if speed actually turns out to be a problem.
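If speed does become a problem, the spatial index can be queried for all polygons at once. A sketch assuming a recent GeoPandas (1.x) with Shapely 2, where `GeoDataFrame.sindex.nearest` returns a 2 x n array of positional indices; the data and the `poly_id` column are made up:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, box

df_polygon = gpd.GeoDataFrame(
    {"poly_id": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(10, 0, 11, 1)],
)
df_lines = gpd.GeoDataFrame(
    {"id": ["x", "y"]},
    geometry=[LineString([(2, 0), (2, 5)]), LineString([(13, 0), (13, 5)])],
)

# One bulk query against the lines' spatial index; row 0 of the result holds
# positions in df_polygon, row 1 positions in df_lines (one match per polygon)
poly_pos, line_pos = df_lines.sindex.nearest(df_polygon.geometry, return_all=False)

# Map positions back to line ids, aligned on the polygon index
df_polygon["id_nearest_line"] = pd.Series(
    df_lines["id"].to_numpy()[line_pos], index=df_polygon.index[poly_pos]
)
print(df_polygon[["poly_id", "id_nearest_line"]])
```

The resulting `id_nearest_line` column can then be merged exactly as in the answer above.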
