Python GeoPandas – Getting Median Attribute Value for Duplicate Polygons and Dropping Duplicates

Tags: aggregation, geodataframe, geopandas, median, python

I have a DataFrame that resulted from spatially joining a digital elevation map with a square grid map.

The join unexpectedly produced duplicate rows: two rows share the same "geometry" but have different "Elevation" values.

How do I get the median of "Elevation" for each unique "geometry"? I'm new to GeoPandas, so I tried the usual way of aggregating a DataFrame, but found that groupby() does not work on the "geometry" column.

mrkna_grid.groupby("geometry")['Elevation'].median()

I have also tried the dissolve() function, but I don't think I'm using it correctly, because the number of rows was reduced to just seventy (70), as opposed to the original two thousand (2000) before the spatial join.

mrkna_grid.dissolve(by="Elevation", aggfunc="median")

Best Answer

Let's assume there is a polygon layer called 'layer' with the attribute table shown in the image below.

[Image: input layer and its attribute table]

The expected median "Elevation" for each polygon is:

Polygon 1 | 4
Polygon 2 | 7
Polygon 3 | 15

Either of the following solutions can be used:

Solution #1 uses the native pandas median() on a groupby

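import geopandas as gpd

path_to_layer = "C:/Documents/Python Scripts/median/layer.shp"
layer = gpd.read_file(path_to_layer)

# Group on the WKT text of each geometry so that identical polygons share one key
geom_wkt = layer["geometry"].to_wkt().rename("geom_wkt")
layer_ = layer.groupby(geom_wkt)["Elevation"].median().reset_index(name="MedianElev")

# Rebuild the geometries from the WKT keys, then drop the helper column
layer_["geometry"] = gpd.GeoSeries.from_wkt(layer_["geom_wkt"])
layer_ = layer_.drop(columns=["geom_wkt"])

# The CRS is lost during grouping, so copy it back from the input layer
output = gpd.GeoDataFrame(layer_, geometry="geometry")
output = output.set_crs(layer.crs)
output.to_file("output.shp")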

Solution #2 uses Python's statistics.median()

import geopandas as gpd
from statistics import median

path_to_layer = "C:/Documents/Python Scripts/median/layer.shp"
layer = gpd.read_file(path_to_layer)

# Store the WKT text of each geometry as a groupable key
layer["geom_wkt"] = layer["geometry"].to_wkt()

# Collect the Elevation values of each unique geometry into a list, then take the median
layer_ = layer.groupby("geom_wkt")["Elevation"].apply(list).reset_index(name="ElevationList")
layer_["median"] = layer_["ElevationList"].apply(median)

# Rebuild the geometries from the WKT keys, then drop the helper columns
layer_["geometry"] = gpd.GeoSeries.from_wkt(layer_["geom_wkt"])
layer_ = layer_.drop(columns=["geom_wkt", "ElevationList"])

# The CRS is lost during grouping, so copy it back from the input layer
output = gpd.GeoDataFrame(layer_, geometry="geometry")
output = output.set_crs(layer.crs)
output.to_file("output.shp")

With either solution, it is possible to achieve the following output:

[Image: result]
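
Since the question also tried dissolve(), here is a minimal sketch of how that attempt might be adapted: dissolve on a WKT key rather than on "Elevation". The tiny in-memory grid, its elevation values, the CRS and the names grid, geom_wkt and deduplicated are all made up for illustration and are not taken from the original data.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the joined grid: two cells, each repeated
# with different Elevation values (made-up numbers).
cell_a = box(0, 0, 1, 1)
cell_b = box(1, 0, 2, 1)
grid = gpd.GeoDataFrame(
    {"Elevation": [3.0, 5.0, 10.0, 20.0, 30.0]},
    geometry=[cell_a, cell_a, cell_b, cell_b, cell_b],
    crs="EPSG:4326",
)

# Group on the WKT text of each geometry, not on the Elevation values.
grid["geom_wkt"] = grid.geometry.to_wkt()

# dissolve() unions the geometries in each group (identical cells keep
# their shape) and applies the median to the remaining columns.
deduplicated = grid.dissolve(by="geom_wkt", aggfunc="median").reset_index(drop=True)

print(deduplicated[["Elevation", "geometry"]])
```

With this made-up input, the result has one row per unique cell, with medians 4.0 and 20.0. Dissolving by "Elevation", as in the question, instead merges every cell that shares an elevation value, which would explain the drop to 70 rows. Note that the WKT key only matches geometries whose coordinates and vertex order are exactly identical.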

