GeoPandas Max Attribute – How to Get Polygon with Maximum Attribute Within Same Geometry Using GeoPandas


I have a GeoDataFrame containing the result of a spatial join between a square grid map and flood hazard data. Because of the spatial join, some rows share the same "geometry" but have different "flood_score" values. How do I keep only the row with the maximum "flood_score" for each unique "geometry"?

I've tried the code below:

test = mrkna_grid.dissolve(by='flood_score', aggfunc='max')

However, it only returns 4 rows (as opposed to thousands), because it groups by "flood_score" instead of by geometry.

Essentially, I want to do this, but it doesn't work with "geometry":

df.loc[df.reset_index().groupby(['geometry'])['flood_score'].idxmax()]

Best Answer

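Let's assume there is a polygon layer (a shapefile) in which several features share the same geometry but have different flood scores. The field is called "flood_scor" because shapefile (DBF) field names are limited to 10 characters.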

The following code loads this shapefile into a GeoDataFrame:

import geopandas as gpd

file = "P:/Test/qgis_test/test_for_geopandas.shp"

gdf = gpd.read_file(file)
print(gdf)

Printing it gives (pandas elides the middle columns with "..."):

    fid  ...                                           geometry
0   6.0  ...  POLYGON ((233499.352 5752838.208, 559980.331 5...
1   7.0  ...  POLYGON ((233499.352 5752838.208, 559980.331 5...
2   8.0  ...  POLYGON ((233499.352 5752838.208, 559980.331 5...
3   9.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
4  10.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
5  11.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
6  12.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
7  13.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
8  14.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
9  15.0  ...  POLYGON ((485306.490 4940108.963, 681542.397 5...

[10 rows x 4 columns]

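To keep only the row with the maximum "flood_scor" value for each unique geometry, sort by that column in descending order and then drop duplicate geometries. `drop_duplicates` keeps the first occurrence by default, which after the sort is the row with the highest score.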

gdf_ = gdf.sort_values('flood_scor', ascending=False).drop_duplicates(['geometry'])
print(gdf_)

The output GeoDataFrame will look like:

    fid  ...                                           geometry
8  14.0  ...  POLYGON ((978501.160 5695530.377, 1164317.462 ...
1   7.0  ...  POLYGON ((233499.352 5752838.208, 559980.331 5...
9  15.0  ...  POLYGON ((485306.490 4940108.963, 681542.397 5...

[3 rows x 4 columns]
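The question's own two attempts (`idxmax` and `dissolve`) can also be made to work by grouping on a hashable key derived from the geometry rather than on the geometry column itself. Below is a minimal sketch on made-up data: the sample boxes and scores are invented, the column is called `flood_score` as in the question, and the key is each geometry's WKB bytes from `GeoSeries.to_wkb()`. It assumes duplicate geometries are byte-for-byte identical (same vertices in the same order), which holds for rows copied by a spatial join but not necessarily for polygons that are only topologically equal.

```python
import geopandas as gpd
from shapely.geometry import box

# Made-up example: two grid cells, each repeated with different flood scores
# (column name "flood_score" from the question; the values are invented).
gdf = gpd.GeoDataFrame(
    {
        "flood_score": [1, 3, 2, 5, 4],
        "geometry": [box(0, 0, 1, 1), box(0, 0, 1, 1), box(0, 0, 1, 1),
                     box(1, 0, 2, 1), box(1, 0, 2, 1)],
    },
    geometry="geometry",
)

# Exact binary (WKB) encoding of each geometry; identical geometries give
# identical bytes, so this Series works as a hashable group key.
geom_key = gdf.geometry.to_wkb()

# Option 1: the question's idxmax idea, grouped on the WKB key.
best_idx = gdf.groupby(geom_key)["flood_score"].idxmax()
max_rows = gdf.loc[best_idx]

# Option 2: the question's dissolve call, grouped on the key instead of the score.
dissolved = (
    gdf.assign(geom_key=geom_key)
    .dissolve(by="geom_key", aggfunc="max")
    .reset_index(drop=True)
)

print(max_rows)
print(dissolved)
```

`idxmax` keeps each winning row with all of its original attributes, whereas `dissolve` aggregates every non-geometry column with `max` independently, so with several attribute columns the values in one dissolved row may come from different input rows.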

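Related Solutions

Python GeoPandas – Getting Median Attribute Value for Duplicate Polygons and Dropping Duplicates

Solution #1 includes Pandas/GeoPandas native `DataFrame.median()`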

import geopandas as gpd

path_to_layer = "C:/Documents/Python Scripts/median/layer.shp"
layer = gpd.read_file(path_to_layer)

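# Group on the WKT text of each geometry; naming the key makes the column name after reset_index() predictable
layer_ = (layer.groupby(layer["geometry"].to_wkt().rename('geom_wkt'))['Elevation']
          .median().reset_index(name='MedianElev'))
layer_['geometry'] = gpd.GeoSeries.from_wkt(layer_['geom_wkt'])
layer_.drop(['geom_wkt'], inplace=True, axis=1)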

output = gpd.GeoDataFrame(layer_, geometry='geometry')
output = output.set_crs(layer.crs)
output.to_file("output.shp")

Solution #2 includes Python's `statistics.median()`

import geopandas as gpd
from statistics import median

path_to_layer = "C:/Documents/Python Scripts/median/layer.shp"
layer = gpd.read_file(path_to_layer)

layer['geom_wkt'] = layer['geometry'].to_wkt()

layer_ = layer.groupby('geom_wkt')['Elevation'].apply(list).reset_index(name='ElevationList')
layer_['median'] = layer_['ElevationList'].apply(lambda x: median(x))
layer_['geometry'] = gpd.GeoSeries.from_wkt(layer_['geom_wkt'])
layer_.drop(['geom_wkt', 'ElevationList'], inplace=True, axis=1)

output = gpd.GeoDataFrame(layer_, geometry='geometry')
output = output.set_crs(layer.crs)
output.to_file("output.shp")

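Both scripts write one feature per unique geometry to output.shp, carrying the median "Elevation" of its duplicates (column "MedianElev" in Solution #1, "median" in Solution #2).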
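Solutions #1 and #2 keep only the geometry and the median. If the layer's other attributes should survive as well, one option is to broadcast each group's median back onto its rows with `groupby(...).transform('median')` and then drop duplicate geometries. A sketch on made-up data: the `name` column is a hypothetical extra attribute, and the group key is the WKB from `GeoSeries.to_wkb()` instead of WKT, assuming duplicates are byte-identical.

```python
import geopandas as gpd
from shapely.geometry import box

# Made-up layer: "Elevation" as in the solutions above; "name" is a hypothetical
# extra attribute that Solutions #1 and #2 would not keep.
layer = gpd.GeoDataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "Elevation": [10.0, 20.0, 40.0, 7.0],
        "geometry": [box(0, 0, 1, 1)] * 3 + [box(1, 0, 2, 1)],
    },
    geometry="geometry",
)

# Hashable key: identical geometries produce identical WKB bytes.
geom_key = layer.geometry.to_wkb()

# Write each group's median onto every row of that group ...
layer["MedianElev"] = layer.groupby(geom_key)["Elevation"].transform("median")

# ... then keep the first row per geometry; other attributes come from that row.
deduped = layer[~geom_key.duplicated()].copy()
print(deduped)
```

The non-median attributes come from the first row of each group, so this only makes sense if they are the same across duplicates or if any representative row will do.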


GeoPandas – Calculate Percentage of Area Intersect in Python

You can achieve this using overlay operations. Here's a quick example using some fake data.

import geopandas as gpd
from shapely.geometry import Polygon

# Creating the GeoDataFrame with the grid geometries
grid_gdf = gpd.GeoDataFrame(data={'grid_id':[101,102,103,104],
                                  'grid_cat':['W','X','Y','Z'],
                                  'geometry':[Polygon([(1,5),(3,5),(3,3),(1,3)]),
                                              Polygon([(3,5),(5,5),(5,3),(3,3)]),
                                              Polygon([(1,3),(3,3),(3,1),(1,1)]),
                                              Polygon([(3,3),(5,3),(5,1),(3,1)])]},
                            geometry='geometry')
grid_gdf['area_grid'] = grid_gdf.area
grid_gdf.plot(column='grid_id')

# Creating the GeoDataFrame with the land geometries
land_gdf = gpd.GeoDataFrame(data={'land_id':[1,2,3,4,5,6,7,8,9],
                                  'land_cat':['A','B','C','B','C','A','C','A','B'],
                                  'geometry':[Polygon([(0,6),(2,6),(2,4),(0,4)]),
                                              Polygon([(2,6),(4,6),(4,4),(2,4)]),
                                              Polygon([(4,6),(6,6),(6,4),(4,4)]),
                                              Polygon([(0,4),(2,4),(2,2),(0,2)]),
                                              Polygon([(2,4),(4,4),(4,2),(2,2)]),
                                              Polygon([(4,4),(6,4),(6,2),(4,2)]),
                                              Polygon([(0,2),(2,2),(2,0),(0,0)]),
                                              Polygon([(2,2),(4,2),(4,0),(2,0)]),
                                              Polygon([(4,2),(6,2),(6,0),(4,0)])]},
                            geometry='geometry')
land_gdf['area_land'] = land_gdf.area
land_gdf.plot(column='land_id')

# Performing the overlay (union) operation
gdf_joined = gpd.overlay(grid_gdf,land_gdf, how='union')

# Calculating the areas of the newly-created geometries
gdf_joined['area_joined'] = gdf_joined.area

# Calculating the areas of the newly-created geometries in relation 
# to the original grid cells
gdf_joined['land_area_as_a_share_of_grid_area'] = (gdf_joined['area_joined'] / 
                                                   gdf_joined['area_grid'])

# Aggregating the results
results = (gdf_joined
           .groupby(['grid_id','land_cat'])
           .agg({'land_area_as_a_share_of_grid_area':'sum'}))

# Printing results
print(results)

#                   land_area_as_a_share_of_grid_area
# grid_id land_cat                                   
# 101.0   A                                      0.25
#         B                                      0.50
#         C                                      0.25
# 102.0   A                                      0.25
#         B                                      0.25
#         C                                      0.50
# 103.0   A                                      0.25
#         B                                      0.25
#         C                                      0.50
# 104.0   A                                      0.50
#         B                                      0.25
#         C                                      0.25

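When adapting this to your data, change the column names used in each step. The pattern stays the same: overlay the two layers, compute the area of each resulting piece, divide it by the area of the original grid cell, then group by grid cell and category and sum.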
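The steps can be wrapped in a function that takes the column names as parameters. This is a sketch rather than the answer's exact code: the function name `share_by_category` and the test columns `cell` and `landuse` are hypothetical, and it uses `how="intersection"` instead of `how="union"`. With `intersection`, pieces that lie in only one layer (which `union` keeps with a missing grid id or category) are never created; the grouped sums are the same, because `groupby` drops rows with missing keys by default.

```python
import geopandas as gpd
from shapely.geometry import box


def share_by_category(grid, land, grid_id_col, cat_col):
    """Share of each grid cell's area covered by each land category.

    grid_id_col and cat_col are the column names to adapt to your own data.
    """
    grid = grid[[grid_id_col, grid.geometry.name]].copy()
    grid["area_grid"] = grid.area
    land = land[[cat_col, land.geometry.name]]
    # Keep only the pieces that fall inside both a grid cell and a land polygon.
    pieces = gpd.overlay(grid, land, how="intersection")
    pieces["share"] = pieces.area / pieces["area_grid"]
    return pieces.groupby([grid_id_col, cat_col])["share"].sum()


# Made-up data: one 2x2 cell, covered 3/4 by category A and 1/4 by category B.
grid = gpd.GeoDataFrame({"cell": [1], "geometry": [box(0, 0, 2, 2)]},
                        geometry="geometry")
land = gpd.GeoDataFrame(
    {"landuse": ["A", "B", "A"],
     "geometry": [box(0, 0, 1, 2), box(1, 0, 2, 1), box(1, 1, 3, 3)]},
    geometry="geometry",
)

result = share_by_category(grid, land, "cell", "landuse")
print(result)
```

As a sanity check, when the land layer covers every grid cell without overlaps, the shares of each cell sum to 1.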
