GeoPandas – How to Compute Centroids Based on Field While Keeping Aggregated Values

Tags: aggregate, centroids, geopandas, python, shapely

I have a shapefile containing 85,000 points representing houses. I want to create a new shapefile containing the centroid of each group of houses that share a postal code. This centroids file should also contain the mean square meters and mean building year of the houses in each postal code. This last part is not working.

My workflow is as follows:

  1. Dissolve the points shapefile into a multipoints file, where each multipoint is a group of points with the same postal code. Aggregate mean values for other attributes in the multipoints.
  2. Compute the centroid of each multipoint
  3. Write to new shapefile

This is my code for dissolving and computing centroids:

import geopandas

gpd = geopandas.read_file(my_file)
dissolved_gpd = gpd.dissolve(by='postcode', aggfunc='mean')
centroids = dissolved_gpd.centroid

When I print dissolved_gpd and centroids they look like this (only one row shown for each).
dissolved_gpd:

                   geometry                                  ...  bouwjaar
postcode                                                     ...             
6511AA    MULTIPOINT (187217.617 428676.815, 187254.576 ...  ...  1983.692308
[5048 rows x 4 columns]

centroids:

postcode
6511AA    POINT (187242.378 428870.156)
Length: 5048, dtype: geometry

As you can see, the 'bouwjaar' aggregated attribute is lost when computing the centroids. Is there any way to pass this on when calculating centroids in GeoPandas? I can't find anything about this in the docs. Or is there another method of computing centroids where it's possible to pass on aggregated values?

Best Answer

GeoPandas is indeed a little confusing here. Calling centroid returns a GeoSeries (just the geometries), not the full GeoDataFrame (geometries + attributes).
To keep the full GeoDataFrame, assign the centroids back to the geometry column so that they replace the MultiPoint geometries:

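import geopandas

gpd = geopandas.read_file(my_file)
dissolved_gpd = gpd.dissolve(by='postcode', aggfunc='mean')
# Overwrite the geometry column in place; the aggregated attributes stay attached
dissolved_gpd['geometry'] = dissolved_gpd.geometry.centroid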
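
To check the behaviour without the original shapefile, here is a minimal sketch on made-up data. The three sample points, the `oppervlakte` column name and the output path in the last comment are assumptions for illustration only; `postcode` and `bouwjaar` are taken from the question.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical stand-in for the houses shapefile (values are made up).
houses = geopandas.GeoDataFrame(
    {
        "postcode": ["6511AA", "6511AA", "6511AB"],
        "oppervlakte": [80.0, 120.0, 95.0],  # hypothetical name for the square meters column
        "bouwjaar": [1980, 1990, 2001],
    },
    geometry=[Point(0, 0), Point(2, 4), Point(10, 10)],
)

# Step 1: one geometry per postal code, numeric attributes averaged.
dissolved = houses.dissolve(by="postcode", aggfunc="mean")

# Step 2: replace each dissolved geometry with its centroid.
# Assigning to the geometry column keeps the result a GeoDataFrame.
dissolved["geometry"] = dissolved.geometry.centroid

print(type(dissolved).__name__)
print(dissolved)

# Step 3 (not run here): write the result, turning the postcode index
# back into a regular column first. The path is a placeholder.
# dissolved.reset_index().to_file("centroids.shp")
```

For postal code 6511AA the two sample points (0, 0) and (2, 4) give a centroid of (1, 2), and the row still carries the mean bouwjaar of 1985.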