GeoPandas – How to Transfer Values Between GeoDataFrames

I have two GeoDataFrames: one made from polygons (bldg_res_df) and one from centroid points (parcel_res_df). I used pd.concat to combine them into a single GeoDataFrame to do some calculations.

df_list = [bldg_res_df, parcel_res_df]
combined_df = gpd.GeoDataFrame(pd.concat(df_list, sort=True))

I summarized certain columns by grouping on a column (GEOID) that both GeoDataFrames share.

geoid_sum = combined_df[[ 'GEOID', 'bldg_sqft', 'CensusPop']]
geoid_sum = geoid_sum.groupby('GEOID').agg({'GEOID': 'count', 'bldg_sqft': 'sum', 'CensusPop': 'mean'}).reindex(combined_df['GEOID'])

Then I did my calculations and populated previously empty columns (Pop_By_Area, Tot_Bldg_Sqft, and Census_Bld_Units) with the results.

combined_df['Pop_By_Area'] = (geoid_sum['CensusPop'].values *
                              combined_df['bldg_sqft']) / geoid_sum['bldg_sqft'].values
combined_df['Tot_Bldg_Sqft'] = geoid_sum['bldg_sqft'].values
combined_df['Census_Bld_Units'] = geoid_sum['GEOID'].values

What I want to do now is populate the individual GeoDataFrames with the newly calculated values for the corresponding rows. Alternatively, I could split combined_df into two GeoDataFrames based on geometry type (polygons, points). What is the easiest way to achieve this?

Best Answer

You can split this dataframe using either method you described.

To keep your original dataframes you can copy the calculated values by running an apply row-wise and searching the combined dataframe for the same GEOID.

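EDIT: This method slows down considerably as the number of rows in the dataframes grows, since it has to loop through every row and search combined_df each time. This can be mitigated by setting 'GEOID' as the index, which allows a hash-based lookup (like a dictionary or set) instead of a scan. Note that .loc returns a single value only when a GEOID appears once in combined_df; for a GEOID shared by several rows it returns a Series.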

# Set GEOID as the index of combined_df. drop=False keeps GEOID as a regular column as well.
combined_df.set_index('GEOID', drop=False, inplace=True)

bldg_res_df['Pop_By_Area'] = bldg_res_df['GEOID'].apply(lambda bldg_geoid: combined_df.loc[bldg_geoid, 'Pop_By_Area'])
parcel_res_df['Pop_By_Area'] = parcel_res_df['GEOID'].apply(lambda parcel_geoid: combined_df.loc[parcel_geoid, 'Pop_By_Area'])
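
Because several buildings and parcels can share a GEOID, a lookup by GEOID may not identify a single row. As an alternative (not part of the approach above), here is a minimal sketch that labels each source frame with the keys argument of pd.concat and then copies a calculated column back by that label. The toy frames, their values, and the names bldg, parcel and combined are made up for illustration; plain pandas DataFrames stand in for the GeoDataFrames.

```python
import pandas as pd

# Toy stand-ins for bldg_res_df and parcel_res_df (values are invented).
bldg = pd.DataFrame({"GEOID": ["A", "A", "B"], "bldg_sqft": [100.0, 300.0, 50.0]})
parcel = pd.DataFrame({"GEOID": ["A", "B"], "bldg_sqft": [200.0, 150.0]})

# keys= adds an outer index level recording which frame each row came from.
combined = pd.concat([bldg, parcel], keys=["bldg", "parcel"], sort=True)

# Per-GEOID total, broadcast back to every row in its original order.
combined["Tot_Bldg_Sqft"] = combined.groupby("GEOID")["bldg_sqft"].transform("sum")

# xs(key) drops the outer level, so the remaining index lines up with the
# index of the original frame and the assignment aligns row by row.
bldg["Tot_Bldg_Sqft"] = combined.xs("bldg")["Tot_Bldg_Sqft"]
parcel["Tot_Bldg_Sqft"] = combined.xs("parcel")["Tot_Bldg_Sqft"]
```

This relies on each original frame having a unique index, so that rows can be matched unambiguously after the outer level is dropped.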

A faster and simpler way, though, is to slice the calculated columns from your combined dataframe and filter by geometry type into new dataframes. GeoPandas stores geometries as Shapely objects, so you can use the .geom_type attribute of combined_df's geometry column in a .loc call.

# Columns to carry over into the split dataframes.
cols = ['GEOID', 'HU_Pop', 'PARCEL_ID', 'Pop_By_Area', 'STORY_NBR', 'Tot_Bldg_Sqft', 'bldg_sqft', 'geometry']

# Buildings are polygons; parcel centroids are points.
polygon_df = combined_df.loc[combined_df['geometry'].geom_type == 'Polygon', cols]
points_df = combined_df.loc[combined_df['geometry'].geom_type == 'Point', cols]
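
To see the geometry-type filter in isolation, here is a small self-contained sketch on invented data. The geometries and attribute values are made up, and the check for 'MultiPolygon' is an assumption added in case some building footprints are multi-part; drop it if your data only contains simple polygons.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy combined GeoDataFrame: one building footprint and two parcel centroids.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
combined = gpd.GeoDataFrame(
    {"GEOID": ["A", "A", "B"], "Pop_By_Area": [1.5, 2.5, 4.0]},
    geometry=[square, Point(0.5, 0.5), Point(2, 2)],
)

# geom_type gives one string per row, e.g. 'Polygon' or 'Point'.
geom_types = combined.geometry.geom_type

# Boolean masks select the rows of each geometry type; .copy() avoids
# later assignments writing into a view of the combined frame.
polygon_df = combined.loc[geom_types.isin(["Polygon", "MultiPolygon"])].copy()
points_df = combined.loc[geom_types == "Point"].copy()
```

Both results keep every column of the combined frame, including the calculated ones, so no separate copy step is needed.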