I have 2 geodataframes: one made from polygons (bldg_res_df) and one from centroid points (parcel_res_df). I used pd.concat to combine them into a single geodataframe to do some calculations.
df_list = [bldg_res_df, parcel_res_df]
combined_df = gpd.GeoDataFrame(pd.concat(df_list, sort=True))
I summarized certain columns based on a shared column (GEOID) between both gdf's.
geoid_sum = combined_df[['GEOID', 'bldg_sqft', 'CensusPop']]
geoid_sum = geoid_sum.groupby('GEOID').agg({'GEOID': 'count', 'bldg_sqft': 'sum', 'CensusPop': 'mean'}).reindex(combined_df['GEOID'])
Then I did my calculations and populated previously empty columns (Pop_By_Area, Tot_Bldg_Sqft, and Census_Bld_Units) with the results.
combined_df['Pop_By_Area'] = (geoid_sum['CensusPop'].values *
combined_df['bldg_sqft'])/geoid_sum['bldg_sqft'].values
combined_df['Tot_Bldg_Sqft'] = geoid_sum['bldg_sqft'].values
combined_df['Census_Bld_Units'] = geoid_sum['GEOID'].values
What I want to do now is populate the individual geodataframes with the newly calculated values for the corresponding rows. Or, split combined_df into 2 geodataframes based on geometry type (polygons vs. points). What is the easiest way to achieve this?
Best Answer
You can split this dataframe using either method you described.
To keep your original dataframes you can copy the calculated values by running an apply row-wise and searching the combined dataframe for the same GEOID.
EDIT: This method slows down greatly as the number of items in the dataframes grows, since it has to loop through every row and search combined_df each time. This can be mitigated by setting 'GEOID' as the index, which allows a hashed lookup (like a dictionary or set).
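As a minimal sketch of that indexed-lookup idea (the column names follow the question, but the data here is made up), setting 'GEOID' as the index turns the per-GEOID copy into a fast map instead of a row-by-row search. Note this only suits GEOID-level aggregates such as Tot_Bldg_Sqft or Census_Bld_Units; Pop_By_Area varies per row, so it is better recovered by splitting on geometry type.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values are made up)
combined_df = pd.DataFrame({
    'GEOID': ['A', 'A', 'B'],
    'Tot_Bldg_Sqft': [300.0, 300.0, 150.0],  # per-GEOID aggregate
})
bldg_res_df = pd.DataFrame({'GEOID': ['A', 'B']})

# Keep one row per GEOID and index by it, so lookups are hash-based
lookup = combined_df.drop_duplicates('GEOID').set_index('GEOID')

# Copy the GEOID-level aggregate back into the original frame
bldg_res_df['Tot_Bldg_Sqft'] = bldg_res_df['GEOID'].map(lookup['Tot_Bldg_Sqft'])
print(bldg_res_df['Tot_Bldg_Sqft'].tolist())  # [300.0, 150.0]
```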
A faster and simpler way would be to slice the calculated columns from your combined dataframe and filter the geometry types into new dataframes. GeoPandas stores geometries as Shapely objects, so you can make use of the .geom_type attribute of combined_df's geometry column in a .loc call.
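For instance, a sketch of the geometry-type split (toy geometries and made-up values; the column names follow the question):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy combined frame holding one polygon row and one point row
combined_df = gpd.GeoDataFrame({
    'GEOID': ['A', 'B'],
    'Pop_By_Area': [42.0, 17.0],
    'geometry': [Polygon([(0, 0), (1, 0), (1, 1)]), Point(2, 2)],
})

# geom_type reports each row's Shapely type, so .loc can split the frame
bldg_res_df = combined_df.loc[combined_df.geom_type == 'Polygon'].copy()
parcel_res_df = combined_df.loc[combined_df.geom_type == 'Point'].copy()
```

Because the rows keep their calculated columns, each new frame comes out already populated with the values computed on combined_df.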