I have 2 geodataframes: one made from polygons (bldg_res_df) and one from centroid points (parcel_res_df). I used pd.concat to combine them into a single geodataframe to do some calculations.
df_list = [bldg_res_df, parcel_res_df]
combined_df = gpd.GeoDataFrame(pd.concat(df_list, sort=True))
I summarized certain columns based on a shared column (GEOID) between both gdf's.
geoid_sum = combined_df[['GEOID', 'bldg_sqft', 'CensusPop']]
geoid_sum = geoid_sum.groupby('GEOID').agg({'GEOID': 'count', 'bldg_sqft': 'sum', 'CensusPop': 'mean'}).reindex(combined_df['GEOID'])
Then I did my calculations and populated previously empty columns (Pop_By_Area, Tot_Bldg_Sqft, and Census_Bld_Units) with the results.
combined_df['Pop_By_Area'] = (geoid_sum['CensusPop'].values *
combined_df['bldg_sqft'])/geoid_sum['bldg_sqft'].values
combined_df['Tot_Bldg_Sqft'] = geoid_sum['bldg_sqft'].values
combined_df['Census_Bld_Units'] = geoid_sum['GEOID'].values
What I want to do now is populate the individual geodataframes with the newly calculated values for the corresponding rows. Or, split combined_df into 2 geodataframes based on geometry type (polygons vs. points). What is the easiest way to achieve this?
Best Answer
You can split this dataframe using either method you described.
To keep your original dataframes you can copy the calculated values by running an apply row-wise and searching the combined dataframe for the same GEOID.
EDIT: This method slows down greatly as the number of items in the dataframes grows, since it has to loop through every row and search combined_df each time. This can be mitigated by setting 'GEOID' as the index, which allows a hashed lookup (like a dictionary or set).
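As a minimal sketch of that indexed-lookup idea (the column names follow the question, but the data here is made up), setting 'GEOID' as the index turns the per-GEOID copy into a fast map instead of a row-by-row search. Note this only suits GEOID-level aggregates such as Tot_Bldg_Sqft or Census_Bld_Units; Pop_By_Area varies per row, so it is better recovered by splitting on geometry type.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values are made up)
combined_df = pd.DataFrame({
    'GEOID': ['A', 'A', 'B'],
    'Tot_Bldg_Sqft': [300.0, 300.0, 150.0],  # per-GEOID aggregate
})
bldg_res_df = pd.DataFrame({'GEOID': ['A', 'B']})

# Keep one row per GEOID and index by it, so lookups are hash-based
lookup = combined_df.drop_duplicates('GEOID').set_index('GEOID')

# Copy the GEOID-level aggregate back into the original frame
bldg_res_df['Tot_Bldg_Sqft'] = bldg_res_df['GEOID'].map(lookup['Tot_Bldg_Sqft'])
print(bldg_res_df['Tot_Bldg_Sqft'].tolist())  # [300.0, 150.0]
```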
A faster and simpler way would be to slice the calculated columns from your combined dataframe and filter the geometry types into new dataframes. GeoPandas stores geometries as Shapely objects, so you can make use of the .geom_type attribute of combined_df's geometry column in a .loc call.
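For instance, a sketch of the geometry-type split (toy geometries and made-up values; the column names follow the question):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy combined frame holding one polygon row and one point row
combined_df = gpd.GeoDataFrame({
    'GEOID': ['A', 'B'],
    'Pop_By_Area': [42.0, 17.0],
    'geometry': [Polygon([(0, 0), (1, 0), (1, 1)]), Point(2, 2)],
})

# geom_type reports each row's Shapely type, so .loc can split the frame
bldg_res_df = combined_df.loc[combined_df.geom_type == 'Polygon'].copy()
parcel_res_df = combined_df.loc[combined_df.geom_type == 'Point'].copy()
```

Because the rows keep their calculated columns, each new frame comes out already populated with the values computed on combined_df.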