You can split this dataframe using either method you described.
To keep your original dataframes you can copy the calculated values by running an apply row-wise and searching the combined dataframe for the same GEOID.
EDIT: This method slows down considerably as the dataframes grow, since it has to loop through every row and search combined_df each time. This can be mitigated by setting 'GEOID' as the index, which allows a hash-based lookup (like a dictionary or set).
# Set GEOID as the index of combined_df. drop=False tells set_index to keep GEOID as a column of the dataframe.
combined_df.set_index('GEOID', drop=False, inplace=True)
bldg_res_df['Pop_By_Area'] = bldg_res_df['GEOID'].apply(lambda bldg_geoid: combined_df.loc[bldg_geoid, 'Pop_By_Area'])
parcel_res_df['Pop_By_Area'] = parcel_res_df['GEOID'].apply(lambda parcel_geoid: combined_df.loc[parcel_geoid, 'Pop_By_Area'])
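If the per-row apply is still too slow, one possible alternative is to build the lookup once and use Series.map. This is only a sketch: the small dataframes below are made-up stand-ins for the ones in the question, and it assumes each GEOID appears only once in combined_df.

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes in the question.
combined_df = pd.DataFrame({
    "GEOID": ["A", "B", "C"],
    "Pop_By_Area": [1.5, 2.0, 0.25],
})
bldg_res_df = pd.DataFrame({"GEOID": ["B", "A", "B", "C"]})

# Build a GEOID -> Pop_By_Area lookup once (assumes GEOID is unique in combined_df).
pop_lookup = combined_df.set_index("GEOID")["Pop_By_Area"]

# Series.map looks up every GEOID in one call, without a row-wise lambda.
bldg_res_df["Pop_By_Area"] = bldg_res_df["GEOID"].map(pop_lookup)
print(bldg_res_df)
```

The same map call would work for parcel_res_df. A GEOID that is missing from the lookup ends up as NaN instead of raising a KeyError.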
A faster and simpler way, though, would be to slice the calculated columns from your combined dataframe and filter the geometry types into new dataframes. GeoPandas stores geometries as Shapely objects, so you can use the .geom_type attribute of combined_df's geometry column in a .loc call.
points_df = combined_df.loc[combined_df['geometry'].geom_type == 'Point', ['GEOID', 'HU_Pop', 'PARCEL_ID', 'Pop_By_Area', 'STORY_NBR', 'Tot_Bldg_Sqft', 'bldg_sqft', 'geometry']]
polygon_df = combined_df.loc[combined_df['geometry'].geom_type == 'Polygon', ['GEOID', 'HU_Pop', 'PARCEL_ID', 'Pop_By_Area', 'STORY_NBR', 'Tot_Bldg_Sqft', 'bldg_sqft', 'geometry']]
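To see the geom_type filter in isolation, here is a minimal sketch on a toy GeoDataFrame. The rows, values and the reduced column list are invented for illustration; only the .geom_type comparison inside .loc is the technique described above.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy stand-in for combined_df: two points and one polygon (made-up values).
combined_df = gpd.GeoDataFrame({
    "GEOID": ["A", "A", "B"],
    "Pop_By_Area": [1.5, 1.5, 2.0],
    "geometry": [
        Point(0, 0),
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Point(2, 2),
    ],
})

keep_cols = ["GEOID", "Pop_By_Area", "geometry"]

# geom_type returns one string per row, e.g. "Point" or "Polygon".
geom_types = combined_df.geometry.geom_type

points_df = combined_df.loc[geom_types == "Point", keep_cols]
polygon_df = combined_df.loc[geom_types == "Polygon", keep_cols]
```

Note that multi-part geometries report their own type names ("MultiPolygon", "MultiPoint"), so if your data mixes both you may want something like geom_types.isin(["Polygon", "MultiPolygon"]) instead of a plain equality test.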
Per the doc:
Source: https://geopandas.org/reference.html
It's a method that applies directly to your GeoDataFrame object, so any extra argument you pass is counted as a second argument, hence the error you're seeing. Note that GeoDataFrame.explode() is intended to:
Explode multi-part geometries into multiple single geometries.
Therefore, the following works because all the geometries are multilines, so explode splits them into new rows, each holding one part of the original geometry:
import geopandas as gpd
from shapely import wkt
gdf = gpd.GeoDataFrame({
'ID': [1,2,3,4,5],
'identifiant': [11,12,13,14,15],
'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
'geometry': [wkt.loads(mlt) for mlt in 5*'MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))+'[:-1].split('+')]
})
which results in:
And calling .explode() will actually explode your geometries:
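As a small illustration of that behaviour, here is a sketch with a single two-part MULTILINESTRING (the same WKT as above, but only one row, and a made-up variable name). The exact index layout of the result depends on your GeoPandas version, so only the row count and geometry types are checked here.

```python
import geopandas as gpd
from shapely import wkt

# One row holding a MultiLineString made of two separate lines.
multi_gdf = gpd.GeoDataFrame({
    "ID": [1],
    "geometry": [wkt.loads(
        "MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))"
    )],
})

# explode() is called with no arguments: it splits the geometry column.
exploded = multi_gdf.explode()

print(len(multi_gdf), "->", len(exploded))
print(exploded.geometry.geom_type.tolist())
```

Each resulting row carries one LineString and repeats the attribute values (here ID) of the row it came from.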
But if you do not have multiple geometries:
import geopandas as gpd
from shapely import wkt
gdf2 = gpd.GeoDataFrame({
'ID': [1,2,3,4,5],
'identifiant': [11,12,13,14,15],
'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
# single-part geometries: there is nothing for GeoDataFrame.explode() to split
'geometry': [wkt.loads(pkt) for pkt in 5*'LINESTRING(3 4,10 50,20 25)+'[:-1].split('+')]
})
calling .explode() will return your original dataframe:
What you probably want is the pandas explode method instead, which expects a column parameter:
import geopandas as gpd
from shapely import wkt
import pandas as pd
gdf2 = gpd.GeoDataFrame({
'ID': [1,2,3,4,5],
'identifiant': [11,12,13,14,15],
'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
'geometry': [wkt.loads(pkt) for pkt in 5*'LINESTRING(3 4,10 50,20 25)+'[:-1].split('+')]
})
df = pd.DataFrame(gdf2) # convert to a plain pandas DataFrame instance
type(df)
df.explode('nom_concat') # call pandas explode method on a column
You can finally convert it back to a GeoDataFrame:
exploded_gdf = gpd.GeoDataFrame(df.explode('nom_concat'))
Beware that the index changes after the explosion. You can adjust it depending on your needs, e.g. with .reset_index().
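To make the index change concrete, here is a short pandas-only sketch. The three-row frame is a cut-down, made-up version of the example above, without the geometry column.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "nom_concat": ["123abc", ["123def", "123ghj"], "123qrs"],
})

# Rows produced from the same list keep the original index label, so it repeats.
exploded_raw = df.explode("nom_concat")
print(exploded_raw.index.tolist())

# reset_index(drop=True) replaces it with a fresh 0..n-1 index
# and discards the old labels instead of keeping them as a column.
exploded = exploded_raw.reset_index(drop=True)
print(exploded)
```

Plain strings are left as they are, and only list-like entries are expanded. On pandas 1.1 or later, df.explode("nom_concat", ignore_index=True) should give the same result in one step.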
Best Answer
You can find a nice explanation of what that error means here.
In your case, use it like this: