Assign values from one column to another conditionally using GeoPandas

geopandas python

I'm having trouble writing an if/else statement to update a column in my GeoDataFrame. I spatially join a point feature class to a polygon feature class, group by zone to count the points in each zone, and divide each count by the zone's area in square miles to get an incident rate per area. Up to that point everything works as expected and gives me the number of incidents per square mile as a pandas Series. But when I try to assign a string to an empty column on my polygon feature class using an if statement, I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().

import geopandas as gpd
import pandas as pd

graf = gpd.read_file(r"E:\PoliceData.gdb", driver='fileGDB', layer="GraffitiIncidents")
pz = gpd.read_file(r"E:\PoliceData.gdb", driver='fileGDB', layer="PatrolZones")

# update the incidents column with the count of incident per zone
pz["INCIDENTS"] = gpd.sjoin(graf, pz, how="inner", predicate="within").groupby(["NAME"]).size().reset_index(name="incidentsPerZone")["incidentsPerZone"].copy()

# declare incident rate as series
incidentRate = pz["INCIDENTS"] / (pz["SHAPE_Area"] / 2589988.11).copy()

This is the if statement I'm trying to use to assign a string:

if incidentRate > 15:
    pz["PRIORITY"] = "TOP CONCERN"
elif incidentRate >= 12:
    pz["PRIORITY"] = "HIGH CONCERN"
elif incidentRate >= 6:
    pz["PRIORITY"] = "MEDIUM CONCERN"
else:
    pz["PRIORITY"] = "LOW CONCERN"

Best Answer

You can find here a nice explanation of what that error means.
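
As a rough illustration of the cause (the values below are made up and only stand in for incidentRate): comparing a Series to a number gives one boolean per row, and a plain if cannot reduce those to a single True or False.

```python
import pandas as pd

# Hypothetical incident rates standing in for incidentRate
rates = pd.Series([20.0, 13.0, 4.0])

# The comparison is element-wise: the result is a boolean Series, not one bool
mask = rates > 15

try:
    if mask:  # Python asks the whole Series for a single truth value
        pass
except ValueError as err:
    message = str(err)  # "The truth value of a Series is ambiguous. ..."
```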

In your case, use boolean masks with .loc so that each row is assigned individually:

pz.loc[incidentRate > 15, "PRIORITY"] = "TOP CONCERN"
# <= 15 (not < 15) so that a rate of exactly 15 is not left unassigned
pz.loc[(incidentRate <= 15) & (incidentRate >= 12), "PRIORITY"] = "HIGH CONCERN"
pz.loc[(incidentRate < 12) & (incidentRate >= 6), "PRIORITY"] = "MEDIUM CONCERN"
pz.loc[incidentRate < 6, "PRIORITY"] = "LOW CONCERN"
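
If you prefer something that reads like the original if/elif chain, numpy.select may be worth a look: it checks the conditions in order and takes the first match for each row. This is only a sketch with made-up rates and a hypothetical zones frame standing in for pz. Note that rows with a missing rate fall through to the default here, whereas the .loc version leaves them unassigned.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for pz and incidentRate
zones = pd.DataFrame({"NAME": ["A", "B", "C", "D", "E"]})
incident_rate = pd.Series([20.0, 15.0, 12.0, 7.5, 3.0])

# Conditions are evaluated in order, like if / elif / elif
conditions = [incident_rate > 15, incident_rate >= 12, incident_rate >= 6]
labels = ["TOP CONCERN", "HIGH CONCERN", "MEDIUM CONCERN"]

# Rows matching none of the conditions get the default, like the final else
zones["PRIORITY"] = np.select(conditions, labels, default="LOW CONCERN")
```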

Related Solutions

[GIS] How to pull values from one geodataframe to populate corresponding column/rows in another geodataframe

You can split this dataframe using either method you described.

To keep your original dataframes you can copy the calculated values by running an apply row-wise and searching the combined dataframe for the same GEOID.

EDIT: This method slows down considerably as the number of rows in the dataframes grows, since it has to loop through every row and search combined_df. This can be mitigated by setting 'GEOID' as the index, which allows a hash-based lookup (like a dictionary or set).

# Set GEOID as the index of combined_df. drop=False tells set_index to also keep GEOID as a regular column.
combined_df.set_index('GEOID', drop=False, inplace=True)

bldg_res_df['Pop_By_Area'] = bldg_res_df['GEOID'].apply(lambda bldg_geoid: combined_df.loc[bldg_geoid, 'Pop_By_Area'])
parcel_res_df['Pop_By_Area'] = parcel_res_df['GEOID'].apply(lambda parcel_geoid: combined_df.loc[parcel_geoid, 'Pop_By_Area'])
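
As an alternative to the row-wise apply, the same lookup can be sketched with Series.map, which does the matching in one vectorized call. The frames and values below are hypothetical stand-ins for combined_df and bldg_res_df, and the sketch assumes GEOID is unique in combined_df.

```python
import pandas as pd

# Hypothetical stand-ins for combined_df and bldg_res_df
combined_df = pd.DataFrame({
    "GEOID": ["g1", "g2", "g3"],
    "Pop_By_Area": [1.5, 2.0, 0.5],
})
bldg_res_df = pd.DataFrame({"GEOID": ["g3", "g1"]})

# Build a GEOID -> Pop_By_Area lookup Series (assumes unique GEOID values)
lookup = combined_df.set_index("GEOID")["Pop_By_Area"]

# map() looks up every GEOID at once; unmatched GEOIDs become NaN
bldg_res_df["Pop_By_Area"] = bldg_res_df["GEOID"].map(lookup)
```

Unlike the .loc lookup inside apply, a GEOID that is missing from the lookup gives NaN instead of raising a KeyError.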

A faster and simpler way, though, would be to slice the calculated columns from your combined dataframe and filter the geometry types into new dataframes. GeoPandas stores geometries as Shapely objects, so you can use the .geom_type attribute of combined_df's geometry column in a .loc call.

points_df = combined_df.loc[combined_df['geometry'].geom_type == 'Point', ['GEOID', 'HU_Pop', 'PARCEL_ID', 'Pop_By_Area', 'STORY_NBR', 'Tot_Bldg_Sqft', 'bldg_sqft', 'geometry']]
polygon_df = combined_df.loc[combined_df['geometry'].geom_type == 'Polygon', ['GEOID', 'HU_Pop', 'PARCEL_ID', 'Pop_By_Area', 'STORY_NBR', 'Tot_Bldg_Sqft', 'bldg_sqft', 'geometry']]
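
A small self-contained sketch of the same filter, using a made-up three-row frame in place of combined_df, shows what the geom_type mask selects:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical mixed-geometry frame standing in for combined_df
combined = gpd.GeoDataFrame({
    "GEOID": ["a", "b", "c"],
    "geometry": [
        Point(0, 0),
        Polygon([(0, 0), (1, 0), (1, 1)]),
        Point(2, 2),
    ],
})

# geom_type gives one string per row, e.g. "Point" or "Polygon"
is_point = combined.geometry.geom_type == "Point"
is_polygon = combined.geometry.geom_type == "Polygon"

points = combined.loc[is_point, ["GEOID", "geometry"]]
polygons = combined.loc[is_polygon, ["GEOID", "geometry"]]
```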

[GIS] Exploding column in GeoPandas

Per the doc:

Source: https://geopandas.org/reference.html

It is a method called directly on your GeoDataFrame object, so any extra argument you pass is counted as a second positional argument, hence the error you are seeing. Note that GeoDataFrame.explode() is intended to:

Explode multi-part geometries into multiple single geometries.

Therefore, the following works because all the geometries are MultiLineStrings: explode() splits them into new rows, each holding one part of the original geometry:

import geopandas as gpd
from shapely import wkt
gdf = gpd.GeoDataFrame({
    'ID': [1,2,3,4,5],
    'identifiant': [11,12,13,14,15],
    'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
    'geometry': [wkt.loads(mlt) for mlt in 5*'MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))+'[:-1].split('+')]
})

which results in:

And calling .explode() will actually explode your geometries:

But if you do not have multi-part geometries:

import geopandas as gpd
from shapely import wkt
gdf2 = gpd.GeoDataFrame({
    'ID': [1,2,3,4,5],
    'identifiant': [11,12,13,14,15],
    'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
    # single-part geometries this time, so there is nothing for .explode() to split
    'geometry': [wkt.loads(pkt) for pkt in 5*'LINESTRING(3 4,10 50,20 25)+'[:-1].split('+')]
})

calling .explode() will return your original dataframe:

What you probably want instead is the pandas explode method, which expects a column argument:

import geopandas as gpd
from shapely import wkt
import pandas as pd

gdf2 = gpd.GeoDataFrame({
    'ID': [1,2,3,4,5],
    'identifiant': [11,12,13,14,15],
    'nom_concat': ['123abc',['123def','123ghj'],['123klm','123nop'],'123qrs','123tuv'],
    'geometry': [wkt.loads(pkt) for pkt in 5*'LINESTRING(3 4,10 50,20 25)+'[:-1].split('+')]
})

df = pd.DataFrame(gdf2) # convert to a pandas DataFrame instance
type(df)

df.explode('nom_concat') # call pandas explode method on a column

You can finally convert it back to a GeoDataFrame:

exploded_gdf = gpd.GeoDataFrame(df.explode('nom_concat'))

Beware that the index changes after the explosion: rows produced from the same list share their original index label. You can adjust this as needed, e.g. with .reset_index().
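
To make the index behaviour concrete, here is a minimal sketch on a made-up three-row frame (the column names follow the example above; the point geometries are arbitrary):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical frame: one row holds a list in nom_concat
gdf = gpd.GeoDataFrame({
    "ID": [1, 2, 3],
    "nom_concat": ["123abc", ["123def", "123ghj"], "123qrs"],
    "geometry": [Point(0, 0), Point(1, 1), Point(2, 2)],
})

# pandas explode: the list becomes two rows that share index label 1
exploded = pd.DataFrame(gdf).explode("nom_concat")
index_after_explode = exploded.index.tolist()  # [0, 1, 1, 2]

# Rebuild a GeoDataFrame with a fresh 0..n-1 index
exploded_gdf = gpd.GeoDataFrame(exploded.reset_index(drop=True), geometry="geometry")
```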
