I'm working with Chicago PD crash data that includes Lat/Long. I've read the crashes data into the crashes DF.
It doesn't include the city ward number, so I'd like to add it. I figured out how to do this with a dictionary, where the ward number is the key and the ward boundary (geometry) is the value.
Using a function that takes a DF row as input and returns the ward number, I can add the ward number to each row in the DF.
import geopandas as gpd
import pandas as pd
wards = gpd.read_file('https://data.cityofchicago.org/api/geospatial/sp34-6z76?method=export&format=GeoJSON')
wards['ward'] = pd.to_numeric(wards['ward'])
wards_dd = {}
for index,row in wards.iterrows():
wards_dd[row['ward']] = row['geometry']
def addWard(row):
from shapely.geometry import Point
# iterate through the wards until we find the one continining the point and
# return the ward number.
# Ward data is set up long/lat (x,y) format
p = Point(row['longitude'],row['latitude'])
for i in range(1,51):
if wards_dd[i].contains(p):
return i
return 'NotFound'
crashes['ward'] = crashes.apply(addWard,axis = 1)
So this works, but it seems a bit kludgy. Is there a way to do this with Geopandas .sjoin function that would work better?
The Crashes DF is a standard (not geopandas) DF. The lat/long data is in separate columns. I could combine them into a shapely Point, but can a Point in a normal df be .sjoin-ed with a geopandas df to add the ward number to the Crashes DF?
PS I know there might be problems ignoring the curvature of the earth with this sort of question, but in this case, the areas are quite small (less than 1 mile across) so the curvature issue is negligible.
Best Answer
Spatial join