Identifying n-nearest polygons to point in GeoPandas

geopandasnearest neighborpolygonpython

I have a dataset of fire polygons, and a user inputted lat/lon coordinate. I'm hoping to write a function to subset the 3 nearest polygons to the inputted coordinate, in order to report out their basic characteristics (e.g. name, date, size).

import requests
import geopandas as gpd
from shapely.geometry import Point

def getfires(lat,lon):
    
    # Convert coords to desired format: -122.7140548%2C+38.440429
    if lat > 90 or lat < -90 or lon >180 or lon <-180:
        print("Error: invalid coordinates.")
    else:
        coords = str(lon)+"%2C+"+str(lat)
        
        # Make URL for API request
        urlhead = "https://services1.arcgis.com/jUJYIo9tSA7EHvfZ/arcgis/rest/services/California_Fire_Perimeters/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry="
        # Current buffer: 50 miles, change if desired where "&distance="
        urltail = "&geometryType=esriGeometryPoint&inSR=4326&spatialRel=esriSpatialRelIntersects&resultType=standard&distance=50.0&units=esriSRUnit_StatuteMile&returnGeodetic=false&outFields=*&returnGeometry=true&returnCentroid=false&featureEncoding=esriDefault&multipatchOption=none&maxAllowableOffset=&geometryPrecision=&outSR=4326&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=&returnZ=false&returnM=false&returnExceededLimitFeatures=true&quantizationParameters=&sqlFormat=none&f=pgeojson&token="
        url = urlhead+coords+urltail
        print(url)

        # Make API request using URL and make into geodataframe.
        polys = requests.get(url).json()
        polypd = gpd.GeoDataFrame.from_features(polys["features"])
        polypd.crs = 4326 # Set CRS to match that of input dataset.
        print(polypd)

        # Turn original lat.
        geom = Point(lon, lat)
        point = gpd.GeoDataFrame(crs=4326, geometry=[geom])    
        print(point)   

        # # Reproject all data to same CRS - NAD 83 ACA Albers Equal Area
        polypdproj = polypd.to_crs(3310)
        pointproj = point.to_crs(3310)

But from here, after having loaded the data, reprojected, etc., I'm getting stuck. Here's what I've tried:

polypdproj['min_dist'] = polypdproj.geometry.distance(point)

Returns distance for only the first polygon and NAs for the rest, with an error stating the indices are different. I understand this to mean it expects a dataframe with the same number of points and polygons.

min_polys = sorted(polypdproj, key=pointproj.distance)[0:2]

Returns TypeError: (<class 'geopandas.geoseries.GeoSeries'>, <class 'str'>). Specifying polypdproj.geometry returns an error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

I will always only have one point, so I'd like to avoid setting up Rtree or something more complicated. It seems like there should be a simple solution, but alas!

Best Answer

I'm going to assume that the pointproj GeoDataFrame only has one observation.

If that's the case, you can do this:

# Extracting the actual shapely geometry of the Point
pointproj_geom = pointproj.iloc[0]['geometry']

# Calculating distance from the Point to ALL Polygons
polypdproj['distances'] = polypdproj.distance(pointproj_geom)

# Subsetting to keep only the 3 nearest cases
polypdproj_subset = (polypdproj.loc[polypdproj['distances']
                                   .rank(method='first', ascending=True) <= 3]
                     .sort_values(by='distances', ascending=True)

Now the polypdproj_subset contains a subsetted GeoDataFrame that only has the 3 smallest values in the newly-calculated distances column.

Note: you might want to fiddle with the method parameter of the rank function to better deal with locations where distances are tied.