Identifying n-nearest polygons to point in GeoPandas

geopandas, nearest neighbor, polygon, python

I have a dataset of fire polygons and a user-supplied lat/lon coordinate. I'd like to write a function that subsets the 3 polygons nearest to that coordinate, so I can report their basic characteristics (e.g. name, date, size).

import requests
import geopandas as gpd
from shapely.geometry import Point

def getfires(lat,lon):
    
    # Convert coords to desired format: -122.7140548%2C+38.440429
    if lat > 90 or lat < -90 or lon >180 or lon <-180:
        print("Error: invalid coordinates.")
    else:
        coords = str(lon)+"%2C+"+str(lat)
        
        # Make URL for API request
        urlhead = "https://services1.arcgis.com/jUJYIo9tSA7EHvfZ/arcgis/rest/services/California_Fire_Perimeters/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry="
        # Current buffer: 50 miles, change if desired where "&distance="
        urltail = "&geometryType=esriGeometryPoint&inSR=4326&spatialRel=esriSpatialRelIntersects&resultType=standard&distance=50.0&units=esriSRUnit_StatuteMile&returnGeodetic=false&outFields=*&returnGeometry=true&returnCentroid=false&featureEncoding=esriDefault&multipatchOption=none&maxAllowableOffset=&geometryPrecision=&outSR=4326&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=&returnZ=false&returnM=false&returnExceededLimitFeatures=true&quantizationParameters=&sqlFormat=none&f=pgeojson&token="
        url = urlhead+coords+urltail
        print(url)

        # Make API request using URL and make into geodataframe.
        polys = requests.get(url).json()
        polypd = gpd.GeoDataFrame.from_features(polys["features"])
        polypd.crs = 4326 # Set CRS to match that of input dataset.
        print(polypd)

        # Turn the original lat/lon into a one-row point GeoDataFrame.
        geom = Point(lon, lat)
        point = gpd.GeoDataFrame(crs=4326, geometry=[geom])    
        print(point)   

        # Reproject all data to the same CRS: NAD83 / California Albers (EPSG:3310)
        polypdproj = polypd.to_crs(3310)
        pointproj = point.to_crs(3310)

But from here, after having loaded the data, reprojected, etc., I'm getting stuck. Here's what I've tried:

polypdproj['min_dist'] = polypdproj.geometry.distance(point)

Returns distance for only the first polygon and NAs for the rest, with an error stating the indices are different. I understand this to mean it expects a dataframe with the same number of points and polygons.
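
The pattern described here (one value, then missing values) is consistent with GeoPandas aligning the two objects by index label before computing distances, rather than comparing the single point with every row. Here is a minimal sketch of that behaviour, assuming toy unit-square polygons and a made-up query point (none of these names come from the fire dataset):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three toy polygons (unit squares); the index labels are 0, 1, 2.
polys = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 0, 3, 1), box(5, 0, 6, 1)])

# A one-row GeoSeries holding the query point; its only index label is 0.
pt = gpd.GeoSeries([Point(4, 0.5)])

# GeoSeries vs GeoSeries: rows are paired by index label, so only label 0
# has a partner and the remaining rows come back as NaN (with a warning).
aligned = polys.distance(pt)

# GeoSeries vs a single shapely geometry: the point is compared with every row.
broadcast = polys.distance(pt.iloc[0])

print(aligned)
print(broadcast)
```

So the fix is not to build a matching number of points, but to pass the bare shapely geometry instead of a GeoSeries or GeoDataFrame.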

min_polys = sorted(polypdproj, key=pointproj.distance)[0:2]

Returns TypeError: (<class 'geopandas.geoseries.GeoSeries'>, <class 'str'>). Specifying polypdproj.geometry returns an error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

I will always only have one point, so I'd like to avoid setting up Rtree or something more complicated. It seems like there should be a simple solution, but alas!

Best Answer

I'm going to assume that the pointproj GeoDataFrame only has one observation.

If that's the case, you can do this:

# Extracting the actual shapely geometry of the Point
pointproj_geom = pointproj.iloc[0]['geometry']

# Calculating distance from the Point to ALL Polygons
polypdproj['distances'] = polypdproj.distance(pointproj_geom)

# Subsetting to keep only the 3 nearest cases
polypdproj_subset = (polypdproj.loc[polypdproj['distances']
                                   .rank(method='first', ascending=True) <= 3]
                     .sort_values(by='distances', ascending=True))

Now the polypdproj_subset contains a subsetted GeoDataFrame that only has the 3 smallest values in the newly-calculated distances column.

Note: you might want to fiddle with the method parameter of the rank function to better deal with locations where distances are tied.
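
As an alternative to rank plus sort_values, the same subset can probably be obtained with pandas' nsmallest. The sketch below is self-contained and uses hypothetical stand-ins for polypdproj and pointproj (toy squares and a made-up point, with coordinates treated as already projected); the helper name n_nearest is also just for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for polypdproj / pointproj (coordinates assumed projected).
fires = gpd.GeoDataFrame(
    {"name": ["A", "B", "C", "D", "E"]},
    geometry=[box(x, 0, x + 1, 1) for x in (0, 3, 10, 20, 50)],
)
query = gpd.GeoDataFrame(geometry=[Point(4.5, 0.5)])


def n_nearest(polygons, point_gdf, n=3):
    """Return the n rows of `polygons` closest to the single point in `point_gdf`."""
    point_geom = point_gdf.geometry.iloc[0]  # bare shapely Point
    out = polygons.copy()
    out["distances"] = out.distance(point_geom)
    # nsmallest keeps the n smallest distances, sorted ascending;
    # with the default keep="first", ties are resolved by row order.
    return out.nsmallest(n, "distances")


nearest = n_nearest(fires, query)
print(nearest[["name", "distances"]])
```

Like rank(method='first'), nsmallest always returns exactly n rows, so tied polygons beyond the third are dropped; pass keep="all" if you would rather keep every tie.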

Related Solutions

[GIS] Finding nearest point in other GeoDataFrame using GeoPandas

If you have large dataframes, I've found that scipy's cKDTree spatial index .query method returns very fast results for nearest neighbor searches. As it uses a spatial index it's orders of magnitude faster than looping through the dataframe and then finding the minimum of all distances. It is also faster than using shapely's nearest_points with RTree (the spatial index method available via geopandas) because cKDTree allows you to vectorize your search whereas the other method does not.

Here is a helper function that, for each point in gpd1, returns the distance to its nearest neighbor in gpd2 along with that neighbor's attributes (here 'Place'). It assumes both gdfs have a geometry column (of points).


import geopandas as gpd
import numpy as np
import pandas as pd

from scipy.spatial import cKDTree
from shapely.geometry import Point

gpd1 = gpd.GeoDataFrame([['John', 1, Point(1, 1)], ['Smith', 1, Point(2, 2)],
                         ['Soap', 1, Point(0, 2)]],
                        columns=['Name', 'ID', 'geometry'])
gpd2 = gpd.GeoDataFrame([['Work', Point(0, 1.1)], ['Shops', Point(2.5, 2)],
                         ['Home', Point(1, 1.1)]],
                        columns=['Place', 'geometry'])

def ckdnearest(gdA, gdB):

    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    gdB_nearest = gdB.iloc[idx].drop(columns="geometry").reset_index(drop=True)
    gdf = pd.concat(
        [
            gdA.reset_index(drop=True),
            gdB_nearest,
            pd.Series(dist, name='dist')
        ], 
        axis=1)

    return gdf

ckdnearest(gpd1, gpd2)
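
If you need more than one neighbor per point (say the 3 nearest), cKDTree.query accepts a larger k. A minimal sketch on plain coordinate arrays, with made-up points; note this only works on point coordinates, so polygons would first have to be reduced to representative points, which changes what "distance" means:

```python
import numpy as np
from scipy.spatial import cKDTree

# Candidate points and one query point, as plain (x, y) arrays (toy values).
B = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [2.0, 0.0]])
query = np.array([[0.9, 0.0]])

tree = cKDTree(B)

# With k=3, both outputs have shape (n_queries, 3), ordered nearest first.
dist, idx = tree.query(query, k=3)

print(idx)   # row positions in B of the three nearest candidates
print(dist)  # matching Euclidean distances
```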

And if you want to find the nearest LineString for each point (measured to the line's vertices, not to the segments between them), here is a full working example:

import itertools
from operator import itemgetter

import geopandas as gpd
import numpy as np
import pandas as pd

from scipy.spatial import cKDTree
from shapely.geometry import Point, LineString

gpd1 = gpd.GeoDataFrame([['John', 1, Point(1, 1)],
                         ['Smith', 1, Point(2, 2)],
                         ['Soap', 1, Point(0, 2)]],
                        columns=['Name', 'ID', 'geometry'])
gpd2 = gpd.GeoDataFrame([['Work', LineString([Point(100, 0), Point(100, 1)])],
                         ['Shops', LineString([Point(101, 0), Point(101, 1), Point(102, 3)])],
                         ['Home',  LineString([Point(101, 0), Point(102, 1)])]],
                        columns=['Place', 'geometry'])


def ckdnearest(gdfA, gdfB, gdfB_cols=['Place']):
    A = np.concatenate(
        [np.array(geom.coords) for geom in gdfA.geometry.to_list()])
    B = [np.array(geom.coords) for geom in gdfB.geometry.to_list()]
    B_ix = tuple(itertools.chain.from_iterable(
        [itertools.repeat(i, x) for i, x in enumerate(list(map(len, B)))]))
    B = np.concatenate(B)
    ckd_tree = cKDTree(B)
    dist, idx = ckd_tree.query(A, k=1)
    idx = itemgetter(*idx)(B_ix)
    gdf = pd.concat(
        [gdfA, gdfB.loc[idx, gdfB_cols].reset_index(drop=True),
         pd.Series(dist, name='dist')], axis=1)
    return gdf

c = ckdnearest(gpd1, gpd2)
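
One caveat with the LineString version: the tree is built from the lines' vertices, so the reported distance is to the nearest vertex, which can be larger than the true distance to the line. A small sketch of the difference, using a made-up line and point:

```python
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry import LineString, Point

line = LineString([(0, 0), (10, 0)])
pt = Point(5, 1)

# Vertex-based distance, as in the cKDTree approach: only (0, 0) and (10, 0) are candidates.
vertex_dist, _ = cKDTree(np.array(line.coords)).query(np.array(pt.coords), k=1)

# Exact distance to the closest location anywhere along the line.
exact_dist = pt.distance(line)

print(vertex_dist[0], exact_dist)
```

Densifying the lines (adding intermediate vertices) before building the tree narrows the gap, at the cost of a larger tree.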

[GIS] Python geopandas dataframe of polygons — determine nearest neighbor polygon

You can use the Python rtree library to build up a spatial index, which then has a nearest method you can use to get the nearest geometry in the index to any given query. I think Shapely also comes with an rtree implementation which behaves similarly, but I could be wrong - I always use rtree. This is probably the fastest way, as it only requires one calculation for each record.
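
For what it's worth, Shapely does ship a tree index, STRtree. Here is a rough sketch of a nearest-neighbor polygon lookup with it, assuming Shapely 2.0 or later (where query_nearest is available) and toy square polygons; this is an illustration, not the rtree-library approach described above:

```python
from shapely.geometry import box
from shapely.strtree import STRtree

# Toy polygons: unit squares starting at x = 0, 2 and 10.
polygons = [box(0, 0, 1, 1), box(2, 0, 3, 1), box(10, 0, 11, 1)]
tree = STRtree(polygons)

# exclusive=True skips tree geometries equal to the query geometry, so a
# polygon is not reported as its own nearest neighbor.
# all_matches=False returns a single match per input even if distances tie.
(input_idx, tree_idx), dist = tree.query_nearest(
    polygons, exclusive=True, all_matches=False, return_distance=True
)

for i, j, d in zip(input_idx, tree_idx, dist):
    print(f"polygon {i}: nearest is polygon {j} at distance {d}")
```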

Otherwise, you'll need to compare every geometry to every other one using the shapely distance method, and choose the smallest one. I guess in Pandas that would be a cross join of your dataset to itself, adding a column of distance, and then keeping only the records where the IDs are not equal (equal IDs mean you calculated the distance from a polygon to itself) and the distance is the shortest.
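
The brute-force comparison can be sketched without an explicit join by looping over the rows and letting GeoPandas compute one-to-all distances each time. This assumes a unique index and uses toy polygons with a hypothetical "id" column:

```python
import geopandas as gpd
from shapely.geometry import box

# Toy polygons with a hypothetical "id" column.
gdf = gpd.GeoDataFrame(
    {"id": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(10, 0, 11, 1)],
)

nearest_id, nearest_dist = [], []
for label, geom in gdf.geometry.items():
    # Distance from this polygon to every polygon, minus its own row.
    d = gdf.distance(geom).drop(label)
    j = d.idxmin()  # index label of the closest other polygon
    nearest_id.append(gdf.loc[j, "id"])
    nearest_dist.append(d.loc[j])

gdf["nearest_id"] = nearest_id
gdf["nearest_dist"] = nearest_dist
print(gdf[["id", "nearest_id", "nearest_dist"]])
```

This does n one-to-all distance passes, so it is fine for small datasets but scales quadratically.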
