Python Spatial Join – How to Obtain Mean, Maximum, and Minimum of Points Within Polygons Using Spatial Join in Python

Tags: geopandas, python, spatial-join

I am trying to compute summary statistics of the points located inside each polygon. Each point in the point layer has N attributes. My goal is to summarize (mean, min, ...) the attributes of the points that fall within a polygon and write the results to attribute fields of that polygon.

I am looking for a solution using GeoPandas or other Python libraries.

import geopandas as gpd  

gdf_points = gpd.read_file('/path_to_points.json')
gdf_polygon = gpd.read_file('/path_to_polygons.json')

dfsjoin = gpd.sjoin(gdf_polygon, gdf_points)

Now, how can I summarize the stats for each attribute in the point layer and add it to the polygon shapefile? Which function can I use?

I am looking for something functionally equivalent to ESRI ArcGIS SpatialJoin_analysis with field mappings.

Best Answer

First, determine which points fall inside which polygons.

import geopandas as gpd

points = gpd.read_file("points.shp")
points.head()
   id  value1 value2    geometry
0   1   300   300003    POINT (19.579 -18.625)
1   2   400   400003    POINT (80.639 -114.895)
2   3   500   500003    POINT (98.021 -70.326)
3   4   100   100003    POINT (118.522 -100.187)
4   5   200   200003    POINT (186.713 -35.562)
polys = gpd.read_file("polys.shp")
polys
   id     geometry
0   1   POLYGON ((51.223 -134.951, 50.777 -74.337, 106...
1   2   POLYGON ((223.706 -134.506, 228.163 -68.543, 3...
2   3   POLYGON ((151.058 -185.315, 167.994 -167.487, ...

Use a spatial join (see, for example, "More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc"):

# how="left" keeps every point; points outside all polygons get NaN
points_polys = gpd.sjoin(points, polys, how="left")
points_polys.head()
   id_left  value1  value2  geometry                   index_right  id_right
0  1        300     300003  POINT (19.579 -18.625)     NaN          NaN
1  2        400     400003  POINT (80.639 -114.895)    0.0          1.0
2  3        500     500003  POINT (98.021 -70.326)     0.0          1.0
3  4        100     100003  POINT (118.522 -100.187)   0.0          1.0
4  5        200     200003  POINT (186.713 -35.562)    NaN          NaN

The points with id_left 2, 3 and 4 are contained in polygon 1 (id_right), and so on. Points with NaN in id_right fall in no polygon.
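The behaviour of the left join can be reproduced without any files. The following is a minimal sketch using made-up geometries (the names pts and square and all coordinates are assumptions for illustration, not the answer's data):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up data: two points inside a square, one point far outside it.
pts = gpd.GeoDataFrame(
    {"id": [1, 2, 3], "value1": [10, 20, 30]},
    geometry=[Point(0.5, 0.5), Point(1.5, 1.5), Point(5, 5)],
)
square = gpd.GeoDataFrame(
    {"id": [1]},
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])],
)

# how="left" keeps every point; the unmatched point gets NaN in the
# polygon columns. Both layers have an "id" column, so the join result
# carries id_left (point id) and id_right (polygon id).
joined = gpd.sjoin(pts, square, how="left")
print(joined[["id_left", "value1", "id_right"]])
```

With how="inner" (the default) the unmatched point would be dropped instead of being kept with NaN.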
Check the number of points contained in each polygon:

print(points_polys.loc[points_polys.id_right == 1,'value1'].count())
3
print(points_polys.loc[points_polys.id_right == 2,'value1'].count())
2
print(points_polys.loc[points_polys.id_right == 3,'value1'].count())
6
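The per-polygon counts can also be obtained in a single pass rather than one polygon at a time. A minimal sketch, assuming a join result shaped like points_polys; the small table below is made-up illustration data, not the answer's shapefile:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for points_polys: one row per point, with
# id_right set to NaN for points that fall in no polygon (how="left").
joined = pd.DataFrame({
    "id_left": [1, 2, 3, 4, 5],
    "value1": [300, 400, 500, 100, 200],
    "id_right": [np.nan, 1.0, 1.0, 1.0, np.nan],
})

# groupby drops NaN keys by default, so unmatched points are not counted.
counts = joined.groupby("id_right")["value1"].count()
print(counts)
```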

To summarize each point attribute per polygon, group points_polys by the id_right column (the polygon id) and compute the mean, standard deviation, maximum and minimum of every attribute within each group (see "Naming returned columns in Pandas aggregate function"):

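# Select several columns with a list (double brackets)
stats_pt = points_polys.groupby('id_right')[['value1', 'value2']].agg(['mean', 'std', 'max', 'min'])
# Flatten the two-level column index, e.g. ('value1', 'mean') -> 'value1_mean'
stats_pt.columns = ["_".join(x) for x in stats_pt.columns.to_flat_index()]
stats_pt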

        value1_mean value1_std value1_max value1_min value2_mean    value2_std    value2_max value2_min
id_right                                
1.0     333.333333  208.166600   500       100      333336.333333   208166.599947   500003    100003
2.0     735.000000   91.923882   800       670      735003.000000   91923.881554    800003    670003
3.0     36.333333    19.459359   60          7      36336.333333    19459.359359    60003       7003

Named aggregation (available since pandas 0.25; see "Pandas in 2019 - let's see what's new!") gives the same result with explicit column names:

stats_pt  = points_polys.groupby('id_right').agg( 
       value1_mean = ('value1','mean'),
       value1_std  = ('value1','std'),
       value1_max  = ('value1','max'),
       value1_min  = ('value1','min'),
       value2_mean = ('value2','mean'),
       value2_std  = ('value2','std'),
       value2_max  = ('value2','max'),
       value2_min  = ('value2','min'))
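When the point layer has many attributes, writing every keyword by hand gets tedious. One possible approach is to build the named aggregations in a dictionary and unpack it. This is only a sketch: the joined table is made-up illustration data, and the attributes and functions lists are assumptions to adapt to the real layer.

```python
import pandas as pd

# Hypothetical stand-in for points_polys (matched points only).
joined = pd.DataFrame({
    "id_right": [1, 1, 1, 2, 2],
    "value1": [400, 500, 100, 800, 670],
    "value2": [400003, 500003, 100003, 800003, 670003],
})

attributes = ["value1", "value2"]          # columns to summarize
functions = ["mean", "std", "max", "min"]  # statistics to compute

# Build {output_column: (input_column, function)} for every combination.
named = {f"{col}_{fn}": (col, fn) for col in attributes for fn in functions}

stats = joined.groupby("id_right").agg(**named)
print(stats.columns.tolist())
```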

Finally, join this DataFrame to the polygon GeoDataFrame and save the resulting layer:

import pandas as pd
result = pd.merge(polys, stats_pt , left_on='id',right_index=True,how='outer')
result
   id                   geometry               value1_mean  value1_std  value1_max  value1_min  value2_mean   value2_std      value2_max value2_min
 0  1   POLYGON ((51.223 -134.951, 50.77...     333.333333  208.166600    500         100       333336.333333   208166.599947 500003    100003
 1  2   POLYGON ((223.706 -134.506, 228.16...   735.000000  91.923882     800         670       735003.000000   91923.881554  800003    670003
 2  3   POLYGON ((151.058 -185.315, 167.99...   36.333333   19.459359      60           7       36336.333333    19459.359359  60003       7003


result.to_file("stat_point_poly.shp")
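One caveat when saving to a shapefile: the DBF attribute table limits field names to 10 characters, so a name such as value1_mean (11 characters) is shortened on write. A small sketch of renaming the columns beforehand; the shorten helper and its abbreviation scheme are hypothetical choices, and the table is made-up illustration data:

```python
import pandas as pd

SHAPEFILE_FIELD_LIMIT = 10  # maximum field-name length in a DBF table


def shorten(name: str) -> str:
    """Hypothetical abbreviation scheme: 'value1_mean' -> 'v1_mean'."""
    return name.replace("value", "v", 1)


# Stand-in for the statistics table produced by the groupby step.
stats = pd.DataFrame(
    {"value1_mean": [333.3], "value1_std": [208.2], "value2_mean": [333336.3]},
    index=pd.Index([1], name="id_right"),
)

renamed = stats.rename(columns=shorten)

# Verify that no column name would be cut off when written to a shapefile.
too_long = [c for c in renamed.columns if len(c) > SHAPEFILE_FIELD_LIMIT]
print(renamed.columns.tolist(), too_long)
```

Alternatively, writing to a format without this limit, for example result.to_file("stat_point_poly.gpkg", driver="GPKG"), avoids the renaming step.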

[Figure: the resulting layer with value1_std as label]

Related Solutions

[GIS] Python: Break linestring based on condition

I haven't used shapely/geopandas yet, so take the following as an untested sketch rather than a finished solution:

from math import dist  # planar Euclidean distance (Python 3.8+)
from shapely.geometry import LineString

distance_threshold = 50  # Gap length at which a line is cut
new_lines = []  # Holds the newly created, split lines
for linestring in linestrings:  # linestrings: your iterable of LineStrings
    coords = list(linestring.coords)
    start = 0  # Index where the current piece starts
    for i in range(len(coords) - 1):  # Look at each pair of consecutive coords
        if dist(coords[i], coords[i + 1]) >= distance_threshold:
            # Close the current piece at vertex i (a LineString needs 2+ points)
            if i - start >= 1:
                new_lines.append(LineString(coords[start:i + 1]))
            start = i + 1  # The next piece starts after the gap
    if len(coords) - start >= 2:  # Keep the remainder after the last split
        new_lines.append(LineString(coords[start:]))

The distance function should be something that your libs already offer, or you'll have to implement it yourself (ol' buddy Pythagoras will help you out).

Efficiency can be improved as needed from there, but it should be a good starting point.

[GIS] Joining points with nearest polygon using QGIS

Try splitting your point data in two. First part would be the points that fall within a polygon and the second part would be points falling outside polygons. For the first part, perform a regular spatial join, and for the second part, use the NNJoin plugin. You can later merge the two resulting point layers into one. This way you can avoid the plugin performing an incorrect join on some of the points that fall within a polygon.

In order to select points that fall within the polygon layer, use the Spatial Query plugin (now a core QGIS plugin), then save the selection as a new layer. Invert the selection and save it as the second layer and perform the join using the NNJoin plugin.

Note that I have never used the NNJoin plugin.
