Combining st_join and st_nn for Points Within Polygon – R Guide

Tags: nearest neighbor, point-in-polygon, r, sf, spatial-join

This is a follow-up to my earlier question (Spatial join in R – Adding points to polygons with multiple corresponding points).

I have successfully joined a spatial points file to a polygon file in R using the st_join function from the sf package. Where more than one point falls within a polygon, each of them is assigned to it: the polygon's row is duplicated, but every point that falls within a polygon is kept.

st_join(polygons, points)
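To make the row duplication concrete, here is a minimal sketch on hypothetical toy data (two 1000 m squares named A and B and five points, with coordinates treated as metres and no CRS set). With the default join = st_intersects, st_join() returns one row per polygon-point match, so a polygon that contains two points appears twice, and points outside every polygon do not appear at all.

```r
library(sf)

# Hypothetical toy data: two 1000 m squares; coordinates in metres, no CRS set.
square <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}
polygons <- st_sf(poly_id = c("A", "B"),
                  geometry = st_sfc(square(0, 0), square(3000, 0)))

points <- st_sf(point_id = paste0("p", 1:5),
                geometry = st_sfc(st_point(c(200, 200)),    # inside A
                                  st_point(c(800, 500)),    # inside A
                                  st_point(c(1300, 500)),   # 300 m east of A
                                  st_point(c(3500, 500)),   # inside B
                                  st_point(c(-2000, 500)))) # 2000 m west of A

# Default join = st_intersects: one row per (polygon, point) match, so the
# row for polygon A is duplicated; p3 and p5 lie outside both polygons and
# are not joined to anything.
joined <- st_join(polygons, points)
print(st_drop_geometry(joined))
```

Note that p3, although only 300 m from polygon A, is lost by this join, which is exactly the gap the rest of the question is about.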

However, I also need to join points that fall outside the polygons but within 500 m of a polygon to their nearest polygon. Points more than 500 m from any polygon can be discarded.

I thought that combining the above with st_nn from the nngeo package would work:

st_join(polygons, points, join = st_nn, maxdist = 500)

However, in this case only one point is assigned to each polygon, even when several points fall within it or within 500 m of it; i.e., the rows are not duplicated.
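The one-row-per-polygon behaviour can be reproduced with sf alone. nngeo's st_nn defaults to k = 1, so with the polygons as x it returns, for each polygon, its single nearest point. The sketch below uses sf's st_nearest_feature as a stand-in for st_nn with k = 1 (a substitute chosen only to avoid the nngeo dependency, not the call used in the question), on hypothetical toy data with one 1000 m square and three points, coordinates in metres and no CRS.

```r
library(sf)

# Hypothetical toy data: one 1000 m square and three points, metres, no CRS.
poly <- st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000),
                              c(0, 1000), c(0, 0))))
polygons <- st_sf(poly_id = "A", geometry = st_sfc(poly))
points <- st_sf(point_id = c("p1", "p2", "p3"),
                geometry = st_sfc(st_point(c(200, 200)),   # inside A
                                  st_point(c(800, 500)),   # inside A
                                  st_point(c(1300, 500)))) # 300 m east of A

# Predicate join: one row per matching (polygon, point) pair, i.e. 2 rows.
by_intersects <- st_join(polygons, points)

# Nearest-feature join with polygons as x: each polygon gets exactly one
# point (here p1 or p2, both at distance 0), i.e. 1 row.
by_nearest <- st_join(polygons, points, join = st_nearest_feature)

cat("intersects:", nrow(by_intersects), " nearest:", nrow(by_nearest), "\n")
```

The direction of the join matters: a nearest-neighbour search from polygons answers "which point is closest to this polygon?", not "which polygon does this point belong to?".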

Here is a screenshot of a sample of points and polygons:

[Image: screenshot of a sample of points and polygons]

And here is a table showing how the points should be assigned to the polygons and how they were assigned by each method:

[Image: table of expected vs. actual point-to-polygon assignments for each method]

I find it a little strange that the second method does not keep the duplicates, even though it is based on the same function. Can anyone tell me what I'm doing wrong here?

Edit: I tried adjusting the k parameter, but this simply joins the k nearest points within the given distance, up to the maximum number given, and can therefore assign one point to two polygons. For example:

st_join(polygons, points, join = st_nn, k = 10, maxdist = 500)

returns 5 points for polygon 89028, because there are 5 points within 500 m of it, when in fact only 1 point (011-05-0529) should be returned: the other 4 points are already assigned to the polygon in which they fall. A point should only be assigned to one polygon.

Best Answer

If I understand correctly, you want to find the containing polygon of each point or, if the point is not inside any polygon, the nearest polygon up to 500 m away.

If so, the following expression, with the order of x and y reversed, should work:

st_join(points, polygons, join = st_nn, k = 1, maxdist = 500)

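The function looks for the nearest polygon to each point. A containing polygon, if there is one, is always the nearest, since its distance from the point is zero. If no containing polygon is found, the function looks for the nearest polygon up to a maximum distance of 500 m. Points with no polygon within 500 m are kept with NA polygon attributes, because st_join() performs a left join by default; passing left = FALSE drops them instead.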
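For comparison, the same rule (containing polygon first, otherwise the nearest polygon within 500 m, otherwise discard) can be sketched with sf alone. This is a substitute technique, not the answer's st_nn call: st_nearest_feature() picks the single nearest polygon for each point, st_distance(..., by_element = TRUE) measures the distance to it, and points farther than 500 m are dropped. It assumes a projected CRS in metres (here hypothetical toy coordinates with no CRS) and non-overlapping polygons, so a point lies inside at most one of them.

```r
library(sf)

# Hypothetical toy data: two 1000 m squares; coordinates in metres, no CRS set.
square <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}
polygons <- st_sf(poly_id = c("A", "B"),
                  geometry = st_sfc(square(0, 0), square(3000, 0)))
points <- st_sf(point_id = paste0("p", 1:5),
                geometry = st_sfc(st_point(c(200, 200)),    # inside A
                                  st_point(c(800, 500)),    # inside A
                                  st_point(c(1300, 500)),   # 300 m east of A
                                  st_point(c(3500, 500)),   # inside B
                                  st_point(c(-2000, 500)))) # 2000 m west of A

# Step 1: index of the nearest polygon for every point. A point inside a
# polygon is at distance 0 from it, so its containing polygon always wins.
nearest <- st_nearest_feature(points, polygons)

# Step 2: distance from each point to that polygon; as.numeric() drops the
# units attribute that st_distance() adds when the layer has a CRS.
dist <- as.numeric(st_distance(points, polygons[nearest, ], by_element = TRUE))

# Step 3: keep points within 500 m; each point gets at most one polygon.
keep <- dist <= 500
assigned <- points[keep, ]
assigned$poly_id <- polygons$poly_id[nearest[keep]]
assigned$dist_m <- dist[keep]
print(st_drop_geometry(assigned))

# Back to the polygon-centric layout of the original st_join(polygons, points):
# one row per polygon-point pair, polygon rows duplicated as needed.
per_polygon <- merge(polygons, st_drop_geometry(assigned), by = "poly_id")
print(st_drop_geometry(per_polygon))
```

Here p3 is attached to A at 300 m, and p5 (2000 m from A) is discarded. The final merge() is only needed if you want the polygons, rather than the points, as the rows of the result.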
