[GIS] Find nearest point along polyline using ‘sf’ package in R

linepointproximityrsf

This is a follow-up question related to my earlier post about using the sf package to split polylines using nearby points.

I have 450 points with XY coordinates and a large river network (~214,000 reaches). Many of the points do not intersect the river network, so for each point I would like to find the nearest location and distance along a reach, and to also snap the point to that location.

My current implementation uses a custom wrapper function calling geosphere::dist2line() (largely based on the following questions by Guzman and Ana; see my first question) to do the following (I've include one simple feature POINT (site) and LINESTRING or polyline (reach) to provide a reproducible example):

library(sf)
site <- structure(list(SiteId = "PF_00183", SampId = "PF_00183 _ 2014-10-15", 
    MonitoringProgram = "ProgettoFiumi", X1 = 688524.841170469, 
    Y1 = 172082.839248429, X2 = 688405.761917025, Y2 = 172008.150777719, 
    Width = NA_real_, TransectDistance = 140.563993461242, Area = NA_real_, 
    EZG_NR = 54757, h1 = 163815, h2 = 168422, geometry = structure(list(
        structure(c(688524.8412, 172082.8392), class = c("XY", 
        "POINT", "sfg"))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(688524.8412, 
    172082.8392, 688524.8412, 172082.8392), .Names = c("xmin", 
    "ymin", "xmax", "ymax"), class = "bbox"), crs = structure(list(
        epsg = NA_integer_, proj4string = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +units=m +no_defs"), .Names = c("epsg", 
    "proj4string"), class = "crs"), n_empty = 0L)), .Names = c("SiteId", 
"SampId", "MonitoringProgram", "X1", "Y1", "X2", "Y2", "Width", 
"TransectDistance", "Area", "EZG_NR", "h1", "h2", "geometry"), row.names = c(NA, 
-1L), class = c("sf", "data.frame"), sf_column = "geometry", agr = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity"), .Names = c("SiteId", "SampId", "MonitoringProgram", 
"X1", "Y1", "X2", "Y2", "Width", "TransectDistance", "Area", 
"EZG_NR", "h1", "h2")))

reach <- structure(list(FNODE_ = 148558, TNODE_ = 148142, LENGTH = 343.00790968, 
    ReachId = 125010L, geometry = structure(list(structure(c(688400.125, 
    688394.375, 688385.875, 688375.6875, 688363.625, 688359.8125, 
    688358.8125, 688361.125, 688375.6875, 688389.375, 688411.1875, 
    688432.875, 688445.1875, 688448.125, 688445.875, 688446.625, 
    688448.125, 688455.375, 688465, 171876.40625, 171890.40625, 
    171906.203125, 171916.703125, 171929.09375, 171944.703125, 
    171952.59375, 171959.90625, 171980.296875, 171997.09375, 
    172017.796875, 172041.90625, 172063.5, 172086.703125, 172109.90625, 
    172129, 172138.5, 172153.5, 172171.203125), .Dim = c(19L, 
    2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING", 
    "sfc"), precision = 0, bbox = structure(c(688358.8125, 171876.40625, 
    688465, 172171.203125), .Names = c("xmin", "ymin", "xmax", 
    "ymax"), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
        proj4string = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +units=m +no_defs"), .Names = c("epsg", 
    "proj4string"), class = "crs"), n_empty = 0L)), .Names = c("FNODE_", 
"TNODE_", "LENGTH", "ReachId", "geometry"), row.names = 125010L, class = c("sf", 
"data.frame"), sf_column = "geometry", agr = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), .Names = c("FNODE_", 
"TNODE_", "LENGTH", "ReachId"), .Label = c("constant", "aggregate", 
"identity"), class = "factor"))

# Transform to CRS WGS1984 (EPSG: 4326) and create 'sp' inputs for geosphere::dist2line()
site.sp <- st_transform(site, crs = 4326)
site.sp <- as_Spatial(st_geometry(site.sp))

reach.sp <- st_transform(reach, crs = 4326)
reach.sp <- as_Spatial(st_zm(st_geometry(reach.sp)))

# Calculate the location and distance of the nearest point along the polyline
d <- geosphere::dist2Line(site.sp, reach.sp)
d <- as.data.frame(d)

# Create simple feature with CRS WGS1984 (unprojected)
site.near <- st_as_sf(d, coords=c("lon", "lat"), crs=4326)

# Transform CRS from WGS1984 (EPSG: 4326) to CH1903/LV03
site.near <- st_transform(site.near, crs=st_crs(site))

# Check if the nearest point intersects the reach (FALSE)
st_intersects(site.near, reach, sparse = F)

The problem with this implementation is that (1) the nearest point along the reach does not intersect the reach polyline, and (2) it switches between the geosphere and sf packages, sp and sf object classes, and transforms between CRSs. In short, it doesn't work and its too complicated.

For these reasons, I would prefer to use only on the sf package to accomplish this entire task. I have some intuition that st_snap() would be useful here, but it always returns the same point as the input (the PostGIS and sf documentation pages don't really explain what the tolerance argument does).

EDIT As suggested by Spacedman's comment I used rgeos::gNearestPoints. It provided a the nearest point intersecting the reach for the site/reach I provided above, however for another site it provides a nearest point that is still a small distance from the reach:

> st_distance(site.near, reach)
Units: m
            [,1]
[1,] 3.81693e-09

I provide the site and reach below:

site <- structure(list(SiteId = "NAWA_2", SampId = "NAWA_2_2012-08-13", 
    MonitoringProgram = "NAWA", X1 = 613474, Y1 = 267088, X2 = 613603, 
    Y2 = 266936, Width = 21.87, TransectDistance = 199.361480732864, 
    Area = 4360.03558362773, EZG_NR = 91795, h1 = 74792, h2 = 79777, 
    geometry = structure(list(structure(c(613474, 267088), class = c("XY", 
    "POINT", "sfg"))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(613474, 
    267088, 613474, 267088), .Names = c("xmin", "ymin", "xmax", 
    "ymax"), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
        proj4string = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +units=m +no_defs"), .Names = c("epsg", 
    "proj4string"), class = "crs"), n_empty = 0L)), .Names = c("SiteId", 
"SampId", "MonitoringProgram", "X1", "Y1", "X2", "Y2", "Width", 
"TransectDistance", "Area", "EZG_NR", "h1", "h2", "geometry"), row.names = c(NA, 
-1L), class = c("sf", "data.frame"), sf_column = "geometry", agr = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity"), .Names = c("SiteId", "SampId", "MonitoringProgram", 
"X1", "Y1", "X2", "Y2", "Width", "TransectDistance", "Area", 
"EZG_NR", "h1", "h2")))

reach <- structure(list(OBJECTID_1 = 5381, FNODE_ = 15469, TNODE_ = 13378, 
    LPOLY_ = 0, RPOLY_ = 0, LENGTH = 3248.0468712, OBJECTID = 1517084, 
    OBJECTVAL = "Fluss", NAME = "Birs", UNTERIRDIS = NA_character_, 
    OBJECTORIG = "GN25", YEAROFCHAN = 2005, GEWISSNR = 443, LAUFNR = 0, 
    LINST = "CH", BACHNR = NA_character_, GWLNR = "CH0004430000", 
    Binary = 1L, Shape_Leng = 3248.0468712, ReachId = 5381L, 
    SiteId = "PF_00449", SampId = "PF_00449 _ 2014-09-25", MonitoringProgram = "ProgettoFiumi", 
    X1 = 613481.575730715, Y1 = 267350.715242873, X2 = NA_real_, 
    Y2 = NA_real_, Width = NA_real_, TransectDistance = NA_real_, 
    Area = NA_real_, EZG_NR = 91795, h1 = 74792, h2 = 79777, 
    SiteId.1 = "NAWA_2", SampId.1 = "NAWA_2_2012-08-13", MonitoringProgram.1 = "NAWA", 
    X1.1 = 613474, Y1.1 = 267088, X2.1 = 613603, Y2.1 = 266936, 
    Width.1 = 21.87, TransectDistance.1 = 199.361480732864, Area.1 = 4360.03558362773, 
    EZG_NR.1 = 91795, h1.1 = 74792, h2.1 = 79777, geometry = structure(list(
        structure(c(613680.112152525, 613672, 613574.625, 613541.625, 
        613535.1875, 613507.125, 613490.1875, 613476.1875, 613472.375, 
        613471.5, 613472.375, 613500.625, 613500.625, 613485.625, 
        613473.8125, 613432.1875, 266891.941045502, 266898.3125, 
        266968.59375, 267000, 267006.09375, 267039.8125, 267070.6875, 
        267106.3125, 267128.8125, 267151.3125, 267177.59375, 
        267405.3125, 267431.5, 267461.5, 267487.1875, 267578.3125
        ), .Dim = c(16L, 2L), class = c("XY", "LINESTRING", "sfg"
        ))), class = c("sfc_LINESTRING", "sfc"), precision = 0, bbox = structure(c(613432.1875, 
    266891.941045502, 613680.112152525, 267578.3125), .Names = c("xmin", 
    "ymin", "xmax", "ymax"), class = "bbox"), crs = structure(list(
        epsg = NA_integer_, proj4string = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +units=m +no_defs"), .Names = c("epsg", 
    "proj4string"), class = "crs"), n_empty = 0L)), .Names = c("OBJECTID_1", 
"FNODE_", "TNODE_", "LPOLY_", "RPOLY_", "LENGTH", "OBJECTID", 
"OBJECTVAL", "NAME", "UNTERIRDIS", "OBJECTORIG", "YEAROFCHAN", 
"GEWISSNR", "LAUFNR", "LINST", "BACHNR", "GWLNR", "Binary", "Shape_Leng", 
"ReachId", "SiteId", "SampId", "MonitoringProgram", "X1", "Y1", 
"X2", "Y2", "Width", "TransectDistance", "Area", "EZG_NR", "h1", 
"h2", "SiteId.1", "SampId.1", "MonitoringProgram.1", "X1.1", 
"Y1.1", "X2.1", "Y2.1", "Width.1", "TransectDistance.1", "Area.1", 
"EZG_NR.1", "h1.1", "h2.1", "geometry"), sf_column = "geometry", agr = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Names = c("OBJECTID_1", "FNODE_", "TNODE_", "LPOLY_", "RPOLY_", 
"LENGTH", "OBJECTID", "OBJECTVAL", "NAME", "UNTERIRDIS", "OBJECTORIG", 
"YEAROFCHAN", "GEWISSNR", "LAUFNR", "LINST", "BACHNR", "GWLNR", 
"Binary", "Shape_Leng", "ReachId", "SiteId", "SampId", "MonitoringProgram", 
"X1", "Y1", "X2", "Y2", "Width", "TransectDistance", "Area", 
"EZG_NR", "h1", "h2", "SiteId.1", "SampId.1", "MonitoringProgram.1", 
"X1.1", "Y1.1", "X2.1", "Y2.1", "Width.1", "TransectDistance.1", 
"Area.1", "EZG_NR.1", "h1.1", "h2.1"), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = "5381.5", class = c("sf", 
"data.frame"))

Obtaining the nearest point along the reach gives:

site.near <- st_as_sf(rgeos::gNearestPoints(as(site,"Spatial"), as(st_zm(reach),"Spatial"))[2,])
> st_intersects(site.near, reach, sparse=FALSE)
      [,1]
[1,] FALSE
> st_distance(site.near, reach)
Units: m
            [,1]
[1,] 3.81693e-09

Using st_snap does not seem to work:

> site.snap <- st_snap(site.near, reach, tol=1e-9)
> st_intersects(site.near, reach, sparse=FALSE)
      [,1]
[1,] FALSE

Attempting to use maptools::snapPointsToLines also fails to intersect the nearest point to the reach:

> site.near <- st_as_sf(maptools::snapPointsToLines(as(site,"Spatial"), as(st_zm(reach),"Spatial")))
> st_intersects(site.near, reach, sparse=FALSE)
      [,1]
[1,] FALSE
> st_distance(site.near, reach)
Units: m
             [,1]
[1,] 3.269482e-11

Could this be an issue with numerical precision in R?

Best Answer

One approach is to convert the line into points and find the nearest point. Depending on trade-off between accuracy and processing time, you can select the optimal density of points to create.

Here, I create a point every 1m along the line. Use mapview to check.

library(mapview)
points <- st_line_sample(reach, density = 1/1) %>% 
  st_sf() %>%
  st_cast('POINT')
nearest <- points[which.min(st_distance(site, points)),]
mapview(nearest) + site + reach

enter image description here

Unfortunately even when points are created along a line using st_line_sample(), they do not intersect the line. I am guessing this is due to rounding. You might have to buffer the point slightly to intersect with the reach.

Going back to your original question (splitting the line at the nearest point), you could then recreate the two segments from the sampled points, like so:

points <- st_line_sample(reach, density = 1/1) %>% 
  st_sf() %>%
  st_cast('POINT') %>%
  mutate(group = 1)

which.point <- which.min(st_distance(site, points))

nearest <- points[which.point,]

segment1 <- points[1:which.point,] %>% 
  group_by(group) %>%
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING") %>%
  ungroup 

segment2 <- points[which.point:nrow(points),] %>% 
  group_by(group) %>%
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING") %>%
  ungroup 

mapview(segment1, color = 'red') + segment2 + nearest + site

enter image description here

Finally calculate length using st_length:

segment1 %<>% mutate(Length = st_length(.))
Related Question