[GIS] Calculating distances between consecutive points using R

rsf

I have a simple features collection of points, where each point represents the location of an individual. I'd like to calculate the distance between successive points in the cleanest and most efficient way possible.

library(dplyr)
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.2, proj.4 4.9.3

df <- structure(list(timestamp = structure(
  c(1522540900, 1522540905,  1522540907, 1522540910, 1522540912, 1522540915,
    1522540917, 1522540920,  1522540922, 1522540925),
  class = c("POSIXct", "POSIXt"), tzone = ""),
  geometry = structure(list(
    structure(c(-84.50954, 33.89375), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89378), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
    structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg"))),
    n_empty = 0L, precision = 0,
    crs = structure(list(epsg = NA_integer_, proj4string = NA_character_), class = "crs"),
    bbox = structure(
      c(xmin = -84.81364,  ymin = 33.54257, xmax = -83.84208, ymax = 34.24617), class = "bbox"),
    class = c("sfc_POINT",  "sfc"))),
  row.names = c(NA, -10L), class = c("sf", "tbl_df", "tbl", "data.frame"),
  sf_column = "geometry",
  agr = structure(c(timestamp = NA_integer_), .Label = c("constant", "aggregate", "identity"), class = "factor"))

df
#> Simple feature collection with 10 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID):    NA
#> proj4string:    NA
#> # A tibble: 10 x 2
#>    timestamp                       geometry
#>    <dttm>                           <POINT>
#>  1 2018-04-01 00:01:40 (-84.50954 33.89375)
#>  2 2018-04-01 00:01:45 (-84.50954 33.89378)
#>  3 2018-04-01 00:01:47 (-84.50954 33.89377)
#>  4 2018-04-01 00:01:50 (-84.50954 33.89377)
#>  5 2018-04-01 00:01:52 (-84.50954 33.89377)
#>  6 2018-04-01 00:01:55 (-84.50954 33.89377)
#>  7 2018-04-01 00:01:57 (-84.50954 33.89377)
#>  8 2018-04-01 00:02:00 (-84.50954 33.89377)
#>  9 2018-04-01 00:02:02 (-84.50954 33.89377)
#> 10 2018-04-01 00:02:05 (-84.50954 33.89377)

sf::st_distance can return the matrix of distances between these points, or the element-wise distance between two different vectors. I have written a function to compute the successive distance by bootstrapping a dummy point on the end or beginning of the vector.

#' Calculate the successive distance between points
#' 
#' @param x The geometry column of an `sf` object
#' @return The distance between the `x[i]` and `x[i + 1]` elements of `x`
#' 
distance_between_points <- function(x) {
  null_point <- st_sf(x = st_sfc(st_point(x = c(0, 0))))

  x0 <- rbind(st_sf(x), null_point)
  x1 <- rbind(null_point, st_sf(x))

  d <- st_distance(x0, x1, by_element = TRUE) %>%
    .[-1]


  d[length(d)] <- NA
  d

}

distance_between_points(df$geometry)
#>  [1] 3e-05 1e-05 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00    NA

This works, but it's a bit slow. I think I was hoping for something more along the lines of the dplyr::lead function, like I can do for the timestamp column.

df %>%
  mutate(
    elapsed_time = lead(timestamp) - timestamp
  )
#> Simple feature collection with 10 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID):    NA
#> proj4string:    NA
#> # A tibble: 10 x 3
#>    timestamp           elapsed_time             geometry
#>    <dttm>              <time>                    <POINT>
#>  1 2018-04-01 00:01:40 5            (-84.50954 33.89375)
#>  2 2018-04-01 00:01:45 2            (-84.50954 33.89378)
#>  3 2018-04-01 00:01:47 3            (-84.50954 33.89377)
#>  4 2018-04-01 00:01:50 2            (-84.50954 33.89377)
#>  5 2018-04-01 00:01:52 3            (-84.50954 33.89377)
#>  6 2018-04-01 00:01:55 2            (-84.50954 33.89377)
#>  7 2018-04-01 00:01:57 3            (-84.50954 33.89377)
#>  8 2018-04-01 00:02:00 2            (-84.50954 33.89377)
#>  9 2018-04-01 00:02:02 3            (-84.50954 33.89377)
#> 10 2018-04-01 00:02:05 <NA>         (-84.50954 33.89377)

df %>%
  mutate(
    elapsed_time = lead(timestamp) - timestamp,
    distance_to_next = sf::st_distance(geometry, lead(geometry))
  )
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'st_bbox' applied to an object of class "logical".

Best Answer

Is this simply:

> c(st_distance(df[-1,],df[-nrow(df),],by_element=TRUE),NA)
 [1] 3e-05 1e-05 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00    NA

That doesn't compute the full matrix so I don't think it can be any faster.