R Spatial Packages – sp::over vs sf::st_intersection

rsfsp

I was wondering why sf::st_intersection() takes too much time, in comparison with sp::over(); I use sf for pretty much all of my geospatial workflows (I was used to sp as well), and more and more packages are building support for sf objects; is there a way for speeding up st_intersection? According to st_intersection man page: "A spatial index is built on argument x", so I can't build a spatial index to speed it up.

In the following example eke is a point (sp or sf) object and ver_poly is a polygon object, even coerced to "Spatial", sp::over() goes faster (just using 10000 points, whole object is 150000 features):

> system.time(en_ver <- over(eke[1:10000,], as(ver_poly, "Spatial")))
   user  system elapsed 
  0.573   0.004   0.576 
> system.time(eke[1:10000,] %>% st_as_sf(coords = c("longitud", "latitud")) %>% st_set_crs(4326) %>%
+   st_intersection(ver_poly))
although coordinates are longitude/latitude, st_intersection assumes that they are planar
   user  system elapsed 
 83.552   0.004  83.585 

Best Answer

You are converting from sf to sp in the first, and from from sp to sf in the second - you should avoid timing those conversions.

sf is sometimes slow with points because of the way they are stored, but what you gain is far greater consistency than with sp, and usually faster ops.

Here I think it is comparable, but will depend on your actual data.

Here's a like-to-like comparison. You need st_intersects for the analog to what over is doing here.

library(raster)
#> Loading required package: sp
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0
sp_pts <- do.call(rbind, replicate(25, quakes, simplify = FALSE))

coordinates(sp_pts) <- c("long", "lat")
proj4string(sp_pts) <- CRS("+init=epsg:4326")
sf_pts <- st_as_sf(sp_pts)
sp_poly <- raster::rasterToPolygons(raster::raster(sp_pts))
sp_poly$layer <- 1:nrow(sp_poly)
sf_poly <- st_as_sf(sp_poly)


plot(sp_poly)
plot(sp_pts, add = TRUE, pch = ".")


 system.time({sp_ov <- over(sp_pts, sp_poly)})
#>    user  system elapsed 
#>   0.191   0.004   0.195
 system.time({sf_ov <- st_intersects(sf_pts, sf_poly)})
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#>    user  system elapsed 
#>    0.21    0.00    0.21

 str(unlist(sp_ov$layer))
#>  int [1:25000] 38 37 59 28 38 39 1 68 68 27 ...
 str(unlist(unclass(sf_ov)))
#>  int [1:25000] 38 37 59 28 38 39 1 68 68 27 ...

Created on 2019-06-06 by the reprex package (v0.3.0)

Related Question