[GIS] st_intersection failing for overlapping multipolygons in sf

Tags: r, sf

I'm working with user classification data on a citizen science platform. Our workflow takes multiple user classifications of areas, overlays them on one another, and then looks for a consensus choice – see https://blog.floatingforests.org/2018/01/05/kelpy-consensus/ for context.

We're transitioning our codebase from sp to sf in R for multiple reasons (one being that we've re-written the platform, so we have to re-write the pipeline anyway). As part of that rewrite, we want to stay within the sf ecosystem for our consensus operations instead of going back and forth to rasters (which we could do using fasterize, but going from raster to polygon is a PITA). Inspired by this post on gis.stackexchange, I've been trying to use st_intersection, but I'm getting odd results. Here's a reprex.

library(dplyr)
library(purrr)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3

#load the data
test_set <- devtools::source_gist("c95701bd444cda3e342838fd9a090fb3",
 filename = "test_set.R")$value

#visualize
plot(test_set)

So far so good, but when I try and get the intersection…

#get the intersection
overlap <- st_intersection(test_set)

I get the following error:

#> Error in CPL_nary_intersection(x): Evaluation error:
  TopologyException: Input geom 0 is invalid: Self-intersection at or
  near point 314406.13208707399 -5762352.8343122564 at 
  314406.13208707399 -5762352.8343122564.

This led me to find the "0 buffer" trick here, but…

#What about if we add a buffer?
test_set_buff <- test_set %>%
  mutate(geometry = map(geometry, ~st_buffer(.x, 0)))

overlap_buff <- st_intersection(test_set_buff)

which throws the following error (even if I up the buffer a bit):

Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection
 between LINESTRING (312816 -5.76168e+06, 312816 -5.76168e+06) and
  LINESTRING (312816 -5.76168e+06, 312816 -5.76168e+06) at 
  312815.57772721699 -5761684.4256249201.

I worried this might be due to invalid polygons, so I did some digging and pulled up the lovely lwgeom package to check validity.
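For readers following along without the gist, a validity check of this kind can be sketched on toy data. The bow-tie and square below are hypothetical stand-ins for test_set, and the sketch assumes a recent sf in which st_make_valid() is exported by sf itself (the question uses the lwgeom version).

```r
library(sf)

# Hypothetical toy data: a self-intersecting "bow-tie" ring and a plain square.
bowtie <- st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0))))
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy <- st_sfc(bowtie, square)

# One logical per geometry: is it valid?
valid <- st_is_valid(toy)

# Same check, but returning a text reason for each geometry.
reasons <- st_is_valid(toy, reason = TRUE)

# Repair the geometries, then confirm that everything is now valid.
# Assumption: sf::st_make_valid() is available (recent sf with GEOS >= 3.8).
repaired <- st_make_valid(toy)
all_valid_after <- all(st_is_valid(repaired))
```

On real data, the reason strings usually point at the coordinates where a ring crosses itself, which helps decide whether the problem is a drawing artefact or a genuine overlap.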

Sure enough, only two of the multipolygons were valid. So, I attempted st_make_valid – which successfully made them all valid – but…

#Test set valid with lwgeom
test_set_valid <- lwgeom::st_make_valid(test_set) %>% st_cast("MULTIPOLYGON")

overlap_valid <- st_intersection(test_set_valid)

Which produces the same non-noded error. I fear that it might be due to some users making selections like [image], which have some overlap in them – but I'm unsure. It could be something else. Any suggestions for fixes here?

Update: After reading this thorough thread, I tried the following to snap to grid and then make valid. I still get the non-noded intersection error.

#Snap to grid at 10m, as Landsat is only a 30m resolution anyway
test_set_snap <- test_set %>%
  mutate(geometry = map(geometry, ~lwgeom::st_snap_to_grid(.x, 10))) %>%
  mutate(geometry = map(geometry, lwgeom::st_make_valid))

overlap_snap <- st_intersection(test_set_snap)

Best Answer

This could be some error in the underlying geometry operations code. Here's a subset of your data containing five POLYGON objects which produces an error when intersected one way but not in the opposite order. Get the data into your current directory as p.R with this (warning, overwrites anything called p.R in your working directory):

download.file("https://gist.githubusercontent.com/barryrowlingson/4665509c3c00c65ccdc0cc73b4ff10dd/raw/2132f4caf3a4629569f2ccc0727b6f5c4180f743/p.R",destfile="p.R")

Then get sf and load it:

> library(sf)
Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
> p = dget("./p.R")

Intersecting fails:

> st_intersection(p)
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (313357 -5.75792e+06, 313357 -5.75792e+06) and LINESTRING (313357 -5.75792e+06, 313357 -5.75792e+06) at 313357.29442873126 -5757920.173200381.

But in the other order it works:

> st_intersection(p[5:1])
Geometry set for 12 features 
geometry type:  GEOMETRY
dimension:      XY
bbox:           xmin: 312547.9 ymin: -5760210 xmax: 313696.4 ymax: -5757599
epsg (SRID):    32721
proj4string:    +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs
First 5 geometries:
MULTIPOLYGON (((312862.8 -5760108, 312778.8 -57...
MULTIPOLYGON (((313408.8 -5758713, 313408.8 -57...
MULTIPOLYGON (((313534.7 -5758302, 313534.7 -57...
MULTIPOLYGON (((313680.5 -5757599, 313696.4 -57...
POLYGON ((313303.8 -5759322, 313282.8 -5759322,...

I'm pretty sure st_intersection operations with one parameter should be invariant to the order of the input features, which makes me think this is a bug.
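One way to probe an order dependence like this is to run the n-ary intersection in both orders and compare the total area of the resulting pieces, which should agree whenever both calls succeed. The sketch below uses three hypothetical overlapping squares rather than the gist data; make_square and try_intersection are illustrative helpers, not part of sf.

```r
library(sf)

# Hypothetical helper: axis-aligned square with lower-left corner (x0, y0).
make_square <- function(x0, y0, size = 2) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

toy <- st_sfc(make_square(0, 0), make_square(1, 0), make_square(0.5, 1))

# Return the intersection pieces, or the error message if GEOS throws.
try_intersection <- function(g) {
  tryCatch(st_intersection(g), error = function(e) conditionMessage(e))
}

forward <- try_intersection(toy)
backward <- try_intersection(toy[rev(seq_along(toy))])

# If both orders succeed, the pieces tile the same union,
# so the summed areas should match.
if (inherits(forward, "sfc") && inherits(backward, "sfc")) {
  area_forward <- sum(as.numeric(st_area(forward)))
  area_backward <- sum(as.numeric(st_area(backward)))
}
```

Run against p from above, forward would presumably hold the TopologyException message as a character string while backward holds the geometry set, which makes the asymmetry easy to report.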

Related Solutions

[GIS] conditional st_join and st_intersection in R (sf)

I can only suggest you loop over each feature and compute the intersection with all the features not considered so far:

nr = nrow(fr1)
is1 <- do.call(rbind,
       lapply(1:(nr-1),
          function(i){
           st_intersection(fr1[i,], fr1[-(1:i),])
           }))

> dim(is1)
[1] 43 27

which is the same as your code.

Whether this method is faster or not I don't know - would need to benchmark it under a variety of conditions.
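To see what the loop produces, here is a self-contained sketch on three hypothetical overlapping squares standing in for fr1 (the real data is not shown in this answer). Each unordered pair of features is intersected exactly once; make_square is an illustrative helper, not part of sf.

```r
library(sf)

# Hypothetical helper: axis-aligned square with lower-left corner (x0, y0).
make_square <- function(x0, y0, size = 2) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy stand-in for fr1: three mutually overlapping squares with an id column.
fr1 <- st_sf(
  id = c("a", "b", "c"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), make_square(0.5, 1))
)

nr <- nrow(fr1)
is1 <- do.call(rbind, lapply(seq_len(nr - 1), function(i) {
  # Intersect feature i with every later feature only.
  # suppressWarnings() hides sf's note that attributes are assumed
  # spatially constant.
  suppressWarnings(st_intersection(fr1[i, ], fr1[-seq_len(i), ]))
}))

# Area of each pairwise overlap.
pair_areas <- as.numeric(st_area(is1))
```

With three mutually overlapping features this gives one row per pair, so the row count grows roughly quadratically with the number of overlapping features; that is worth keeping in mind before benchmarking.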

R Spatial Packages – sp::over vs sf::st_intersection

You are converting from sf to sp in the first, and from sp to sf in the second - you should avoid timing those conversions.

sf is sometimes slow with points because of the way they are stored, but what you gain is far greater consistency than with sp, and usually faster ops.

Here I think it is comparable, but will depend on your actual data.

Here's a like-for-like comparison. You need st_intersects as the analog of what over is doing here.

library(raster)
#> Loading required package: sp
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0
sp_pts <- do.call(rbind, replicate(25, quakes, simplify = FALSE))

coordinates(sp_pts) <- c("long", "lat")
proj4string(sp_pts) <- CRS("+init=epsg:4326")
sf_pts <- st_as_sf(sp_pts)
sp_poly <- raster::rasterToPolygons(raster::raster(sp_pts))
sp_poly$layer <- 1:nrow(sp_poly)
sf_poly <- st_as_sf(sp_poly)


plot(sp_poly)
plot(sp_pts, add = TRUE, pch = ".")


 system.time({sp_ov <- over(sp_pts, sp_poly)})
#>    user  system elapsed 
#>   0.191   0.004   0.195
 system.time({sf_ov <- st_intersects(sf_pts, sf_poly)})
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#>    user  system elapsed 
#>    0.21    0.00    0.21

 str(unlist(sp_ov$layer))
#>  int [1:25000] 38 37 59 28 38 39 1 68 68 27 ...
 str(unlist(unclass(sf_ov)))
#>  int [1:25000] 38 37 59 28 38 39 1 68 68 27 ...

Created on 2019-06-06 by the reprex package (v0.3.0)
