R – Resolving TopologyException Due to Self-Intersection

polygonrrgeosself-intersection

The 'TopologyException: Input geom 1 is invalid' self-intersection error which arises from invalid polygon geometries has been widely discussed. However, I haven't found a convenient solution on the web that solely relies on R functionality.

For instance, I have managed to create a 'SpatialPolygons' object from the output of map("state", ...) following Josh O'Brien's nice answer here.

library(maps)
library(maptools)

map_states = map("state", fill = TRUE, plot = FALSE)

IDs = sapply(strsplit(map_states$names, ":"), "[[", 1)
spydf_states = map2SpatialPolygons(map_states, IDs = IDs, proj4string = CRS("+init=epsg:4326"))

plot(spydf_states)

states

The problem with this widely applied dataset is now that self-intersection occurs at the point given below.

rgeos::gIsValid(spydf_states)
[1] FALSE
Warning message:
In RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") :
  Self-intersection at or near point -122.22023214285259 38.060546477866055

Unfortunately, this problem prevents any further use of 'spydf_states', e.g. when calling rgeos::gIntersection. How can I solve this issue from within R?

Best Answer

Using a zero-width buffer cleans up many topology problems in R.

spydf_states <- gBuffer(spydf_states, byid=TRUE, width=0)

However working with unprojected lat-long coordinates can cause rgeos to throw warnings.

Here's an extended example that reprojects to an Albers projection first:

library(sp)
library(rgeos)

load("~/Dropbox/spydf_states.RData")

# many geos functions require projections and you're probably going to end
# up plotting this eventually so we convert it to albers before cleaning up
# the polygons since you should use that if you are plotting the US
spydf_states <- spTransform(spydf_states, 
                            CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96"))

# simplify the polgons a tad (tweak 0.00001 to your liking)
spydf_states <- gSimplify(spydf_states, tol = 0.00001)

# this is a well known R / GEOS hack (usually combined with the above) to 
# deal with "bad" polygons
spydf_states <- gBuffer(spydf_states, byid=TRUE, width=0)

# any bad polys?
sum(gIsValid(spydf_states, byid=TRUE)==FALSE)

## [1] 0

plot(spydf_states)

enter image description here