R – Dissolving/Unifying Ill-Behaved Polygons in R

maptoolspolygonrrastersp

I'm working on a project involving bringing together multiple subjects circling difficult to find parts of an image. The output from users is pretty awesome.

Four classifications

Cool, right?

However, what I want is to get the area of the image that has 1, 2, 3, 4, etc. users selecting it. And I'm using R for the analysis with the sp, raster, and maptools packages (with a little rgeos throw in when need be). Essentially, my workflow is to
1) create a single polygon from each user's input (I actually have the track of each individual selection per user, but it seemed easier to combine them into a single SpatialPolygon per user)
2) merge them all together into a single SpatialPolygonsDataFrame Object
3) Rasterize the object using the count function
4) Assess area of the raster with n number of users selecting it

Here's the output SpatialPolygonsDataFrame to play with

This works great, but is slooow due to the rasterization. I feel like I should be able to do something with the SpatialPolygons object. And I've tried a few things with union, gIntersects, gUnion, and intersect. However, I keep getting errors such as the following

Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, "rgeos_difference") : 
TopologyException: Input geom 1 is invalid: Self-intersection at or near point

Fair enough. Often a given users SpatialPolygons are invalid when I test them using gIsValid. Looking at, say, just the first two, this becomes obvious that there are many overlapping points.

Polygons one and two

You can see the intersecting points pretty clearly. Furthermore, in poly2 (the blue) there appears to be some self-intersection, causing gIsValid to return false.

I've also tried unionSpatialPolygons (with avoidRGEOS=T) which creates a unified object

unionSpatialPolygons(polysData[1:2,],
IDs=names(polysData[1:2,]@polygons))

But the borders still overlap like in the above image, rather than being one smooth polygon outer border – i.e., the polygons are not dissolved together. Then using any further functions on this new SpatialPolygons object, I have the same problems.

So, is there a way to use polygons instead of going to rasters? It sure would make my life faster, which would be excellent. I feel like this must be a common problem with a wheel invented that I have yet to find. Thoughts?

Best Answer

Without your original data, I can't be sure this will work, but I thought it might help you out. I didn't bring it all the way there, this solution still likely needs some level of automation, but might give you a general way forward

First, I create some spatial polygons

polypoints1 <- matrix(c(1,2,2,1,1,2,2,1,1,2),ncol=2)
polypoints2 <- matrix(c(1,3,3,1,1,3,3,1,1,3),ncol=2)
polypoints3 <- matrix(c(1,2,2,1,1,2,2,1,1,2)+1.1,ncol=2)
polypoints4 <- matrix(c(1,2,2,1,1,2,2,1,1,2)+0.5,ncol=2)

p1 <- Polygon(polypoints1)
ps1 <- Polygons(list(p1),1)
sps1 <- SpatialPolygons(list(ps1))

p2 <- Polygon(polypoints2)
ps2 <- Polygons(list(p2),2)
sps2 <- SpatialPolygons(list(ps2))

p3 <- Polygon(polypoints3)
ps3 <- Polygons(list(p3),3)
sps3 <- SpatialPolygons(list(ps3))

p4 <- Polygon(polypoints4)
ps4 <- Polygons(list(p4),4)
sps4 <- SpatialPolygons(list(ps4))

I plotted them just to see

plot(sps2,col='green')
plot(sps1,add=T,col='blue')
plot(sps3,add=T,col='yellow')
plot(sps4,add=T,col='purple')

I merged them into an spdf

data=data.frame(c(x=rep(1,4)),row.names=c(1:4))
sps <- SpatialPolygons(list(ps1,ps2,ps3,ps4))
spdf <- SpatialPolygonsDataFrame(sps,data)

You can identify which polygon overlaps which like so:

gIntersects(spdf,spdf,byid =T)

From the above command you could create some kind of loops to do the overlapping combinations below (I'm just ignoring sps4 for brevity at this point)

poly2a <- gIntersection(spdf[2,],spdf[1,],drop_lower_td=T)
poly2a <- SpatialPolygonsDataFrame(poly2a,data.frame(c(x=1),row.names=c(1)))
plot(poly2a,add=T,col='red')

This time we need to change the ID since we're going to rbind these later

poly2b <- gIntersection(spdf[2,],spdf[3,],drop_lower_td=T)
poly2b <- spChFIDs(poly2b,"2")
poly2b <- SpatialPolygonsDataFrame(poly2b,data.frame(c(x=1),row.names=c(2)))
plot(poly2b,add=T,col='red')

Merge the overlapping polygons into another spdf

spdf_overlaps <- rbind(poly2a,poly2b)
poly2 <- unionSpatialPolygons(spdf_overlaps,rep(1,2))
plot(poly2,add=T,col='blue')

Now we have poly2 which is where we have 2 layers overlapping (except combinations with sps4) then to figure out 3 layers, we just have to check out where poly2 and spdf overlap (if you make a more automated version of this, you'll need to make sure that 'poly2' does not include sps4 as in this example)

gIntersects(poly2,spdf[4,],byid =T)

poly3 <- gIntersection(poly2,spdf[4,],drop_lower_td=T)
plot(poly3,add=T,col="red") 

Check it out

gIsValid(poly2)
gIsValid(poly3)

Alternatively, you could always do a pseudo rasterization, much easier, but you loose some detail depending on your cell size:

First make the grid:

bb <- bbox(spdf)
cs <- c(0.1,0.1)  # cell size
cc <- bb[, 1] + (cs/2)  # cell offset
cd <- ceiling(diff(t(bb))/cs)  # number of cells per direction
grd <- GridTopology(cellcentre.offset=cc, cellsize=cs, cells.dim=cd)


sp_grd <- SpatialGridDataFrame(grd,
                           data=data.frame(id=1:prod(cd)))

Then, make grid into a polygon which used for overlap

library(Grid2Polygons)
grid <- Grid2Polygons(sp_grd)
plot(grid)

Then count the number of polygons that overlap each grid cell

count <- apply(gContains(spdf,grid,byid=T),1,sum)

Finally, plot it!

plot(grid)
for(i in 1:length(grid)){
    plot(grid[i,],col=rev(heat.colors(3))[count[i]],add=T)
}