[GIS] Dissolve only overlapping polygons in R using sf

rsfunion

I extracted information from open street map that contains many polygons, some of them overlapping. I want to union all polygons that overlap with other polygons. This question has been asked before here: Dissolve only overlapping polygons in R but I am looking for an approach that uses the sf package.

Here is a reproducible example:

sq = function(pt, sz = 1) st_polygon(list(rbind(c(pt - sz), c(pt[1] + sz, pt[2] - sz), c(pt + sz), c(pt[1] - sz, pt[2] + sz), c(pt - sz))))
x = st_sf(box = 1:6, st_sfc(sq(c(4.2,4.2)), sq(c(0,0)), sq(c(1, -0.8)), sq(c(0.5, 1.7)), sq(c(3,3)), sq(c(-3, -3))))
plot(x)
st_overlaps(x)

What I am looking for is an appproach that unions the 2 rectangles that show overlap and the 3 rectangles that have overlap. Problem is that I do not know how to identify the 3 clusters of rectangles that I have in my example. With this information, I can run group_by %>% summarize() on the result and solve my problem.

enter image description here

Best Answer

Here's my sample data:

> plot(bdf)

enter image description here

If you only care about whether polygons overlap and how they form clusters, then use st_intersects:

> st_intersects(bdf,bdf)
Sparse geometry binary predicate list of length 6, where the predicate was `intersects'
 1: 1
 2: 2, 3, 4
 3: 2, 3, 4
 4: 2, 3, 4
 5: 5, 6
 6: 5, 6

That gets you some of the way. The output above can be used to form an adjacency matrix of overlaps, which you can then feed into the igraph package to create a graph, and then you can get the disconnected subgraphs.

> library(igraph)
> g = st_intersects(bdf,bdf)
> plot(graph_from_adj_list(g))

enter image description here

Find the connected components with components:

> G = graph_from_adj_list(g)
> components(G)
$membership
[1] 1 2 2 2 3 3

$csize
[1] 1 3 2

$no
[1] 3

The $membership element of that list is telling you which cluster each of the six original features is a member of, in order.

Alternatively if you do want to construct the geometry of the clusters, use st_union:

> parts = st_cast(st_union(bdf),"POLYGON")

The output of st_union is a "MULTIPOLYGON" of three, so I split it into its "POLYGON" parts and then test intersection with the original features:

> st_intersects(bdf, parts)
Sparse geometry binary predicate list of length 6, where the predicate was `intersects'
 1: 1
 2: 3
 3: 3
 4: 3
 5: 2
 6: 2

which tells me that original feature 1 is in cluster 1, features 2 to 4 are in cluster 3, and features 5 and 6 form cluster 2.

Note the order is different from the igraph example - the ordering of the polygons from the st_union is not the same as the ordering of the connected components in the igraph example. From lower-left to upper right st_union has indexed the clusters 1,3,2 and igraph has indexed them 1,2,3. This ordering is arbitrary (but don't mix them up).

Related Question