I would like to cluster the cells of a raster object into k contiguous regions using kmeans. The number of regions, k, is known. Each cell has various geographical attributes, such as temperature, precipitation, elevation etc. And I need R to sort pixels into k groups (regions) using these cell values. With stats::kmeans()
that is a fairly simple exercise. Unfortunately, this method does not create spatially contiguous clusters. Instead, each group consists of pixels spread all over the grid.
I expected this to be a common problem, but could not find any R function that solves it. There should be some package out there that can perform clustering under the restriction of spatial contiguity.
The method of choice here is kmeans because I know how many groups I need. Any other technique from the field of unsupervised learning that allows me to a priori set this quantity, is of course also welcome.
I asked this question on Stack Overflow before and received a recommendation to also post it here, as GIS might be a better fit than SO.
Best Answer
Based on the discussion above, I further explored the respective literature and found a very suitable algorithm addressing my question. The method is capable of defining neighborhood through rows and columns in the grid rather than distances between pixel centroids. I can directly choose between Queen's and Rook's case contiguity. Distances do not have to be included as explanatory variables and then weighted until regions are contiguous. The algorithm restricts regions to be spatially contiguous but does not necessarily make spatial distance or coordinates optimization variables. This provides increased flexibility in terms of region shapes.
The method is called SKATER (Spatial ‘K’luster Analysis by Tree Edge Removal) and is nicely outlined in the respective journal article by Assuncao et al. (2006). The
spdep
package implements the algorithm into R. The package documentation, though, is rather brief and unintuitive. Fortunately, Luc Anselin provides an illustrative tutorial on how to to applyspdep::skater()
to spatial polygons using French regions as an example. My subsequent code is an adaptation of his example to spatial data in raster structure.As underlying data we use three raster layers,
r1
,r2
andr3
. They are all of equal extent and equal projection (equal-area Mollweide projection). They document the three explanatory variables, say surface temperature, precipitation and elevation, based on which want to assemblek
regions.