Based on the discussion above, I further explored the respective literature and found a very suitable algorithm addressing my question. The method is capable of defining neighborhood through rows and columns in the grid rather than distances between pixel centroids. I can directly choose between Queen's and Rook's case contiguity. Distances do not have to be included as explanatory variables and then weighted until regions are contiguous. The algorithm restricts regions to be spatially contiguous but does not necessarily make spatial distance or coordinates optimization variables. This provides increased flexibility in terms of region shapes.
The method is called SKATER (Spatial ‘K’luster Analysis by Tree Edge Removal) and is nicely outlined in the respective journal article by Assuncao et al. (2006). The spdep
package implements the algorithm into R. The package documentation, though, is rather brief and unintuitive. Fortunately, Luc Anselin provides an illustrative tutorial on how to to apply spdep::skater()
to spatial polygons using French regions as an example. My subsequent code is an adaptation of his example to spatial data in raster structure.
As underlying data we use three raster layers, r1
, r2
and r3
. They are all of equal extent and equal projection (equal-area Mollweide projection). They document the three explanatory variables, say surface temperature, precipitation and elevation, based on which want to assemble k
regions.
# 1. Load packages
packs <- list("tidyverse", "raster", "spdep", "parallel")
lapply(packs, require, character.only = T)
# 2. Load raster layers using raster()
# 3. Define the number of regions (k + 1)
k <- 10
# 4. Merge explanatory variables in data frame
dat <- lapply(list(r1, r2, r3), values) %>%
do.call(cbind, .) %>%
as.data.frame(., stringsAsFactors = F) %>%
magrittr::set_colnames(c("Temperature", "Precipitation", "Elevation"))
# 5. Set up parallel framework for faster computation (optional)
# For a non-parallel, single core execution skip steps 5 and 13
ncores <- detectCores() - 1
cl <- parallel::makeCluster(ncores, type = "PSOCK")
set.coresOption(ncores)
set.ClusterOption(cl)
# 6. Standardize variables
sdat <- scale(dat) %>%
as.data.frame(., stringsAsFactors = F)
# 7. Create neighbor list object
raster_nb <- cell2nb(nrow(r1), ncol(r1), type = "queen") # you can alternatively set contiguity to "rook"
# 8. Subset cells
# There are various reasons for which you might need to exclude some pixels to avoid errors in subsequent functions
# One example is missing values in your raster layers (e.g. due to water bodies)
complete_pixels <- which(complete.cases(sdat))
raster_nb <- subset.nb(raster_nb, 1:length(raster_nb) %in% complete_pixels)
sdat <- sdat[complete_pixels,]
# 9. Calculate dissimilarity between neighboring cells
lcosts <- nbcosts(raster_nb, sdat)
# 10. Calculate spatial weights based on dissimilarity between neighbors
raster_w <- nb2listw(raster_nb, lcosts, style = "B")
# 11. Obtain minimum spanning tree
raster_mst <- mstree(raster_w)
# 12. Run skater clustering algorithm
skater_clusters <- skater(raster_mst[,1:2], sdat, k)
# 13. Close parallel framework
stopCluster(cl)
Best Answer
As @whuber says, I think you're going to need to relax your criteria, because it won't work as stands. One possible algorithm would be:
There are automated routines to do this sort of analysis in packages like ArcGIS or R, but processing time will be significant if you have a large number of points. You will probably have to experiment to get the best results, I don't think it's possible to lay down hard rules about clusters as you have above.