[GIS] Clustering geographical data based on point location and associated point values

clustering grouping python r

Given data points that each have a longitude, a latitude, and a third property value, how can I cluster the points into groups (geographical sub-regions) based on the property value? I searched Google and found that this problem seems to be called "spatially constrained clustering" or "regionalization". However, I am not familiar with handling geographical data, and I do not yet know which algorithms are suitable or which Python/R packages are good for this task.

To give a more intuitive idea of what I want, suppose a scatter plot of my data looks like the following:

Each dot is a point, x is longitude, y is latitude, and the colormap shows whether the value is large or small. I want to divide those points into sub-regions/groups/clusters based on location and similarity of values, like the following (it is not exactly what I want, it just gives an intuitive idea):

So how can I achieve this?

Best Answer

The rioja package provides functionality for constrained hierarchical clustering. For what you are thinking of as "spatially constrained" you would specify your cuts based on distance, whereas for "regionalization" you could use k nearest neighbors. I would highly recommend projecting your data so that it is in a distance-based coordinate system.
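Since the question also asks about Python, the projection advice can be sketched as follows. This is a minimal sketch rather than a definitive method: it assumes the study area is small enough for a local equirectangular approximation, and the function name `project_equirectangular` and the sample coordinates are made up for illustration. For larger areas a proper projected coordinate system (for example a UTM zone, via a projection library) would be the safer choice.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def project_equirectangular(lonlat, lon0=None, lat0=None):
    """Approximate planar (x, y) in metres for (lon, lat) pairs in degrees.

    Assumption: the points span a small area, so a local equirectangular
    approximation around (lon0, lat0) is acceptable.
    """
    if lon0 is None:
        lon0 = sum(p[0] for p in lonlat) / len(lonlat)
    if lat0 is None:
        lat0 = sum(p[1] for p in lonlat) / len(lonlat)
    cos_lat0 = math.cos(math.radians(lat0))
    return [
        (EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0,
         EARTH_RADIUS_M * math.radians(lat - lat0))
        for lon, lat in lonlat
    ]


# Made-up lon/lat values, 0.01 degrees apart
points = [(5.74, 50.96), (5.75, 50.96), (5.74, 50.97)]
xy = project_equirectangular(points)
```

After this step, Euclidean distances between the `xy` pairs are in metres, so a distance threshold used for cutting a cluster tree has a physical meaning.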

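library(sp)     # provides the meuse example data
library(rioja)  # provides chclust()

# Load the example points and promote them to a SpatialPointsDataFrame
data(meuse)
coordinates(meuse) <- ~x+y
cdat <- data.frame(x=coordinates(meuse)[,1], y=coordinates(meuse)[,2])
rownames(cdat) <- rownames(meuse@data)

# Constrained hierarchical clustering on the coordinate distances
chc <- chclust(dist(cdat), method="conslink")

# "kNN" result: cut the tree with k=3 (in cutree, k is the number of groups)
chc.n3 <- cutree(chc, k=3)

# Distance result: cut the tree at height h=200
chc.d200 <- cutree(chc, h=200)

meuse@data <- data.frame(meuse@data, KNN=as.factor(chc.n3), DClust=chc.d200)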

opar <- par(no.readonly=TRUE)  # save the current graphics settings
par(mfcol=c(1,2))

# One colour per cluster, assigned in order of first appearance
cols <- topo.colors(length(unique(meuse@data$KNN)))
color <- cols[match(meuse@data$KNN, unique(meuse@data$KNN))]
plot(meuse, col=color, pch=19, main="kNN Clustering")
box()

cols <- topo.colors(length(unique(meuse@data$DClust)))
color <- cols[match(meuse@data$DClust, unique(meuse@data$DClust))]
plot(meuse, col=color, pch=19, main="Distance Clustering")
box()

par(opar)  # restore the saved graphics settings
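The R code above clusters on the coordinates only. To also use the third property value, as the question asks, one possible approach is agglomerative clustering restricted to spatial neighbours. The following Python sketch is an illustration under stated assumptions, not the rioja algorithm: adjacency comes from a symmetric k-nearest-neighbour graph, the merge cost is a Ward-style increase in the within-cluster sum of squares of the values, and the names `knn_graph` and `constrained_clusters` as well as the toy data are hypothetical.

```python
import math


def knn_graph(xy, k):
    """Symmetric k-nearest-neighbour adjacency: {i: set of neighbour indices}."""
    adj = {i: set() for i in range(len(xy))}
    for i, p in enumerate(xy):
        others = sorted((j for j in range(len(xy)) if j != i),
                        key=lambda j: math.dist(p, xy[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def constrained_clusters(xy, values, n_clusters, k=4):
    """Repeatedly merge the two adjacent clusters with the most similar values.

    Only clusters linked in the kNN graph may merge, so every cluster stays
    connected in that graph.
    """
    adj = knn_graph(xy, k)                       # cluster id -> adjacent cluster ids
    members = {i: [i] for i in range(len(xy))}   # cluster id -> point indices
    total = {i: float(v) for i, v in enumerate(values)}  # cluster id -> sum of values

    def cost(a, b):
        # Ward-style increase in the within-cluster sum of squares
        na, nb = len(members[a]), len(members[b])
        diff = total[a] / na - total[b] / nb
        return na * nb / (na + nb) * diff * diff

    while len(members) > n_clusters:
        pairs = [(cost(a, b), a, b) for a in adj for b in adj[a] if a < b]
        if not pairs:   # graph is disconnected: nothing left to merge
            break
        _, a, b = min(pairs)
        # Merge cluster b into cluster a and rewire b's neighbours to a
        members[a] += members.pop(b)
        total[a] += total.pop(b)
        for c in adj.pop(b):
            adj[c].discard(b)
            if c != a:
                adj[c].add(a)
                adj[a].add(c)

    # Relabel clusters 0, 1, ... in order of their lowest point index
    labels = [None] * len(xy)
    for label, cluster in enumerate(sorted(members.values(), key=min)):
        for i in cluster:
            labels[i] = label
    return labels


# Toy data (made up): a 6 x 2 grid in metres, low values in the west, high in the east
xy = [(x * 100.0, y * 100.0) for x in range(6) for y in range(2)]
values = [1.0 if x < 300 else 9.0 for x, _ in xy]
labels = constrained_clusters(xy, values, n_clusters=2, k=3)
print(labels)
```

The sketch rescans all adjacent pairs on every merge, which is fine for a few hundred points but slow for large data sets. In practice a library implementation is preferable, for example scikit-learn's `AgglomerativeClustering`, which accepts a connectivity matrix.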

Related Solutions

Raster Interpolation – Interpolating Temperature Values at Different Depth Intervals Using Krig Function in R

It is very difficult to help if you do not provide example data (as data, not by printing the data). But your approach does not look right to me. Why do you use rasterize? Here is a simplified workflow:

library(raster)
library(fields)

utm.prj = " +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs "   
xy <- divetemps[, c("lon.x", "lat.y")]
rast <- raster(ext=extent(xy)+1000, crs=utm.prj, resolution = 500)

m <- fields::Krig(xy, divetemps$depthbin1)
surface <- raster::interpolate(rast, m)

plot(surface)

I think this is conceptually better, but you may still not like the output. I would expect that you would prefer the output of a thin plate spline model:

m <- fields::Tps(xy, divetemps$depthbin1)
surface <- raster::interpolate(rast, m)

You may also consider using not only x and y but also an additional variable, such as elevation, to predict temperature. See ?interpolate
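For readers working in Python, the same workflow (build an empty grid around the points, fit a model to the samples, predict at every cell centre) can be sketched as follows. Note the swap: this uses inverse distance weighting as a simple stand-in, not kriging or a thin plate spline. The function names `make_grid` and `idw`, the sample coordinates, and the temperature values are all made up for illustration; the padding of 1000 and the resolution of 500 mirror the R call above.

```python
import math


def make_grid(points, pad, resolution):
    """Cell-centre coordinates of a grid covering the points plus padding."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs) - pad, min(ys) - pad
    ncol = math.ceil((max(xs) + pad - xmin) / resolution)
    nrow = math.ceil((max(ys) + pad - ymin) / resolution)
    return [[(xmin + (c + 0.5) * resolution, ymin + (r + 0.5) * resolution)
             for c in range(ncol)] for r in range(nrow)]


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (stand-in for Krig/Tps)."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v            # target coincides with a sample: return its value
        w = d ** -power
        num += w * v
        den += w
    return num / den


# Made-up samples: projected coordinates in metres and a temperature value
points = [(0.0, 0.0), (2000.0, 0.0), (0.0, 2000.0), (2000.0, 2000.0)]
temps = [2.0, 4.0, 4.0, 6.0]

grid = make_grid(points, pad=1000.0, resolution=500.0)
surface = [[idw(points, temps, cell) for cell in row] for row in grid]
```

Here `surface[r][c]` holds the estimate for the cell in row `r` (counted from the southern edge) and column `c`. Inverse distance weighting never predicts outside the range of the sampled values, which is one reason a fitted model such as the ones in the R code can give a more plausible surface.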
