Given data points with longitude, latitude, and a third property value of this point. How can I cluster points into groups (geographical sub-regions) based on the property value? I searched by google and figured out that this problem seems to be called "spatial constrained clustering" or "regionalizing". However, I am not familiar with handling geographical data and haven't get an idea about what kind of algorithms are good, and which python/R packages are good for this task.
To give a more intuitive idea about what I want, let's say my data scatter plots are as following:
So each dot is a point, x is longitude, y is latitude, and colormap shows whether the value is big or small. I want to divide those points into sub regions/groups/clusters based on location and similarity of values. Like the following (it is not exactly what I want, just to show a intuitive idea.):
So how can I achieve this?
Best Answer
The rioja package provides functionality for constrained hierarchical clustering. For what your are thinking of as "spatially constrained" your would specify your cuts based on distance whereas for "regionalization" you could use k nearest neighbors. I would highly recommend projecting your data so it is in a distance based coordinate system.