[GIS] Creating groups of points from lat/long pairs using R

clustering, geoprocessing, point-of-interest, r

I have a database of lat/long pairs that identify the locations of points of interest. I would like to divide these points into groups of exactly 10. Each group should be geographically local and cover as small an area as possible.

I've looked at various clustering implementations in R, but none of them (that I can see) lets you specify a fixed cluster size.

I previously asked "Grouping map points into fixed cluster sizes?", but I don't think my question was precise enough to get a good answer.


Geographically local – I think I mean that groups shouldn't significantly overlap. In my application (allocating people to groups for monitoring purposes) it would be ideal if each group covered as small a physical area as possible.
Minimum area – again, trying to keep each group's area to a minimum. I suppose this could be quantified as keeping each group's area below a specified threshold (to avoid dozens of small groups and one large one).
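
One possible way to put a number on a group's area is the area of the convex hull around its points. The sketch below uses base R's `chull` and the shoelace formula. The function name `hull_area` is made up for illustration, and it assumes the coordinates have already been projected to a planar system (for example metres), since raw degrees do not give a meaningful area.

```r
# Area of the convex hull around one group of points.
# Assumption: x and y are planar (projected) coordinates, not raw lat/long degrees.
hull_area <- function(x, y) {
  idx <- chull(x, y)                 # indices of the hull vertices, in clockwise order
  if (length(idx) < 3) return(0)     # fewer than 3 hull vertices encloses no area
  hx <- x[idx]
  hy <- y[idx]
  nxt <- c(2:length(idx), 1)         # index of the following vertex, wrapping round
  abs(sum(hx * hy[nxt] - hx[nxt] * hy)) / 2   # shoelace formula
}

# Unit square plus one interior point: the hull is the square itself.
hull_area(c(0, 1, 1, 0, 0.5), c(0, 0, 1, 1, 0.5))  # 1
```

A threshold check for a whole grouping could then be something like `all(tapply(seq_along(x), group, function(i) hull_area(x[i], y[i])) < threshold)`, where `group` and `threshold` are placeholders.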

Best Answer

I think you might be looking for a k-nearest-neighbor tool, which can identify the 10 nearest neighbors of every point in your dataset. There seem to be a few options for this in R (some use different algorithms or offer slightly different functionality), and I'm not sure which would be best. Here are a few links:

http://stat.ethz.ch/R-manual/R-patched/library/class/html/knn.html
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/kNN
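
Note that `knn` in the `class` package is a classifier. If all you need is the list of neighbors, a plain distance matrix is enough for a modest number of points. Here is a minimal base R sketch, assuming great-circle (haversine) distances on a spherical Earth of radius 6371 km. The function names and the random sample points are made up for illustration.

```r
# Great-circle (haversine) distance in km between all pairs of points.
# Assumption: spherical Earth with radius 6371 km.
haversine_matrix <- function(lat, lon, radius_km = 6371) {
  phi <- lat * pi / 180
  lam <- lon * pi / 180
  dphi <- outer(phi, phi, "-")
  dlam <- outer(lam, lam, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlam / 2)^2
  2 * radius_km * asin(pmin(sqrt(a), 1))   # pmin guards against rounding above 1
}

# For each point, the row indices of its k nearest neighbors (excluding itself).
nearest_neighbours <- function(lat, lon, k = 10) {
  d <- haversine_matrix(lat, lon)
  diag(d) <- Inf                           # a point is not its own neighbor
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

# Hypothetical sample data: 50 random points in a 1-degree box.
set.seed(1)
pts <- data.frame(lat = runif(50, 51, 52), lon = runif(50, -1, 0))
nn <- nearest_neighbours(pts$lat, pts$lon, k = 10)
dim(nn)   # one row per point, one column per neighbor
```

The full distance matrix grows with the square of the number of points, so for large datasets a spatial index would be a better choice than this brute-force approach.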

You may need to combine the results with a clustering algorithm or cluster ensemble tool that identifies clusters of points sharing similar sets of neighbors, so that the resulting groups have little to no overlap. Some manual adjustment of the output will probably be needed, but this should let you automate a large portion of the work.

Some links:
http://jmlr.csail.mit.edu/papers/volume3/strehl02a/strehl02a.pdf
http://cran.r-project.org/web/packages/clue/vignettes/clue.pdf
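
As a much simpler alternative to a cluster ensemble, nearest-neighbor distances can be turned into non-overlapping groups of exactly 10 with a greedy pass: seed a group at an outlying unassigned point, add its 9 nearest unassigned points, and repeat. This is only a rough sketch and not the method from the papers linked here. The function name `greedy_groups` is made up, the coordinates are assumed to be planar (projected), and the result is not guaranteed to minimize group area.

```r
# Greedy grouping: repeatedly seed a group at the unassigned point farthest
# from the centre of the data, then add its (size - 1) nearest unassigned points.
# Assumptions: x and y are planar (projected) coordinates; if length(x) is not
# a multiple of `size`, the last group is smaller.
greedy_groups <- function(x, y, size = 10) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))            # Euclidean distance matrix
  from_centre <- sqrt((x - mean(x))^2 + (y - mean(y))^2)
  group <- rep(NA_integer_, n)
  g <- 0L
  while (anyNA(group)) {
    g <- g + 1L
    free <- which(is.na(group))                # points not yet in a group
    seed <- free[which.max(from_centre[free])] # most outlying free point
    take <- free[order(d[seed, free])][seq_len(min(size, length(free)))]
    group[take] <- g                           # seed is included (distance 0)
  }
  group
}

# Hypothetical sample data: 100 random points.
set.seed(42)
x <- runif(100)
y <- runif(100)
grp <- greedy_groups(x, y, size = 10)
table(grp)   # every group has exactly 10 members
```

Starting from the outside and working inward tends to avoid stranding isolated points at the edges, but groups formed late can still be spread out, so some manual adjustment may be needed.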

You might also be able to find a k-means clustering tool that does this in one step: divide the total number of points by 10 and use the result as the number of clusters. Note that standard k-means only gives clusters that average 10 points and does not force every cluster to contain exactly 10, so you would need a variant that enforces a cluster-size constraint.
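
As a quick check of the k-means suggestion, the sketch below runs base R's `kmeans` with n/10 clusters on made-up random points and tabulates the cluster sizes. Plain k-means has no size constraint, so the counts average 10 but are not guaranteed to equal 10. Coordinates are treated as planar here, which is an assumption.

```r
# Hypothetical sample data: 200 random points, treated as planar coordinates.
set.seed(7)
pts <- cbind(x = runif(200), y = runif(200))

k <- nrow(pts) / 10                  # 20 clusters, so 10 points each on average
km <- kmeans(pts, centers = k, nstart = 5, iter.max = 50)

table(km$cluster)                    # sizes per cluster; not forced to be 10
mean(km$size)                        # 10 by construction (200 points / 20 clusters)
```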