Solved – Cluster data points by distance between clusters

clustering

I have a load of point data in 3D, and I wish to cluster them like so:

  • Each cluster contains points all of which are at most distance $d$ from another point in the cluster.
  • All points in two distinct clusters are at least distance $d$ from each other.

The distances used are Euclidean distances.

Is this a well known clustering algorithm? If so, what is is called? Thanks.

Best Answer

Create a graph in which the points are nodes and two points are connected with an edge if and only if they lie within distance $d$ of each other. Stated in these terms, your criteria become

  • Every node in a cluster of two or more nodes is connected to at least one other node in that cluster.

  • No two points in any disjoint clusters can be connected to each other.

In short, you want to compute the connected components of this graph. Linear-time algorithms are well known and simple to execute, as described in the Wikipedia article.

To do this efficiently for a lot of points in 3D, you want to limit the amount of distance calculation you perform. Data structures such as octrees, or even simpler ones (such as exploiting a lexicographic sorting of the points) work well.