I have a load of point data in 3D, and I wish to cluster them like so:
- Each cluster contains points all of which are at most distance $d$ from another point in the cluster.
- All points in two distinct clusters are at least distance $d$ from each other.
The distances used are Euclidean distances.
Is this a well known clustering algorithm? If so, what is is called? Thanks.
Best Answer
Create a graph in which the points are nodes and two points are connected with an edge if and only if they lie within distance $d$ of each other. Stated in these terms, your criteria become
Every node in a cluster of two or more nodes is connected to at least one other node in that cluster.
No two points in any disjoint clusters can be connected to each other.
In short, you want to compute the connected components of this graph. Linear-time algorithms are well known and simple to execute, as described in the Wikipedia article.
To do this efficiently for a lot of points in 3D, you want to limit the amount of distance calculation you perform. Data structures such as octrees, or even simpler ones (such as exploiting a lexicographic sorting of the points) work well.