Solved – How to select a single updated centroid if multiple centroids are equidistant for a single group when running k-means/k-medoids

clusteringk medoidsk-meansmultiple-membership

I am trying to write my own k-means and k-medoids clustering algorithms. I understand the general idea: given k centroids, one continually updates the centroids such that the distance between the points and centroids is minimized; these distances can be euclidean, manhattan, etc. The centroids are members of the original dataset in the case of k-medoids, unlike k-means.

So say the algorithm is computing data-centroid distances before updating the centroids and repeating this process. Suppose there are two or more potential centroids for which the minimized distances are equal. Is there a criteria or general rule to pick which centroid to update?

This question is also posed here; the answer suggests using the originally assigned centroid to avoid infinite loops. This appears to me to be a reason based on programming errors as opposed to the actual method.

Best Answer

Ties (exact same distances) are not programming errors.

You could break them with a random generator, but that could cause an infinite loop in theory (or at least some extra iterations).

Or you just don't change anything in such cases, then you are fine.

Another option would be to always assign to the "first" cluster, which should also be stable.

It does not really make a difference as long as you make a deterministic decision.

Beware that the k-means style approach for k-medoids works quite poor.