Solved – Is Partitioning Around Medoids (PAM) deterministic?

clustering

I've been doing a lot of clustering work lately and have been using the PAM algorithm.

Based on my research, it appears to be deterministic because the initial medoids are selected from items in the dataset (chosen at random). Further, the medoids considered in the SWAP stage are also items from the dataset. Therefore, for any given dataset there is only one correct answer: the set of medoids that minimizes the sum of the absolute distances from each point to its cluster medoid.

Thus, PAM is an exhaustive search of every element for the optimal k medoids.
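For reference, the objective being discussed can be written down in a few lines. This is a minimal sketch (the function name and the tiny example dataset are my own, not from the question), assuming a precomputed pairwise distance matrix:

```python
import numpy as np

def kmedoids_cost(dist, medoids):
    """Total distance from each point to its nearest medoid -- the
    objective PAM minimizes. `medoids` are indices into the dataset,
    since medoids must be actual data points."""
    return dist[:, medoids].min(axis=1).sum()

# Tiny example: four points on a line, Manhattan distance.
x = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(x[:, None] - x[None, :])
kmedoids_cost(dist, [1])  # -> 4.0 (distances 1 + 0 + 1 + 2)
```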

In comparison, the k-means algorithm picks arbitrary synthetic starting points for the cluster centers. The centers then move until the error is minimized.

Am I correct in this assumption?

Best Answer

PAM is close to being deterministic, but there may be ties.

In particular, PAM does not use a random generator.

The heart of PAM is the BUILD phase that tries to smartly choose the initial settings (there exist variations that use a random sample, but IIRC that is not the original PAM algorithm). If I remember correctly, the authors even claimed that you don't really need the iterative refinement (SWAP phase), and it will finish in very few iterations because of the good starting conditions.
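The greedy idea behind BUILD can be sketched in a few lines: pick the single best medoid first, then repeatedly add the point that most reduces the total cost. This is an illustrative sketch (the function name and looping style are mine), but note there is no random generator anywhere:

```python
import numpy as np

def pam_build(dist, k):
    """Sketch of PAM's BUILD phase: greedy, fully deterministic
    initialization (up to ties, which are broken by index order here).
    `dist` is an (n, n) pairwise distance matrix."""
    n = dist.shape[0]
    # First medoid: the point minimizing total distance to all others.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        # Current distance of each point to its nearest chosen medoid.
        nearest = dist[:, medoids].min(axis=1)
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in medoids:
                continue
            # Cost reduction if j were added as a medoid.
            gain = np.maximum(nearest - dist[:, j], 0).sum()
            if gain > best_gain:
                best_gain, best_j = gain, j
        medoids.append(best_j)
    return medoids

x = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(x[:, None] - x[None, :])
pam_build(dist, 2)  # same result on every run -- no randomness
```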

Nevertheless, if you have, e.g., a symmetric data set, you are likely to have more than one choice as "best medoid" at some point. Because of these "ties", it cannot be fully deterministic (most implementations will be deterministic because they do not randomly break these ties; but if you permute the data and have such ties, you may occasionally see different results).
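A concrete illustration of such a tie (my own toy example, not from the answer): four equally spaced points on a line with k = 1. By symmetry, the two middle points are equally good medoids, so which one an implementation returns depends only on its tie-breaking rule:

```python
import numpy as np

# Four equally spaced points; cost of choosing each point as the
# single medoid (sum of absolute distances to it).
x = np.array([0.0, 1.0, 2.0, 3.0])
costs = [np.abs(x - m).sum() for m in x]
# costs == [6.0, 4.0, 4.0, 6.0]: points 1 and 2 tie for "best medoid"
```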

PAM is also not an exhaustive search. It is a steepest-descent approach that considers only nearby solutions. The hypergraph interpretation in the CLARANS article outlines this. It is easy to see that there are (n choose k) possible medoid sets, but PAM only considers (n-k)*k alternatives in each SWAP step.