I have a question regarding Partitioning Around Medoids (PAM) clustering algorithm, because everywhere I look, it is described differently. In every step of the algorithms do I swap only one medoid or more?
I mean, does the swapping step look like:
For each cluster medoid m in 1..K
Find the non-medoid point p in this cluster which, after swapping with medoid m, maximizes total similarity in this cluster (minimizes total distance)
Do the swap
Or is it like – find the swap between one of the medoids and non-medoid point (not-necessary inside that cluster) that maximizes total similarity in all dataset (minimizes total distance) and do the swap?
I will be very grateful for a reply
Best Answer
It depends on how literal you want your implementation to be.
Note that in the original PAM algorithm, the initialization
BUILD
is the most important part. The optimization phase,SWAP
is expected to not require many iterations then.From the original book, emphasis and comments added:
It was a conscious decision by the authors of the algorithm to search for the optimum swap, in order to become (mostly) order independent. A "greedy" variation (no longer
PAM
though) would do the first swap that yields an improvement; this will likely be faster but no longer order independent.As far as I can tell, ELKI includes such a variant by the name
KMedoidsEM
: