Actually, K-medoids (a.k.a. PAM) can work with any kind of distance or dissimilarity measure.
So can DBSCAN, OPTICS, and hierarchical clustering. The latter, however, is usually implemented in $O(n^3)$, so it is not an option if you have many instances.
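For illustration, here is a minimal sketch (Python with SciPy and scikit-learn; the data, the choice of Manhattan dissimilarity, and the parameter values are all arbitrary) of running DBSCAN and hierarchical clustering on a precomputed dissimilarity matrix:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Any dissimilarity will do; Manhattan is used here as an example.
D = squareform(pdist(X, metric="cityblock"))

# DBSCAN accepts a precomputed distance matrix (eps is illustrative).
db = DBSCAN(eps=1.0, min_samples=5, metric="precomputed").fit(D)

# Hierarchical clustering (average linkage) on the same dissimilarities.
# Note the O(n^2) memory and roughly O(n^3) time of naive implementations.
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
```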
It depends on how literal you want your implementation to be.
Note that in the original PAM algorithm, the initialization phase, BUILD, is the most important part. The optimization phase, SWAP, is then expected to need only a few iterations.
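As a rough sketch of what BUILD does (my own NumPy reconstruction, not the book's pseudocode): start with the object minimizing the total dissimilarity, then greedily add the object that most reduces the cost of assigning each point to its nearest medoid.

```python
import numpy as np

def pam_build(D, k):
    """Greedy BUILD initialization of PAM on a dissimilarity matrix D."""
    n = D.shape[0]
    # First medoid: the object with the smallest total dissimilarity.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    # Distance of every object to its nearest medoid chosen so far.
    nearest = D[:, medoids[0]].copy()
    while len(medoids) < k:
        best_gain, best_j = -np.inf, None
        for j in range(n):
            if j in medoids:
                continue
            # Cost reduction if j were added as a medoid.
            gain = np.maximum(nearest - D[:, j], 0).sum()
            if gain > best_gain:
                best_gain, best_j = gain, j
        medoids.append(best_j)
        nearest = np.minimum(nearest, D[:, best_j])
    return medoids
```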
From the original book, emphasis and comments added:
"Note that as all potential swaps are considered [in each iteration], the results of the algorithm do not depend on the order of the objects in the input file [except for ties]."
It was a conscious decision by the authors of the algorithm to search for the optimal swap, in order to be (mostly) order independent. A "greedy" variation (no longer PAM, though) would perform the first swap that yields an improvement; this would likely be faster, but no longer order independent.
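To make the contrast concrete, here is a hedged sketch (with naive cost recomputation, nothing like an optimized implementation) of a SWAP phase supporting both behaviours: the PAM-style variant scans all (medoid, non-medoid) pairs and applies only the single best swap per pass, while the greedy variant applies the first improving swap it encounters, so its result depends on the scan order.

```python
import numpy as np

def total_cost(D, medoids):
    # Cost = sum of each object's dissimilarity to its nearest medoid.
    return D[:, medoids].min(axis=1).sum()

def swap_phase(D, medoids, greedy=False):
    medoids = list(medoids)
    improved = True
    while improved:
        improved = False
        best = (0.0, None, None)  # (improvement, medoid slot, candidate)
        cost = total_cost(D, medoids)
        for mi in range(len(medoids)):
            for h in range(D.shape[0]):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[mi] = h
                delta = cost - total_cost(D, trial)
                if delta <= 0:
                    continue
                if greedy:
                    # First improving swap: faster, but order dependent.
                    medoids[mi] = h
                    cost = total_cost(D, medoids)
                    improved = True
                elif delta > best[0]:
                    best = (delta, mi, h)
        if not greedy and best[1] is not None:
            medoids[best[1]] = best[2]  # best swap only: order independent
            improved = True
    return medoids
```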
As far as I can tell, ELKI includes such a variant under the name KMedoidsEM:
"Provides the k-medoids clustering algorithm, using a 'bulk' variation of the 'Partitioning Around Medoids' approach. In contrast to PAM, which will in each iteration update one medoid with one (arbitrary) non-medoid, this implementation follows the EM pattern. In the expectation step, the best medoid from the cluster members is chosen; in the M-step, the objects are reassigned to their nearest medoid. We do not have a reference for this algorithm. It borrows ideas from EM and PAM. If needed, you are welcome to cite it using the latest ELKI publication (this variation is likely not worth publishing on its own)."
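Based on that description alone, the alternation might be sketched as follows (my own Python reconstruction, not ELKI's actual Java code): assign each object to its nearest medoid, then replace each cluster's medoid by the member minimizing the within-cluster dissimilarity sum, and repeat until nothing changes.

```python
import numpy as np

def kmedoids_em(D, medoids):
    """EM-style alternating k-medoids on a dissimilarity matrix D,
    starting from an initial list of medoid indices."""
    medoids = list(medoids)
    while True:
        # Assignment step: each object goes to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        changed = False
        # Update step: pick the best medoid within each cluster.
        for c in range(len(medoids)):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoid = int(members[np.argmin(within)])
            if new_medoid != medoids[c]:
                medoids[c] = new_medoid
                changed = True
        if not changed:
            return medoids, labels
```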
Best Answer
The Manhattan distance is based on absolute differences, as opposed to squared-error (read: Euclidean) distance. In practice, you should get similar results most of the time. Absolute-value distance should give more robust results, whereas Euclidean distance would be influenced by unusual values.
This is a multivariate technique, and the "distance" between two points involves aggregating the distances between the individual variables. So if two points are close on most variables but more discrepant on one of them, Euclidean distance will exaggerate that discrepancy, whereas Manhattan distance will shrug it off, being more influenced by the closeness of the other variables.
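A toy numerical illustration (the numbers are invented): two points that agree closely on four variables and differ sharply on one.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 0.0])
b = np.array([1.1, 2.1, 3.1, 4.1, 9.0])

diff = np.abs(a - b)                   # [0.1, 0.1, 0.1, 0.1, 9.0]
manhattan = diff.sum()                 # 0.1*4 + 9.0 = 9.4
euclidean = np.sqrt((diff**2).sum())   # sqrt(0.04 + 81) ≈ 9.002

# The single large discrepancy contributes 9.0/9.4 ≈ 96% of the Manhattan
# distance, but 81/81.04 ≈ 99.95% of the squared Euclidean distance:
# under squared error, one unusual variable dominates almost completely.
print(manhattan, euclidean)
```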
According to Wikipedia, the k-medoids algorithm is not defined for Euclidean distance, which could explain why you have seen no examples of it. Presumably the reason for this is to have a robust clustering method.
begin(RantMode)
Thoughtless analysts often throw a whole bag of variables into an analysis, not all of which have much to do with the problem at hand; nor do those analysts wish to take the necessary time to discern which variables matter, possibly by talking to subject-matter experts. Such analysts (who may possibly call themselves Big Data specialists) would naturally favour a technique that is robust to the choice of variables. Statisticians, traditionally, go for small amounts of quality data, and thus favour squared-error methods with their greater efficiency.
end(RantMode)