Solved – the benefit of using Manhattan distance for k-medoids rather than Euclidean distance

clustering, dataset, k-medoids, k-means

Please give me the reasons. I couldn't find a single k-medoids example whose calculation is done using Euclidean distance; all the examples I have seen use Manhattan distance for k-medoids.

Best Answer

The Manhattan distance is based on absolute value distance, as opposed to squared-error (read: Euclidean) distance. In practice, you should get similar results most of the time. Absolute value distance should give more robust results, whereas Euclidean distance would be influenced by unusual values.
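For concreteness, between two points $x = (x_1, \dots, x_p)$ and $y = (y_1, \dots, y_p)$ the two distances are

$$d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{p} |x_i - y_i|, \qquad d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}.$$

Squaring the coordinate differences is exactly what makes the Euclidean version sensitive to a single large deviation.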

This is a multivariate technique, and the "distance" between two points involves aggregating the distances between each variable. So if two points are close on most variables but more discrepant on one of them, Euclidean distance will exaggerate that discrepancy, whereas Manhattan distance will shrug it off, being more influenced by the closeness of the other variables.
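A minimal sketch in Python makes the point (the helper functions and the toy points are my own illustration, not from any library). The `spike` point agrees exactly on four of five variables but is off by 3 on the fifth; the `spread` point is slightly off everywhere:

```python
import math

def manhattan(x, y):
    # L1 distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # L2 distance: square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

ref    = (0, 0, 0, 0, 0)
spread = (1, 1, 1, 1, 1)  # slightly off on every variable
spike  = (3, 0, 0, 0, 0)  # exact on four variables, off by 3 on one

print(manhattan(ref, spread), manhattan(ref, spike))  # 5 3   -> spike is closer
print(euclidean(ref, spread), euclidean(ref, spike))  # ~2.24 3.0 -> spread is closer
```

Manhattan distance calls `spike` the closer point (3 vs 5), while Euclidean distance flips the ranking (3.0 vs about 2.24): the single discrepant variable dominates once it is squared.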

According to Wikipedia, the k-medoids algorithm is typically paired with Manhattan distance rather than Euclidean distance, which could explain why you have seen no Euclidean examples. Presumably the reason for this is to have a robust clustering method.
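Nothing in the algorithm itself depends on which dissimilarity you plug in, which is what makes Manhattan a modelling choice rather than a requirement. A toy sketch (my own naive illustration, not the PAM reference implementation) shows the metric entering only as a parameter:

```python
import random

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def k_medoids(points, k, dist, max_iter=100, seed=0):
    """Naive k-medoids: alternate between assigning points to the
    nearest medoid and re-picking each medoid as the cluster member
    with the smallest total dissimilarity to its cluster."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: nearest medoid under the supplied metric.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, medoids[j]))
            clusters[nearest].append(p)
        # Update step: medoid = member minimising within-cluster cost.
        new_medoids = [
            min(c, key=lambda m: sum(dist(m, p) for p in c)) if c else medoids[j]
            for j, c in enumerate(clusters)
        ]
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_medoids(points, 2, manhattan))  # one medoid per blob
```

Because the medoids are always actual data points, the algorithm only ever needs pairwise dissimilarities, so swapping `manhattan` for any other metric is a one-line change.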

begin(RantMode)

Thoughtless analysts often throw a whole bag of variables into an analysis, not all of which have much to do with the problem at hand; nor do such analysts wish to take the necessary time to discern which variables matter -- say, by talking to subject matter experts. Such analysts (who may well call themselves Big Data specialists) would naturally favour a technique that is robust with respect to the choice of variables. Statisticians, traditionally, go for small amounts of quality data, and thus favour squared-error methods with their greater efficiency.

end(RantMode)