Solved – how to determine medoids based on (dis)similarity matrix

clustering

Given the (dis)similarity matrix and the clustering results, how do I select the medoid in each cluster?

For example, one cluster contains 4 points in total: A, B, C, D. I know the similarity (or dissimilarity) between each pair of them. How do I pick the one that is most representative of this cluster?

My instinct is to choose the point with the minimum average distance to other points. I am not sure if this is correct.

I am using a linkage clustering method.

Things would be easier in k-means methods, but that is not what I need.

I would appreciate it a lot if you could provide some links to code along with the methods.

Best Answer

The medoid has a pretty clear definition.

It's the point with the smallest average distance to all other points.

(It obviously does not matter whether you use the average or the sum.)
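
Written out, with notation that is assumed here rather than taken from the question ($C$ is the set of points in one cluster, $d(i, j)$ is the dissimilarity between points $i$ and $j$, and $d(i, i) = 0$), the medoid $m$ of the cluster is:

$$
m = \arg\min_{i \in C} \sum_{j \in C} d(i, j)
$$

Dividing the sum by $|C| - 1$ turns it into the average distance to the other points. That factor is the same for every candidate $i$ in the cluster, so the minimizer does not change.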

If you have a similarity, maximize the similarity instead.
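
As a rough sketch of how this could look in code, assuming NumPy, a full square (dis)similarity matrix, and one cluster label per point (for example from cutting a linkage tree): the function name `cluster_medoids` and the `similarity` flag are made up for this illustration, not part of any library.

```python
import numpy as np

def cluster_medoids(matrix, labels, similarity=False):
    """Return {cluster label: index of that cluster's medoid}.

    matrix     : square (n x n) dissimilarity matrix (or similarity matrix
                 if similarity=True); assumed symmetric.
    labels     : length-n sequence of cluster labels from any clustering.
    similarity : hypothetical flag; if True, maximize instead of minimize.
    """
    matrix = np.asarray(matrix, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Pairwise values restricted to the points of this cluster.
        sub = matrix[np.ix_(members, members)]
        # Total (dis)similarity of each member to the OTHER members;
        # the diagonal (self-comparison) is subtracted out.
        totals = sub.sum(axis=1) - np.diag(sub)
        best = np.argmax(totals) if similarity else np.argmin(totals)
        medoids[label.item()] = int(members[best])
    return medoids

# Example with the four points from the question, all in one cluster.
names = ["A", "B", "C", "D"]
dist = [[0, 2, 3, 4],
        [2, 0, 1, 2],
        [3, 1, 0, 3],
        [4, 2, 3, 0]]
labels = [0, 0, 0, 0]
medoids = cluster_medoids(dist, labels)
print({lab: names[idx] for lab, idx in medoids.items()})  # {0: 'B'}
```

Here B has the smallest total distance to the other points (2 + 1 + 2 = 5), so it is the medoid. If two points tie, this sketch simply returns the one with the lower index.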

If you do something else, it's not the medoid anymore. For example, I have seen a "k-medoids" program that computed the multivariate mean (as k-means does) and then chose the data point nearest to it, which essentially has the same limitations as k-means.
