Representative point of a cluster with L1 distance

Tags: clustering, distance-functions, k-means, MATLAB

In the k-means algorithm, the representative point of a cluster (the cluster center) is the component-wise mean of the points in that cluster. The mean is chosen because it minimizes the within-cluster variance, which is the same as minimizing the within-cluster sum of squared Euclidean distances. Now, if I want to use a different distance metric, such as L1 (Manhattan distance), I would be minimizing the within-cluster sum of absolute deviations instead. I was looking into how the centers should be calculated in that case, because the mean does not minimize the new objective function, and I found this in the MATLAB documentation of the kmeans function, under the distance parameter:

cityblock – Sum of absolute differences, i.e., the L1 distance. Each centroid is the component-wise median of the points in that cluster.

Now, how did they come up with the component-wise median? How does a median minimize absolute deviations?

Best Answer

The component-wise median can be used.

This is known as k-medians in clustering literature.
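As for why the median shows up, here is a short sketch of the usual argument. The notation (points x_1, ..., x_n in R^d, center c, objective J) is my own and is not taken from the MATLAB documentation.

```latex
% L1 objective for one cluster with points x_1, ..., x_n in R^d and center c
J(c) = \sum_{i=1}^{n} \lVert x_i - c \rVert_1
     = \sum_{j=1}^{d} \sum_{i=1}^{n} \lvert x_{ij} - c_j \rvert

% The sum separates over coordinates j, so each c_j can be chosen on its own.
% For a single coordinate, wherever c_j differs from every x_{ij}:
\frac{\partial J}{\partial c_j}
  = \#\{ i : x_{ij} < c_j \} - \#\{ i : x_{ij} > c_j \}

% Compare with the squared Euclidean case, which gives the mean:
\frac{\partial}{\partial c_j} \sum_{i=1}^{n} (x_{ij} - c_j)^2
  = -2 \sum_{i=1}^{n} (x_{ij} - c_j) = 0
  \;\Longrightarrow\; c_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}
```

The slope of the L1 objective is negative while fewer than half of the values lie below c_j and positive once more than half do. So J is minimized where the two counts balance, which is a median of the j-th coordinate. With an even number of points, any value between the two middle values is a minimizer.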

If you instead take the most central instance of the cluster as its center, you can use arbitrary distance functions. This is known as k-medoids, where the medoid is the most central object of a cluster.
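To make the difference between the three kinds of center concrete, here is a minimal sketch in Python with NumPy. The toy data and the helper names `l1_cost`, `cityblock` and `medoid` are made up for illustration. It only computes the center of a single cluster and is not a full k-medians or k-medoids implementation.

```python
import numpy as np

def l1_cost(points, center):
    """Sum of L1 (cityblock) distances from every point to the center."""
    return np.abs(points - center).sum()

def cityblock(a, b):
    """L1 distance between two points."""
    return np.abs(a - b).sum()

def medoid(points, dist):
    """Cluster member with the smallest total distance to all members.

    `dist` can be any distance function taking two points.
    """
    totals = [sum(dist(p, q) for q in points) for p in points]
    return points[int(np.argmin(totals))]

# Hypothetical toy cluster with one outlier in the first coordinate.
points = np.array([
    [0.0, 0.0],
    [1.0, 5.0],
    [2.0, 1.0],
    [3.0, 3.0],
    [10.0, 3.0],
])

mean_center = points.mean(axis=0)          # k-means style center
median_center = np.median(points, axis=0)  # k-medians style center
medoid_center = medoid(points, cityblock)  # k-medoids style center

for name, center in [("mean", mean_center),
                     ("median", median_center),
                     ("medoid", medoid_center)]:
    print(name, center, l1_cost(points, center))
```

On this toy data the component-wise median (2, 3) gives the lowest L1 cost (19), even though it is not one of the data points. The medoid (3, 3) is restricted to actual cluster members and costs 20, and the mean costs about 21.2. Because the medoid only needs pairwise distances, `cityblock` can be swapped for any other distance function.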
