How to order kmeans clusters

Tags: clustering, distance, distance-functions, k-means, r

I have a kmeans clustering object and I would like to order the clusters: not the observations within each cluster, but the clusters themselves relative to one another.

Is there a way of doing this? I found gclus::order.clusters but did not follow what it was doing.

Is there a conventional approach of ordering clusters?

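```r
library(dplyr)  # provides select()

set.seed(123)
n <- 3  # number of clusters
myclustering <- kmeans(select(iris, -Species), centers = n)
d <- dist(myclustering$centers)  # Euclidean distances between the cluster centers

as.matrix(d)
#>          1        2        3
#> 1 0.000000 5.017569 3.356935
#> 2 5.017569 0.000000 1.797182
#> 3 3.356935 1.797182 0.000000
```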

I can see that cluster 1 is closest to cluster 3, cluster 2 is also closest to cluster 3, and cluster 3 is closest to cluster 2.

But that's not what I'm seeking, and I don't even know whether what I'm seeking exists. In this example I'm picturing a Cartesian space with one axis for each clustering variable. If I wanted to order the clusters, I guess I could order them by distance from the origin, where the axes meet, or perhaps from the lower left through to the upper right of the space.
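A minimal base-R sketch of the two orderings described above, assuming the same iris k-means setup as the earlier code (re-created here so the block runs on its own). "Distance from the origin" is the Euclidean norm of each cluster center. "Lower left to upper right" is interpreted here, as an assumption, as the position along the all-ones diagonal, which is proportional to the sum of a center's coordinates. All four iris variables are in centimetres, so raw coordinates are comparable; with variables on different scales you would normally standardize them first.

```r
set.seed(123)
# Base-R equivalent of select(iris, -Species): the four numeric columns
myclustering <- kmeans(iris[, 1:4], centers = 3)
centers <- myclustering$centers  # one row per cluster, one column per variable

# Ordering 1: Euclidean distance of each center from the origin
dist_from_origin <- sqrt(rowSums(centers^2))
order_by_origin <- order(dist_from_origin)

# Ordering 2 (assumed reading of "lower left to upper right"):
# projection of each center onto the unit all-ones direction
diag_position <- rowSums(centers) / sqrt(ncol(centers))
order_by_diagonal <- order(diag_position)

print(round(dist_from_origin, 3))
print(order_by_origin)    # cluster labels, nearest to the origin first
print(order_by_diagonal)  # cluster labels, lower left first
```

Because the rows of `centers` are in label order 1..k, the indices returned by `order()` are the cluster labels themselves.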

I hope I'm making sense. My end goal is to present the clusters to stakeholders not in the arbitrary order 1, 2, 3, …, k but ordered by some meaningful measure.

For example, if I have 100K rows and cluster them on 20 variables, I can use dplyr in R to group by cluster and report the mean of each variable, to understand what distinguishes each cluster. The output would be ordered 1:20. What I'm seeking is to order the clusters by some measure so that the first and last clusters in the output are far from each other.
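One simple heuristic for this (an assumption on my part, not a standard rule) is to project the cluster centers onto their first principal component and relabel the clusters by their rank along it, so the first and last labels sit at opposite ends of the main axis of variation. The sketch below does that and then builds the dplyr group-by-cluster summary in the new order. The sign of a principal component is arbitrary, so which end counts as "first" is arbitrary too.

```r
library(dplyr)

set.seed(123)
features <- select(iris, -Species)
km <- kmeans(features, centers = 3)

# Score each cluster center on the first principal component of the centers
pc1 <- prcomp(km$centers)$x[, 1]
# new_label[i] is the new label of original cluster i (1 = lowest PC1 score)
new_label <- unname(rank(pc1, ties.method = "first"))

# Per-cluster means, with clusters relabelled and sorted along PC1
cluster_profile <- features %>%
  mutate(cluster = new_label[km$cluster]) %>%
  group_by(cluster) %>%
  summarise(across(everything(), mean), n = n()) %>%
  arrange(cluster)

print(cluster_profile)
```

With 20 variables and many clusters the same code applies unchanged; a one-dimensional projection will not capture everything, but it does put clusters that differ most along the dominant direction at the top and bottom of the table.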

Is this common? Is there a conventional approach?

Best Answer

You may need some domain knowledge to come up with names for the clusters (for example, "active and rich customers", "inactive and rich customers", etc.); named clusters are a better way to communicate with stakeholders. Also, if you want to discuss the results with domain experts, too many clusters (say ~20) will be hard to work with.

On the other hand, if you have many clusters and want to examine the structure among them, you could build another layer of hierarchical clustering on the cluster centers. The result is a dendrogram showing which clusters are most similar to each other.
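A minimal base-R sketch of that second layer, assuming the iris data from the question. The choice of 6 clusters is hypothetical (more than the question's 3, so the tree has some structure), and `hclust` is used with its default complete linkage, which is also an arbitrary choice. The leaf order of the dendrogram (`hc$order`) can serve as a cluster ordering in which clusters from the same branch are listed next to each other.

```r
set.seed(123)
k <- 6  # hypothetical number of clusters, chosen only for illustration
# Base-R equivalent of select(iris, -Species): the four numeric columns
km <- kmeans(iris[, 1:4], centers = k, nstart = 10)

# Second layer: hierarchical clustering of the k-means centers themselves
hc <- hclust(dist(km$centers))  # default method = "complete"

# Left-to-right leaf order of the dendrogram; each branch's clusters
# appear contiguously in this order
cluster_order <- hc$order
print(hc$labels[cluster_order])

# Draw the dendrogram of the cluster centers
plot(hc, main = "Hierarchical clustering of k-means centers", xlab = "", sub = "")
```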

Related: How to interpret the dendrogram of a hierarchical cluster analysis