Solved – What does total ss and between ss mean in k-means clustering

I'm very new to cluster analysis. I'm using R for k-means clustering, and I wonder what "total SS" and "between SS" in the output mean. Is it better if their ratio is smaller or larger?

Best Answer

It's basically a measure of the goodness of the classification k-means has found. SS stands for Sum of Squares, so it's the usual decomposition of the total deviance (total SS, TSS) into a "Between" deviance (between SS, BSS) and a "Within" deviance (within SS, WSS). Ideally you want a clustering that has internal cohesion (small WSS) and external separation (large BSS), i.e. the BSS/TSS ratio should approach 1.
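Written out, the decomposition looks roughly as follows. The notation is my own, not something R prints: x_i are the observations, \bar{x} is the overall mean, C_k is the k-th cluster with n_k points and centre c_k. The identity assumes each c_k is the mean of its cluster, which is the case once k-means has converged.

```latex
\begin{align*}
\mathrm{TSS} &= \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2 \\
\mathrm{WSS} &= \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - c_k \rVert^2 \\
\mathrm{BSS} &= \sum_{k=1}^{K} n_k \, \lVert c_k - \bar{x} \rVert^2 \\
\mathrm{TSS} &= \mathrm{WSS} + \mathrm{BSS}
\end{align*}
```

In the object returned by R's kmeans(), these correspond to the components totss, tot.withinss and betweenss, so the ratio in question is betweenss / totss.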

For example, in R:

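data(iris)
set.seed(42)  # k-means starts from random centres; fix the seed for reproducibility
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)  # keep the best of 25 random starts
km$betweenss / km$totss  # the BSS/TSS ratio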

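gives a BSS/TSS ratio of 88.4% (0.884), indicating a good fit. You should be careful though: the ratio can always be pushed up by adding more clusters, so a high value alone does not tell you that you picked the right number. Since the number of clusters has to be specified beforehand, it's usually a good idea to plot the WSS against the number of clusters.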
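A minimal sketch of such a plot, under a few assumptions of mine: the range 1 to 10 for the number of clusters, the seed, and nstart = 25 are arbitrary choices, and the variable names x, ks and wss are just for illustration.

```r
# Sketch: total within-cluster SS for several values of k on the iris measurements.
data(iris)
x <- iris[, 1:4]

set.seed(42)   # arbitrary seed, only so the run is reproducible
ks <- 1:10     # candidate numbers of clusters (assumed range)

# For each k, run k-means from 25 random starts and keep the best fit's WSS.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# WSS against the number of clusters; look for the point where the curve flattens.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster SS (WSS)")
```

WSS keeps shrinking as k grows, so what you look for is the "elbow" where adding another cluster stops buying much. Where exactly that is remains a judgment call.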