What to do with small cluster size after k-means

clustering, k-means

So I ran k-means on 10k data points with k = 8, which I chose because the elbow analysis suggested 5 to 8 clusters.

After the analysis, one cluster contains only a single member, and I am not sure how I could see any pattern in it.

What should I do with that cluster? Should I ignore it (since the elbow plot suggests 7 clusters would also be fine) and continue with the other 7?

Note: I have tried every k from 5 to 8 and validated each one using the silhouette coefficient; the scores do not differ much between the values of k.
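For reference, here is a minimal NumPy sketch of the per-point silhouette coefficient used for this kind of validation (scikit-learn provides `silhouette_samples` and `silhouette_score` for real use). The helper `silhouette_per_point` and the toy data are hypothetical, for illustration only. By convention a point that is alone in its cluster gets a silhouette of 0, so one singleton barely moves an average taken over 10k points.

```python
import numpy as np

def silhouette_per_point(X, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for each point, where a is the mean
    distance to the rest of its own cluster and b is the mean distance to the
    nearest other cluster. Points in single-member clusters get s(i) = 0."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: silhouette defined as 0
        a = d[i, own].sum() / (own.sum() - 1)  # d[i, i] = 0, so i itself is excluded
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Hypothetical toy 1-D data: two tight groups plus one far-away point.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [100.0]])
labels = np.array([0, 0, 0, 1, 1, 1, 2])
s = silhouette_per_point(X, labels)
print(s.round(3))
```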

Best Answer

There is nothing in the k-means objective that requires clusters to have the same number of elements; a one-element cluster is perfectly fine if it minimizes the within-cluster variance.
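To make this concrete, the sketch below scores two hypothetical 3-cluster assignments of the same toy 1-D data with the k-means objective, the within-cluster sum of squares. The helper `wcss` and the data values are made up for illustration.

```python
import numpy as np

def wcss(X, labels):
    """Within-cluster sum of squares: the quantity k-means minimizes."""
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        # Squared distances of the members to their own cluster mean.
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)

# Toy 1-D data: two tight groups plus one far-away point (illustrative values).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [100.0]])

singleton = np.array([0, 0, 0, 1, 1, 1, 2])  # sizes 3, 3, 1
balanced = np.array([0, 0, 0, 1, 1, 2, 2])   # sizes 3, 2, 2

print(wcss(X, singleton))  # 4.0
print(wcss(X, balanced))   # 3874.5
```

The assignment with the one-element cluster has by far the lower objective, so k-means prefers it even though the cluster sizes are very unequal.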

Because k-means minimizes squared errors, it is quite sensitive to outliers, and that is presumably what you are seeing here: an outlier. It does not fit into any of the other clusters, and there may be more outliers in your data that harm the k-means result.
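A minimal sketch of this effect, assuming only NumPy: a small Lloyd's-algorithm k-means with k-means++ seeding (written here for illustration, not scikit-learn's implementation), run with k = 3 on two synthetic Gaussian blobs plus one hypothetical far-away point. Absorbing that point into either blob would add a huge squared error, so the best solution gives it a cluster of its own.

```python
import numpy as np

def sq_dists(X, centers):
    # Squared Euclidean distance from every point to every center, shape (n, k).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (inertia, labels, centers) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to
        # its squared distance to the nearest center chosen so far.
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = sq_dists(X, centers).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            # Move each center to the mean of its points (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = sq_dists(X, centers).argmin(axis=1)
        inertia = float(((X - centers[labels]) ** 2).sum())
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),   # synthetic blob around (0, 0)
    rng.normal(10.0, 0.5, size=(100, 2)),  # synthetic blob around (10, 10)
    [[100.0, 100.0]],                      # one hypothetical outlier
])
inertia, labels, centers = kmeans(X, k=3)
print(sorted(np.bincount(labels, minlength=3).tolist()))
```

The far point ends up alone in its cluster while each blob keeps all 100 of its points.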

Don't use the elbow method!

It's nonsense, because the scales of the two axes (k and the sum of squared errors) are not related: change the range of k you plot and you'll interpret the curve differently. Also, most of the time there is no clear elbow anyway.
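A small sketch of the range problem, assuming NumPy. `chord_elbow` is a hypothetical helper implementing one common automatic "knee" heuristic: rescale both axes to [0, 1] and pick the k farthest below the straight line joining the first and last points. The 1/k curve is a made-up SSE curve with no real elbow, not data from the question.

```python
import numpy as np

def chord_elbow(ks, sse):
    """Return the k farthest below the chord from the first to the last point,
    after rescaling both axes to [0, 1]. Assumes sse decreases with k."""
    ks = np.asarray(ks, dtype=float)
    sse = np.asarray(sse, dtype=float)
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (sse - sse.min()) / (sse.max() - sse.min())
    # The rescaled chord runs from (0, 1) to (1, 0), i.e. x + y = 1;
    # the distance below it is proportional to 1 - x - y.
    return int(ks[np.argmax(1.0 - x - y)])

for k_max in (10, 40):
    ks = np.arange(1, k_max + 1)
    sse = 1.0 / ks  # hypothetical smooth SSE curve: no true elbow anywhere
    print(f"k = 1..{k_max}: elbow at k = {chord_elbow(ks, sse)}")
```

On the same curve, plotting k = 1..10 puts the "elbow" at k = 3, while plotting k = 1..40 moves it to k = 6.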