Solved – Assumption of equal size of clusters in clustering

clusteringk-means

I am wondering: when clustering data using some general algorithm is there is an assumption on approximately equal sizes of the clusters?
For example, in k-means as I know all clusters should have approx. equal number of samples. Does it also hold for other clustering algorithms?

Best Answer

k-means does not care about cluster cardinalities

You are misunderstanding the common statement that k-means clusters "tend to be of the same size" (where size refers to the area, not cardinality). The latter is true to some extent, because k-means always splits the data on the middle orthogonal of the two clusters. This yields an approximately even division of the data space (at least if we ignore the infinite empty space outside your data - this is not mathematically rigorous).

However, if you have varying density in your data set (and why would you use clustering if you haven't) then two clusters of the same area do not have to have the same number of elements.

The only algorithm that I know which tries to ensure same cluster cardinality is this same-size-kmeans algorithm tutorial.