Clustering (k-means, or otherwise) with a minimum cluster size constraint

clustering, r

I need to cluster units into $k$ clusters so as to minimize the within-group sum of squares (WSS), while ensuring that each cluster contains at least $m$ units. Do any of R's clustering functions support clustering into $k$ clusters subject to a minimum cluster size constraint? kmeans() does not seem to offer a size constraint option.

Best Answer

Use EM Clustering

In EM clustering, the algorithm iteratively refines an initial cluster model to fit the data and computes, for each data point, the probability that it belongs to each cluster. The fit is measured by the log-likelihood of the data given the model, and the algorithm stops when the log-likelihood no longer improves appreciably from one iteration to the next.
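
To make the two quantities above concrete, here is how they look under one common choice of model. The answer does not name a specific model, so the Gaussian mixture and the symbols $n$, $x_i$, $\pi_j$, $\mu_j$, $\Sigma_j$ and $\gamma_{ij}$ are assumptions introduced for illustration. For $n$ data points and $k$ components, the log-likelihood is

$$\log L(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$

and the probability that point $x_i$ belongs to cluster $j$, computed in the E-step, is

$$\gamma_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

Under this assumption, the size of cluster $j$ after a hard assignment is the number of points for which $\gamma_{ij}$ is the largest over all clusters.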

If empty clusters are generated during the process, or if the membership of one or more clusters falls below a given threshold (here, the minimum size $m$), the clusters with low populations are reseeded at new points and the EM algorithm is rerun.
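
The reseed-and-rerun loop can be sketched as follows. This is a minimal illustration under several assumptions, not the answer's actual implementation. It swaps full EM for its hard-assignment variant (plain k-means), which matches the WSS objective in the question. It is written in plain Python rather than R, purely for illustration. The reseeding rule (move an undersized cluster to the point farthest from its assigned center) is one possible choice, since the answer does not say where the new points come from. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Hard E-step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def update(points, labels, centers):
    """M-step: move each center to the mean of its members.
    An empty cluster keeps its previous center."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def fit(points, centers, max_iter=100):
    """Alternate the two steps until the assignment stops changing."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def cluster_min_size(points, init_centers, m, max_restarts=20):
    """Fit, then reseed undersized clusters and refit, until every cluster
    has at least m members or the restart budget is spent.
    Returns (labels, centers, ok); ok is False if the heuristic failed."""
    k = len(init_centers)
    labels, centers = fit(points, list(init_centers))
    for _ in range(max_restarts):
        small = [j for j in range(k) if labels.count(j) < m]
        if not small:
            return labels, centers, True
        for j in small:
            # Assumed reseeding rule: take the point that is farthest
            # from the center it is currently assigned to.
            far = max(range(len(points)),
                      key=lambda i: sq_dist(points[i], centers[labels[i]]))
            centers[j] = points[far]
            labels = assign(points, centers)
        labels, centers = fit(points, centers)
    ok = all(labels.count(j) >= m for j in range(k))
    return labels, centers, ok


# Usage: a poor starting center at 50 leaves the second cluster empty,
# so it is reseeded and the fit is rerun.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
labels, centers, ok = cluster_min_size(points, [(0,), (50,)], m=2)
```

Reseeding is a heuristic and does not guarantee the constraint. If the data contain an isolated outlier, for example, the fit can keep returning a singleton cluster, which is why the sketch returns an explicit `ok` flag instead of assuming success.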
