I have some points $X=\{x_1,…,x_n\}$ in $R^p$, and I want to cluster the points so that:
-
Each cluster contains an equal number of elements of $X$. (Assume that the number of clusters divides $n$.)
-
Each cluster is "spatially cohesive" in some sense, like the clusters from $k$-means.
It's easy to think of a lot of clustering procedures that satisfy one or the other of these, but does anyone know of a way to get both at once?
Best Answer
I suggest a two-step approach:
get a good initial estimates of the cluster centers, e.g. using hard or fuzzy K-means.
Use Global Nearest Neighbor assignment to associate points with cluster centers: Calculate a distance matrix between each point and each cluster center (you can make the problem a bit smaller by only calculating reasonable distances), replicate each cluster center X times, and solve the linear assignment problem. You'll get, for each cluster center, exactly X matches to data points, so that, globally, the distance between data points and cluster centers is minimized.
Note that you can update cluster centers after step 2 and repeat step 2 to basically run K-means with fixed number of points per cluster. Still, it will be a good idea to get a good initial guess first.