Solved – Assigning class labels to k-means clusters

k-means

I have a very basic question on clustering. After I have found k clusters with their centroids, how do I go about interpreting the classes of the data points that I have clustered, that is, assigning a meaningful class label to each cluster? I am not talking about validation of the clusters found.

Can it be done with a small labelled set of data points: compute which cluster each labelled point belongs to and, based on the type and number of labelled points each cluster receives, decide the label? This seems pretty obvious, but I don't know how standard it is to assign labels to clusters this way.

To be clear, I want to first find my clusters with unsupervised clustering that doesn't use any labels. Then, having found the clusters, I want to assign meaningful class labels to them based on the properties of a few example data points.

Best Answer

Yes. What you propose is entirely standard, and the assignment step is what standard k-means software does automatically: it computes the Euclidean distance between each observation (data point) and each cluster mean (centroid) and assigns the observation to the nearest cluster. The label of a cluster is then determined by examining the average characteristics of the observations assigned to it, relative to the corresponding averages for the other clusters.
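As a rough illustration of the majority-vote idea from the question, here is a minimal sketch in plain Python. The centroids, the labelled sample, the label names, and the helper names `nearest_centroid` and `label_clusters` are all made up for the example; in practice the centroids would come from whatever k-means run you did beforehand.

```python
from collections import Counter, defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def label_clusters(centroids, labelled_points, labels):
    """Map each cluster index to the majority label among the labelled points it receives."""
    votes = defaultdict(Counter)
    for point, label in zip(labelled_points, labels):
        votes[nearest_centroid(point, centroids)][label] += 1
    # Clusters that receive no labelled points are left out of the result.
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}


# Hypothetical centroids, as if returned by an earlier unsupervised k-means run.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# A small hypothetical labelled sample.
labelled_points = [(0.2, -0.1), (0.5, 0.4), (4.8, 5.3), (5.1, 4.7), (4.0, 4.0)]
labels = ["cat", "cat", "dog", "dog", "cat"]

cluster_labels = label_clusters(centroids, labelled_points, labels)
print(cluster_labels)
```

Note that a cluster receiving no labelled points stays unlabelled here, and a tie between labels is broken arbitrarily (by whichever label was seen first), so with a very small labelled set it is worth looking at the vote counts rather than only the winning label.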