Solved – How to segment test data based on clustering run on training data (unsupervised)

clustering, k-means

How can I segment 'test' datasets using a k-means clustering that was fitted on a training dataset in R? For example, I ran k-means clustering in R and got 7 segments from the demographic and psychographic attributes of the customers in one state. I named each of the 7 segments based on the attributes that characterize its cluster. Now I want to assign the data from other states to the same 7 segments, using the clustering previously run on the first state's data. How can I do this in R? Is it possible? Please do not confuse this with supervised learning (a classification problem). My question is about unsupervised learning.

Best Answer

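K-means produces a set of cluster centroids that approximately minimize its cost function (the within-cluster sum of squared distances to the centroids). A cluster is defined as the set of points that share a common closest centroid. So, suppose you've trained k-means on some training set and now want to cluster new points in a test set. Simply find the nearest centroid to each new point and give the point that centroid's cluster label. Note that this works because k-means defines its clusters through centroids; it does not carry over directly to clustering methods that do not produce such a rule (for example, hierarchical or density-based clustering).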
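Since the question asks for R, here is a minimal sketch of that nearest-centroid assignment. The data are simulated, and the names `train`, `test`, and `assign_to_centroids` are hypothetical, introduced only for illustration. It uses `kmeans()` from base R's `stats` package and assumes all features are numeric and that both datasets have the same columns in the same order.

```r
# Simulated stand-ins for the two states' customer data (hypothetical).
set.seed(42)
train <- matrix(rnorm(200 * 2), ncol = 2)
test  <- matrix(rnorm(50 * 2), ncol = 2)

# Fit k-means on the training data only.
fit <- kmeans(train, centers = 7, nstart = 10, iter.max = 50)

# Hypothetical helper: label each row of x with the index of its
# nearest centroid (squared Euclidean distance, as in k-means).
assign_to_centroids <- function(x, centers) {
  x <- as.matrix(x)
  # One column of squared distances per centroid.
  d2 <- vapply(
    seq_len(nrow(centers)),
    function(j) rowSums(sweep(x, 2, centers[j, ])^2),
    numeric(nrow(x))
  )
  d2 <- matrix(d2, nrow = nrow(x))  # keep n x k shape even for one row
  apply(d2, 1, which.min)
}

# Cluster labels for the new data, using the training centroids.
test_cluster <- assign_to_centroids(test, fit$centers)
table(test_cluster)
```

The labels index the rows of `fit$centers`, so the segment names given to the 7 training clusters carry over unchanged. If the training data were standardized before clustering, the test data would need the same transformation, using the training means and standard deviations rather than its own, before being passed to the helper.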

Geometrically, k-means induces a Voronoi partition on input space, with the centroids as seeds. All points within the Voronoi cell of a particular centroid are assigned to the same cluster. For example, in this image from Wikipedia, the black points would correspond to centroids, and all points in the corresponding colored regions (Voronoi cells) would be clustered together:

[Image: Voronoi diagram from Wikipedia]
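In symbols, writing $\mu_1, \dots, \mu_k$ for the trained centroids (notation assumed here, not taken from the original answer), the assignment rule and the Voronoi cells can be written as:

$$c(x) = \arg\min_{j \in \{1, \dots, k\}} \lVert x - \mu_j \rVert^2, \qquad V_j = \{\, x : \lVert x - \mu_j \rVert \le \lVert x - \mu_i \rVert \ \text{for all } i \,\}$$

A new point $x$ receives label $j$ exactly when it falls in the cell $V_j$. Points on a boundary between cells are equidistant from two or more centroids, and the tie can be broken arbitrarily.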