I am trying to cluster the 24-month utilization behavior of customers using sklearn/K-means in Python. When I plot the customers by cluster in a 2-D space (Principal Components 1 and 2 of my 24-dimensional values), I see some non-circular shapes. There are also a few points that appear closer to a neighboring cluster than to the one they have been assigned to. This happens even though K-means converges (the algorithm stops before reaching max_iter). Can anyone explain:
- Can non-circular clusters be expected in a 2-D representation of K-means results?
- How can points in the plot appear closer to a neighboring cluster's centroid yet remain assigned to their own cluster, even though K-means has converged?
Best Answer
1) K-means always forms a Voronoi partition of the space: each cluster is the polyhedral cell of points nearest to its centroid. So it is normal that clusters are not circular.
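You can check the Voronoi property directly: after convergence, every point's label is simply the index of its nearest centroid. A minimal sketch with synthetic data (24 features to mimic your 24-month values; the blob count and k=4 are arbitrary choices, not taken from your data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

# Synthetic stand-in for 24-month utilization vectors
X, _ = make_blobs(n_samples=300, centers=4, n_features=24, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Index of the nearest centroid for each point, computed independently
nearest = pairwise_distances_argmin(X, km.cluster_centers_)

# Every point sits in its own centroid's Voronoi cell
print((nearest == km.labels_).all())  # True
```

Note that this nearest-centroid property holds in the full 24-dimensional space, not necessarily in a 2-D projection of it.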
2) K-means only converges to a local optimum, so a suboptimal final partition is possible: you will get different final centroids depending on the position of the initial ones. Also keep in mind that assignments are based on distances in the full 24-dimensional space, while your plot shows only the first two principal components, so a point can legitimately look closer to another cluster in the 2-D projection.
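A toy illustration of initialization dependence (hypothetical data, not your customer data): four points at the corners of a wide rectangle. A good initialization finds the optimal left/right split; a bad one converges immediately to a much worse top/bottom split, and both count as "converged".

```python
import numpy as np
from sklearn.cluster import KMeans

# Four points at the corners of a 10-by-1 rectangle; the global optimum
# pairs the two left points together and the two right points together.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

# Initial centroids between the left pair and the right pair
good = KMeans(n_clusters=2, init=np.array([[0.0, 0.5], [10.0, 0.5]]),
              n_init=1).fit(X)
# Initial centroids between the top pair and the bottom pair:
# the assignment never changes, so K-means converges right away
bad = KMeans(n_clusters=2, init=np.array([[5.0, 0.0], [5.0, 1.0]]),
             n_init=1).fit(X)

print(good.inertia_)  # 1.0   (left/right split, the global optimum)
print(bad.inertia_)   # 100.0 (top/bottom split, a converged local optimum)
```

In practice, sklearn mitigates this with the default `init='k-means++'` and by running `n_init` restarts and keeping the solution with the lowest inertia.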