Solved – K-means gives non-spherical clusters

clustering, k-means

I am trying to cluster the 24-month utilization behavior of customers using scikit-learn's K-means in Python. When I plot the customers by cluster in a 2-D space (principal components 1 and 2 of my 24-dimensional vectors), I see some non-circular shapes. There are also a few points that seem clearly closer to a neighboring cluster than to the one they were assigned to, even though K-means has converged (the algorithm stops before reaching max_iter). Can anyone explain:

  1. Can non-circular clusters be expected in a 2-D representation of K-means?
  2. How can a customer appear closer to another cluster in the plot, by distance, even though K-means has converged?

2-D representation of the K-means result: the two axes are the 1st and 2nd principal components of the data, with each cluster shown separately.
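For reference, a minimal sketch of the workflow described above, using synthetic stand-in data and a hypothetical cluster count of 5 (the question does not state the actual k):

```python
# Hypothetical sketch: K-means on 24-dimensional customer vectors,
# then a 2-D PCA projection for plotting. The data is a synthetic
# stand-in for the real 24-month utilization values.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 24))  # placeholder for 24-month utilization data

# Cluster in the full 24-dimensional space (k=5 is an assumption).
labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# Project to the first two principal components for the scatter plot.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(X2.shape)  # (300, 2)

# e.g. plt.scatter(X2[:, 0], X2[:, 1], c=labels)
```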

Best Answer

1) K-means always produces a Voronoi partition of the space: each point belongs to the cell of its nearest centroid, and Voronoi cells are convex polygons, not circles. So non-circular clusters are entirely normal, and the 2-D PCA projection can distort the shapes further.
2) K-means is a local optimizer, so yes, it can converge to a suboptimal final partition; you will get different final centroids depending on the position of the initial ones. Also note that distances in the 2-D plot are not the distances K-means used: a point can look closer to another cluster in the projection while still being nearest its own centroid in the full 24-dimensional space, because the plot discards the remaining 22 dimensions.
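Both points can be checked numerically. The sketch below (synthetic stand-in data, assumed k=4) confirms that at convergence every point really is assigned to its nearest centroid in the full 24-dimensional space, and then counts how many points merely *look* misassigned once points and centroids are projected to 2-D with PCA:

```python
# Verify the nearest-centroid property in 24-D, then show that a 2-D
# PCA projection can make correctly assigned points look misassigned.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))  # stand-in for 24-month utilization vectors

# tol=0 forces sklearn to run until the labels stop changing, so the
# final labels are exactly the nearest-centroid assignment.
km = KMeans(n_clusters=4, n_init=10, tol=0, random_state=0).fit(X)

# In the full 24-D space, every label matches the nearest centroid.
nearest = pairwise_distances_argmin(X, km.cluster_centers_)
assert (nearest == km.labels_).all()

# Project both the points and the centroids to 2-D.
pca = PCA(n_components=2).fit(X)
X2 = pca.transform(X)
C2 = pca.transform(km.cluster_centers_)

# In the projection, some points may sit closer to another cluster's
# centroid; that is an artifact of discarding the other 22 dimensions.
nearest_2d = pairwise_distances_argmin(X2, C2)
n_apparent = int((nearest_2d != km.labels_).sum())
print(f"{n_apparent} of {len(X)} points look misassigned in the 2-D plot")
```

The `n_init=10` argument addresses the second point: K-means is re-run from 10 different initializations and the lowest-inertia result is kept, which reduces (but does not eliminate) the dependence on the starting centroids.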