MATLAB: kmeans appears to miss obvious clusters

clustering, kmeans, Statistics and Machine Learning Toolbox

I am struggling to get kmeans to identify what appear to be fairly distinct clusters in my data. I've walked through the documentation and examples but can't improve over the images shown below (raw data plotted first followed by the kmeans result, data also attached). I've tried the different distance and start options without much success. Even giving seed values doesn't improve the clustering. Does anyone have any other suggestions I could try? My goal is to end up with each data point falling into one of 3 clusters. My last command was:
[cidx3,cmeans2] = kmeans(X,3,'dist','cosine','display','iter','Start',seeds);
where
seeds =
[0.018660 872 17.59;
0.002100 1140 18.88;
0.004652 1187 34.82]
Thank you

Best Answer

Do this (assuming there are no NaNs in X):
[cidx3,cmeans2] = kmeans(zscore(X),3,'dist','cosine','display','iter');
Did it get better? If so, look at your data again and think about what went wrong in your previous attempts. Look at the scales of the three columns. Plot the data with equal (1:1) axis scaling. Think about how the cosine distance behaves when the data are shifted far away from zero.
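To make the hint about cosine distance concrete, here is a minimal sketch. It uses the three seed rows from the question as stand-in points, since the attached data is not reproduced here, so the numbers only illustrate the effect. The helper `cosdist` and the variable names are my own, not toolbox functions, and the z-score is written out by hand (it should match `zscore` for data without NaNs; implicit expansion needs R2016b or later).

```matlab
% Stand-in points: the three seed rows from the question (not the full data set).
P = [0.018660  872 17.59;
     0.002100 1140 18.88;
     0.004652 1187 34.82];

% Hypothetical helper: pairwise cosine distance between rows,
% i.e. 1 - cos(angle between the two row vectors measured from the origin).
rownorm = @(A) sqrt(sum(A.^2, 2));
cosdist = @(A) 1 - (A * A') ./ (rownorm(A) * rownorm(A)');

% Raw data: the second column is orders of magnitude larger than the others,
% so every row points in almost the same direction from the origin.
dRaw = cosdist(P);

% Manual z-score: center each column and scale it to unit standard deviation.
Z = (P - mean(P, 1)) ./ std(P, 0, 1);
dStd = cosdist(Z);

disp('Pairwise cosine distances, raw data:');
disp(dRaw);
disp('Pairwise cosine distances, standardized data:');
disp(dStd);
```

On the raw values all pairwise distances come out very close to zero, so kmeans with 'cosine' has almost nothing to separate the points by. After centering and scaling, the directions spread out and the distances become usable. Whether cosine is the right distance for your data at all is a separate question; with standardized columns, the default 'sqeuclidean' may also be worth a try.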