MATLAB: KMEANS delivers different results on the same data set

kmeans

I'm performing a cluster analysis on financial time series. The distance measure is correlation.
IDX = kmeans(data',2,'distance','correlation')
The call above returns different results each time I run it on the same set of time series. I’m wondering how this is possible.
Thanks for your help!

Best Answer

Christian, the kmeans function uses a randomly chosen starting configuration by default:
>> help kmeans
  kmeans K-means clustering.
  [snip]
  'Start' - Method used to choose initial cluster centroid positions,
            sometimes known as "seeds". Choices are:
      'sample'  - Select K observations from X at random (the default)
      'uniform' - Select K points uniformly at random from the range
                  of X. Not valid for Hamming distance.
      'cluster' - Perform preliminary clustering phase on random 10%
                  subsample of X. This preliminary phase is itself
                  initialized using 'sample'.
      matrix    - A K-by-P matrix of starting locations. In this case,
                  you can pass in [] for K, and kmeans infers K from
                  the first dimension of the matrix. You can also
                  supply a 3D array, implying a value for 'Replicates'
                  from the array's third dimension.
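As a rough illustration of the matrix option, the sketch below passes explicit seeds so that no random starting point is drawn. The synthetic data and the choice of the first and last series as seeds are assumptions made up for this example, not something from the original question.

% Synthetic stand-in for the time series: T observations (rows) of N series (columns).
rng(1);                                    % only makes the made-up data repeatable
T = 250; N = 20;
f1 = randn(T,1); f2 = randn(T,1);          % two hypothetical common factors
data = [repmat(f1,1,N/2) + 0.5*randn(T,N/2), ...
        repmat(f2,1,N/2) + 0.5*randn(T,N/2)];

X = data';                                 % one row per series, as in the question
seeds = X([1 end], :);                     % assumed seeds: first and last series (K-by-P)

% K is passed as [] and inferred from the number of rows in seeds.
idxA = kmeans(X, [], 'Distance', 'correlation', 'Start', seeds);
idxB = kmeans(X, [], 'Distance', 'correlation', 'Start', seeds);
disp(isequal(idxA, idxB))                  % compare the two runs

With fixed seeds the result no longer depends on the random number generator, although it does depend on how well the seeds were chosen.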
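Like many iterative optimization methods, the k-means algorithm converges to a local optimum, so different starting points can lead to different solutions. You can take advantage of the randomness built into the kmeans function by running several replicates from different starting points; kmeans then returns the solution with the lowest total within-cluster sum of distances: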
  'Replicates' - Number of times to repeat the clustering, each with a
                 new set of initial centroids. A positive integer, default is 1.
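A minimal sketch of how this could look in practice, assuming synthetic data in place of the actual time series and an arbitrary seed value. Resetting the random number generator with rng before each call makes the random starts repeatable, and 'Replicates' makes the result less sensitive to any single start.

% Synthetic stand-in for the time series: T observations (rows) of N series (columns).
rng(1);                                    % only makes the made-up data repeatable
T = 250; N = 20;
f1 = randn(T,1); f2 = randn(T,1);          % two hypothetical common factors
data = [repmat(f1,1,N/2) + 0.5*randn(T,N/2), ...
        repmat(f2,1,N/2) + 0.5*randn(T,N/2)];

% Reset the generator to the same state before each call (42 is arbitrary).
rng(42);
idx1 = kmeans(data', 2, 'Distance', 'correlation', 'Replicates', 10);
rng(42);
idx2 = kmeans(data', 2, 'Distance', 'correlation', 'Replicates', 10);
disp(isequal(idx1, idx2))                  % compare the two runs

Note that the cluster labels 1 and 2 are themselves arbitrary, so two runs from different starts can describe the same partition with the labels swapped.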
Hope this helps