MATLAB: Does KMEANS return different results when invoked on the same input?

clusters, processing, statistical, Statistics and Machine Learning Toolbox, uncertainty

When I run the following code multiple times, KMEANS returns different partitions (and hence a different vector s of within-cluster sums of point-to-centroid distances) although the data matrix a is the same:

a = [0 -1 0 2 0]';   % column vector: kmeans treats each row as one observation
[b, c, s] = kmeans(a, 2, 'distance', 'cityblock')

Best Answer

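This is expected behavior: by default, KMEANS chooses the initial cluster centroid positions at random. In the release this answer refers to, the default value of the 'start' parameter is 'sample', which picks k observations from the data at random, as described in the documentation (newer releases default to 'plus', the k-means++ seeding, which is also randomized). If you run your code several times, you may also see KMEANS error out because an empty cluster is created at the first iteration (i.e., b would be all 1's or all 2's); this is the behavior when the 'emptyaction' parameter is 'error', and it can happen here, for example, when both initial centroids are drawn from the three observations equal to 0. To remove the randomness, you can pass a matrix of initial centroid positions, one row per cluster, as the value of the 'start' parameter, for example: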

[b, c, s] = kmeans(a, 2, 'distance', 'cityblock', 'start', [0 1]')

This would yield the same result every time, but because the partition returned by KMEANS depends strongly on the initial centroid positions, you would probably get a sub-optimal partition (unless you provide a "lucky" matrix for the 'start' parameter). The typical use of KMEANS is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering, each time from a new set of initial centroids. KMEANS then returns the solution with the lowest total within-cluster sum of point-to-centroid distances, i.e., the lowest sum(s).
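As a minimal sketch of the 'Replicates' approach, the snippet below reruns the clustering from several random starts and keeps the best one. The replicate count (10), the seed value (0), and the 'emptyaction' setting are illustrative assumptions and are not part of the original answer. Seeding the generator with rng is optional and is included only to make the random starts repeatable from run to run.

```matlab
% Same data as in the question, stored as a column vector
% (kmeans treats each row as one observation).
a = [0 -1 0 2 0]';

% Assumption: fix the random number generator seed so that the random
% initial centroids, and therefore the result, are repeatable.
rng(0);   % the seed value 0 is an arbitrary choice

% Repeat the clustering 10 times (arbitrary count) from different random
% starts; kmeans keeps the replicate with the lowest total distance.
% 'emptyaction','singleton' (an assumption here) replaces an empty cluster
% with a new single-point cluster instead of raising an error.
[b, c, s] = kmeans(a, 2, ...
    'distance', 'cityblock', ...
    'replicates', 10, ...
    'emptyaction', 'singleton');

% Total within-cluster sum of point-to-centroid distances of the best run.
total = sum(s);
```

Calling rng with the same seed immediately before each kmeans call should give the same partition each time, while still letting 'Replicates' search over several initializations instead of relying on one hand-picked 'start' matrix.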
