Solved – How to define silhouette for one cluster

clustering, data mining, machine learning

I want to compare two clustering algorithms. I took the data points that the first algorithm grouped into a single cluster. The second algorithm split the same points into 3 clusters. To compare the results, I tried to compute the silhouette for each algorithm, but I realized that in the first case, since there is only one cluster, there is no other cluster to compute the dissimilarities to. How can the silhouette be defined in the case of one cluster? If it cannot be defined, what would be a good way to compare the results of my two clustering methods?

Best Answer

I'm not sure why you are doing things this way. It seems to me that the first algorithm initially found 2 clusters, and you then chose one of them to be clustered by the second algorithm. Why don't you cluster the whole data set with both algorithms and compare the clusterings?

It's very difficult to compare one cluster against 3 clusters on the same data set. Having a single cluster makes little sense, and one should reasonably expect the K-Means criterion (the within-cluster sum of squares) to be much smaller when you have 3 clusters, so you can't use it for comparison.
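To illustrate why the K-Means criterion is not a fair yardstick here, the following is a minimal sketch using only the Python standard library. The toy points, the label lists, and the helper names `centroid` and `within_cluster_ss` are assumptions made up for this illustration; they are not taken from the question.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_ss(points, labels):
    """K-Means criterion: sum of squared distances to each cluster's centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        total += sum(dist(p, center) ** 2 for p in members)
    return total


# Hypothetical toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
]
one_cluster = [0] * len(points)                # everything in a single cluster
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # one label per group

wcss_one = within_cluster_ss(points, one_cluster)
wcss_three = within_cluster_ss(points, three_clusters)
print("1 cluster :", wcss_one)
print("3 clusters:", wcss_three)
```

Because the total sum of squares splits into a within-cluster part and a between-cluster part, splitting a cluster can never increase the within-cluster part. The criterion therefore favors the 3-cluster result regardless of whether that split is meaningful.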

Now, what is the silhouette of the second algorithm's 3-cluster result? Is the average at least 0.5? If not, the meaning of this clustering is debatable.
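As a rough sketch of that check, the snippet below computes the average silhouette in plain Python using the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other points in the same cluster and b(i) is the smallest mean distance to any other cluster. The function name `mean_silhouette`, the toy points, and the choice to raise `ValueError` for a single cluster are assumptions for this illustration. The 0.5 cutoff is the rule of thumb from this answer, not a hard limit.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        # With one cluster there is no "nearest other cluster", so b(i) is undefined.
        raise ValueError("silhouette needs at least 2 clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # dist(point, point) is 0, so summing over the whole cluster is safe.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]

score = mean_silhouette(points, three_clusters)
print("average silhouette:", score, "| at least 0.5:", score >= 0.5)

try:
    mean_silhouette(points, [0] * len(points))
except ValueError as err:
    print("single cluster:", err)
```

The single-cluster call fails by construction, which mirrors the problem in the question: the silhouette only makes sense when each point has at least one other cluster to be compared against.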
