Machine Learning – Using the Gap Statistic to Compare Clustering Algorithms

clusteringdata miningmachine learning

I want to compare the performances of two clustering algorithms that give me different numbers of clusters. I recently learned about the gap statistic. However, from what I have learned, this statistic is used to find the optimal number of clusters for one algorithm (On that page, for example, it is used to find the best number of clusters for k-means). Is it possible to use it to compare which algorithm clusters gives the best performances? (finds the clustering that minimises the distance in clusters and maximized the distance between them)

Best Answer

Logically, the answer should be yes: you may compare, by the same criterion, solutions different by the number of clusters and/or the clustering algorithm used. Majority of the many internal clustering criterions (one of them being Gap statistic) are not tied (in proprietary sense) to a specific clustering method: they are apt to assess the clusters whatever method used. They just do not "know" if the being compared solutions with various number of clusters have come from the same or from different clustering methods.

However the majority of criterions should be applied to the same clustered dataset, unless a criterion value is well-thought standardized (which is not an easy task).


P.S. In their reasonable answer @Anony-Mousse raised an aspect which I had decided to hush up above.

By comparing different algorithms with a measure that correlates with some of the objective functions, you will be more likely measuring how similar the algorithm is to [that criterion], but not how good it actually works.

There exist no balanced or "universal" clustering criterions; any one of them bears some homology to the objective function of this and not that clustering algorithm, by virtue of which it tends to "prefer" one algorithm to another (and one shape of clusters to another, too). Gap statistic retains something of the K-means function, while Silhouette has clear trace of average linkage hierarchical method. They are not "orthogonal" to anything. A clustering criterion is itself an objective function of some clustering algorithm not invented exactly so, yet. An algorithm is good for us (in respect to cluster separability) if it wins when judged by the criterion we want. And it is unclear what else it might be that is the measure of how good it actually works (from the point of view of internal validation, I mean).