Solved – Should we use Ward’s method for hierarchical clustering most of the time?

clustering, hierarchical clustering

By browsing notebooks on the web, I see that most of the time Ward's method is used for hierarchical clustering. What could explain its popularity? Does it mean that in general it performs better than the other methods?
[Figure: Different methods for hierarchical clustering]

If we cut the hierarchy at $k=3$ clusters, we see that in the 2D case single linkage clusters the first two datasets perfectly, whereas Ward linkage cannot. On the third dataset, Ward's method is clearly superior to the others. I suspect that most people assume real data follows the geometry of dataset 3, which would explain the popularity of Ward's method over the others.
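The observation about the figure can be checked roughly with scikit-learn. This is a minimal sketch, not the code behind the chart: the sample sizes, noise level and blob centers are assumptions chosen for illustration, and the rings are cut at two clusters because that synthetic dataset contains two groups.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs, make_circles
from sklearn.metrics import adjusted_rand_score


def linkage_scores(X, y_true, n_clusters):
    """Return the adjusted Rand index of each linkage against the true labels."""
    scores = {}
    for linkage in ("single", "average", "complete", "ward"):
        model = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
        scores[linkage] = adjusted_rand_score(y_true, model.fit_predict(X))
    return scores


# Two concentric rings: a non-convex geometry (parameters are illustrative).
X_rings, y_rings = make_circles(
    n_samples=500, factor=0.5, noise=0.02, random_state=0
)
ring_scores = linkage_scores(X_rings, y_rings, n_clusters=2)

# Three compact, well-separated blobs (centers are assumed, not from the chart).
X_blobs, y_blobs = make_blobs(
    n_samples=500,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)
blob_scores = linkage_scores(X_blobs, y_blobs, n_clusters=3)

for name, scores in (("rings", ring_scores), ("blobs", blob_scores)):
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

An adjusted Rand index of 1 means the clustering matches the true grouping exactly, and values near 0 mean it is no better than chance. On data like this, single linkage should recover the rings while Ward's method should not, and Ward's method should recover the compact blobs.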

Best Answer

Current version of that chart from sklearn:

[Figure: current version of the chart]

So should we actually be using DBSCAN and OPTICS, then?
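To see why density-based methods look attractive on toy shapes, here is a minimal sketch that compares DBSCAN with Ward's method on two concentric rings. It assumes scikit-learn, and the values of `eps` and `min_samples` are illustrative choices tuned by hand for this synthetic dataset; they will not transfer to other data. OPTICS is left out to keep the sketch short.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Synthetic rings; the parameters are assumptions for illustration only.
X, y_true = make_circles(n_samples=500, factor=0.5, noise=0.02, random_state=0)

# eps is chosen smaller than the gap between the rings and larger than
# the spacing between neighbouring points on the same ring.
dbscan_labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)
ward_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# DBSCAN marks noise points with the label -1, so exclude it from the count.
n_found = len(set(dbscan_labels) - {-1})
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
ward_ari = adjusted_rand_score(y_true, ward_labels)

print("DBSCAN clusters found:", n_found)
print("DBSCAN ARI:", round(dbscan_ari, 2), "Ward ARI:", round(ward_ari, 2))
```

Note that DBSCAN only succeeds here because `eps` was picked with knowledge of the ring geometry, which is exactly the kind of knowledge that is missing for real data.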

All of these are toy examples, and you'll find that none of them works on your real data, which is just a big mess: it may not have any clusters at all, and it may have scaling problems, more than two dimensions, and non-numerical attributes. Suddenly none of the methods works.
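The point about data without any clusters can be made concrete: a hierarchical method cut at k clusters always returns k groups, whether or not any structure exists. A minimal sketch, assuming scikit-learn and NumPy, with uniformly random points as a stand-in for structureless data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Uniform noise in the unit square: no cluster structure by construction.
X_noise = rng.uniform(size=(300, 2))

labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_noise)

# Number of points assigned to each of the three requested clusters.
cluster_sizes = np.bincount(labels)
print(cluster_sizes)
```

The output looks like a clustering, with three non-empty groups, even though the data is pure noise. The number of groups returned therefore says nothing about whether clusters exist.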

It is not correct that real data always looks Gaussian. For example, wealth is anything but Gaussian distributed, geography is not very Gaussian, and word frequencies are not Gaussian either. Sometimes data does contain Gaussians, for example due to measurement error and aggregation, but I'd say data is non-Gaussian more often than it is Gaussian.