The Ward clustering algorithm is a hierarchical clustering method that minimizes an 'inertia' criterion at each step. This inertia quantifies the sum of squared residuals between the reduced signal and the initial signal: it is a measure of the variance of the error in an l2 (Euclidean) sense. Actually, you even mention it in your question. This is why, I believe, it makes no sense to apply it to a distance matrix that is not an l2 Euclidean distance.
On the other hand, an average linkage or a single linkage hierarchical clustering would be perfectly suitable for other distances.
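To make this concrete, here is a small sketch (assuming scipy as the toolchain; the toy data is my own): Ward is fed Euclidean distances, since that is the only setting where its variance criterion is meaningful, while average linkage happily works with any dissimilarity, e.g. cityblock (Manhattan) distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Two well-separated 1-D groups (hypothetical toy data).
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Ward's inertia criterion is only meaningful on Euclidean distances.
Z_ward = linkage(pdist(X, metric="euclidean"), method="ward")

# Average linkage is well-defined for any dissimilarity, e.g. cityblock.
Z_avg = linkage(pdist(X, metric="cityblock"), method="average")

labels = fcluster(Z_avg, t=2, criterion="maxclust")
# Each well-separated group lands in its own cluster.
```

scipy's `linkage` will not stop you from passing non-Euclidean distances with `method="ward"`, but its documentation states that the result is only correctly defined for Euclidean metrics, which matches the point above.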
Only single-linkage is optimal. Complete-linkage fails, so your intuition was wrong there. ;-)
As a simple example, consider the one-dimensional data set 1,2,3,4,5,6.
The (unique) optimum solution for complete linkage with two clusters is (1,2,3) and (4,5,6) (complete linkage height 2). The (unique) optimum solution with three clusters is (1,2), (3,4), (5,6), at linkage height 1.
There exists no agglomerative hierarchy that contains both, obviously, because these partitions do not nest. Hence, no hierarchical clustering contains all optimal solutions for complete linkage. I would even expect complete linkage to exhibit one of the worst gaps between the optimum solution and the agglomerative solution at high levels, based on that example... And this example likely is a counterexample for most other schemes (except single linkage, where this data set degenerates).
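The non-nesting claim is easy to verify mechanically; a minimal sketch (the helper name `refines` is my own):

```python
# The two optimal complete-linkage partitions from the example above.
two_clusters = [{1, 2, 3}, {4, 5, 6}]      # complete-linkage height 2
three_clusters = [{1, 2}, {3, 4}, {5, 6}]  # complete-linkage height 1

def refines(fine, coarse):
    """True if every cluster of `fine` is contained in some cluster of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)

# {3, 4} straddles the two-cluster boundary, so no hierarchy contains both.
print(refines(three_clusters, two_clusters))  # -> False
```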
You may be interested in studying this article:
Gunnar E. Carlsson, Facundo Mémoli:
Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research 11: 1425-1470 (2010)
and the references contained therein, in particular also:
Kleinberg, Jon M. "An impossibility theorem for clustering." Advances in neural information processing systems. 2003.
These seem to indicate that given some mild stability requirements (which can be interpreted as uncertainty of our distance measurements), only single-linkage clustering remains. Which, unfortunately, is anything but usable in most cases... so IMHO for practical purposes the authors overshoot a bit, favoring pretty theory over the usefulness of the result.
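To illustrate why single linkage is so hard to use in practice, here is a small sketch of its notorious chaining behavior (toy data of my own construction): a thin bridge of points drags one compact group into the other.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Two compact 1-D groups joined by a thin "bridge" of points.
X = np.concatenate([
    np.array([0.0, 0.1, 0.2]),   # group A (3 points)
    np.arange(1.0, 9.0, 0.3),    # bridge (27 points, spacing 0.3)
    np.array([9.0, 9.1, 9.2]),   # group B (3 points)
]).reshape(-1, 1)

labels = fcluster(linkage(pdist(X), method="single"), t=2, criterion="maxclust")
_, counts = np.unique(labels, return_counts=True)
# Single linkage chains along the bridge: group B and the entire bridge
# merge into one big cluster (30 points), leaving only group A (3 points)
# on its own, because the only gap larger than the bridge spacing is the
# 0.8 gap between group A and the bridge.
```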
The current version of that comparison chart from the sklearn clustering documentation:
Actually, we should be using DBSCAN and OPTICS then?
All of these are toy examples. Real data is just a big mess: it maybe does not have any clusters at all, has scaling problems, more than 2 dimensions, and non-numerical attributes. And suddenly none of these methods works.
It is not correct that real data is always like some Gaussian. For example, wealth is anything but Gaussian distributed, nor is geography very Gaussian, nor are word frequencies. Sometimes data contains Gaussians, for example due to measurement error and aggregation, but I'd say data is more often not Gaussian than it is.