Solved – An advantage to squaring dissimilarities when using Ward clustering

distance-functions, hierarchical clustering, r, ward

Is there a reason to prefer squaring or not squaring the dissimilarities when clustering with Ward's method?

The question is motivated by the following statement in the documentation for R's hclust() function:

Two different algorithms are found in the literature for Ward clustering. The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2013). With the latter, the dissimilarities are squared before cluster updating.

Does squaring improve the algorithm?

Best Answer

From the Conclusion of Murtagh, F. & Legendre, P. (2011). Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm, arXiv:1111.6285v2:

Two algorithms, Ward1 and Ward2...When applied to the same distance matrix D, they produce different results. This article has shown that when they are applied to the same dissimilarity matrix D, only Ward2 minimizes the Ward clustering criterion and produces the Ward method. The Ward1 and Ward2 algorithms can be made to optimize the same criterion and produce the same clustering topology by using Ward1 with D-squared and Ward2 with D.

For example, hclust(dist(x)^2, method="ward") produces the same clustering topology as hclust(dist(x), method="ward.D2"). In R versions after 3.0.3, the old "ward" option is named "ward.D".
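The equivalence can be checked empirically. The following is a minimal sketch on made-up random data (the matrix x, the seed, and the helper name co_membership are illustrative assumptions, not part of the original answer). It compares the partitions produced by the two calls at every cut level and, as a secondary check, compares the merge heights, which should differ by a square root.

```r
# Hypothetical toy data: 20 points in 3 dimensions (not from the original post).
set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
d <- dist(x)  # Euclidean distances

# Ward1 ("ward.D", formerly "ward") applied to squared distances.
h1 <- hclust(d^2, method = "ward.D")
# Ward2 ("ward.D2") applied to the unsquared distances.
h2 <- hclust(d, method = "ward.D2")

# Label-independent view of a partition: TRUE where two points share a cluster.
co_membership <- function(labels) outer(labels, labels, "==")

# Same topology means the same partition at every number of clusters k.
same_topology <- all(sapply(2:19, function(k) {
  identical(co_membership(cutree(h1, k = k)),
            co_membership(cutree(h2, k = k)))
}))

# The merge heights are on different scales: Ward2 heights should be the
# square roots of the Ward1-on-squared-distances heights.
heights_match <- isTRUE(all.equal(h2$height, sqrt(h1$height)))

print(same_topology)
print(heights_match)
```

So squaring is not an improvement in itself. It is the input that "ward.D" needs in order to optimize the same criterion that "ward.D2" optimizes on the unsquared distances. Note that the dendrogram heights are on different scales for the two calls, even though the tree shape is the same.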