Is there a reason to prefer squaring or not squaring the dissimilarities when clustering with Ward's method?
The question is motivated by the following statement in the documentation for R's hclust() function:
Two different algorithms are found in the literature for Ward clustering. The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2013). With the latter, the dissimilarities are squared before cluster updating.
Does squaring improve the algorithm?
Best Answer
From the Conclusion of Murtagh, F. & Legendre, P. (2011). Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm, arXiv:1111.6285v2:
For example, hclust(dist(x)^2, method="ward") is equivalent to hclust(dist(x), method="ward.D2").
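The equivalence can be checked directly. The following is a minimal sketch, not part of the paper: the toy data and all variable names are made up for illustration, and it uses "ward.D", which per the documentation quoted in the question is the current name of the old "ward" option. In my understanding the two calls produce the same merge sequence, while the merge heights from "ward.D" on squared distances are the squares of the "ward.D2" heights, so "equivalent" refers to the tree topology rather than to identical height values.

```r
# Hypothetical toy data: 20 points in 2 dimensions.
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)  # Euclidean distances, unsquared

# ward.D2 squares the dissimilarities internally before cluster updating.
h_d2 <- hclust(d, method = "ward.D2")

# ward.D (the old "ward") applied to distances squared by hand.
h_sq <- hclust(d^2, method = "ward.D")

# Compare the merge sequences of the two trees.
same_merges <- identical(h_d2$merge, h_sq$merge)

# Compare heights after putting them on the same scale.
same_heights <- isTRUE(all.equal(h_d2$height^2, h_sq$height))

print(c(same_merges = same_merges, same_heights = same_heights))
```

If both checks come back TRUE for your data, then cutting either tree with cutree() at the same number of clusters gives the same partition; only the height axis of the dendrogram differs.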