Hierarchical clustering: different result when I change labels

hierarchical-clustering, r

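I am running hierarchical clustering on a distance matrix M_norm:

# as.dist() turns a symmetric matrix into the "dist" object hclust() expects
# (an existing "dist" object keeps its distances); the distances are squared
# before Ward clustering with method = "ward.D".
hc <- hclust(as.dist(M_norm)^2, method = "ward.D")
plot(hc, cex = 1, hang = -1)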

When I use different rownames and colnames in M_norm, the resulting dendrogram changes slightly: the heights at which certain branches are joined differ from before, and so does the height of the final join.

The order of rows and columns in the input matrix is now different, but the pairwise distances between units are the same. I understand that the order of the units along the bottom of the plot can change, but how can the merge heights change? Is the implementation of this algorithm not deterministic?
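
A minimal sketch for checking the expectation in the question, using hypothetical random data `x` (not the asker's M_norm). When all pairwise distances are distinct, permuting the units should leave the merge heights from hclust() unchanged; only the leaf order in the plot differs. Sorting the heights keeps the comparison independent of how the merges are numbered.

```r
# Hypothetical data: 20 units in 3 dimensions, so ties are (almost surely) absent.
set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(paste0("u", 1:20), NULL))
d <- dist(x)

perm <- sample(nrow(x))      # a random reordering of the units
d_perm <- dist(x[perm, ])    # same pairwise distances, different row order

hc <- hclust(d^2, method = "ward.D")
hc_perm <- hclust(d_perm^2, method = "ward.D")

# Compare the merge heights regardless of merge numbering.
same_heights <- isTRUE(all.equal(sort(hc$height), sort(hc_perm$height)))
print(same_heights)  # expected to be TRUE when no distances are tied
```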

Best Answer

I think I have found an answer here: http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-td3043896.html

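In general, hierarchical clustering can give ambiguous results when the dataset contains several identical distances, or when identical between-cluster distances arise as clusters are merged. At such a step, two or more pairs are equally close and the algorithm has to choose one, and that choice can change the heights of later merges. Whether the order of the data matters depends on how these ties are resolved: if the implementation picks the tied pair that comes first (or last) in the matrix, reordering the rows and columns can change which pair is merged.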
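
To illustrate the point about ties, here is a small sketch with hypothetical one-dimensional points (not from the question) where d(A, B) = d(B, C) = 1. It runs the same Ward call as in the question on the points in their original and in reversed order. Assuming hclust() breaks ties by position in the distance matrix, the two orderings merge different tied pairs first. In that case the sorted merge heights differ even though the distances are identical.

```r
# Hypothetical points on a line: A-B and B-C are tied at distance 1.
pts <- c(A = 0, B = 1, C = 2, D = 3.5)

ward_heights <- function(p) {
  d <- dist(p)                          # Euclidean distances; names become labels
  hc <- hclust(d^2, method = "ward.D")  # Ward on squared distances, as in the question
  sort(hc$height)
}

h_orig <- ward_heights(pts)       # units in the order A, B, C, D
h_rev  <- ward_heights(rev(pts))  # same distances, units in the order D, C, B, A

print(h_orig)
print(h_rev)
```

Working through Ward's update by hand, merging A and B first gives heights 1, 2.25 and 10.125, while merging B and C first gives 1, 3 and 9.375. Which ordering produces which result depends on the tie-breaking rule in the installed version of R.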
