How to interpret the numeric values for “height” in a dendrogram using Ward’s clustering method

dendrogram, hierarchical clustering, ward

I am a biology student investigating a new method of creating a dichotomous identification key. I have created a dendrogram from survey data in which people rated how similar pictures of plant leaves are. I used Ward's method to link the clusters. In the resulting dendrogram, the y-axis ranges from 0 to about 50. I know that this axis represents the level at which objects are joined into a cluster, and thus how far they are from other objects, but what exactly does the numeric value represent?

Best Answer

I'm going to, ahem, go out on a limb here, ahem, and guess that you built your tree via the hclust function in base R with method = "ward.D2", the option that implements Ward's original (1963) criterion. If you type ?hclust and look for height in the Value (output) section, it says: "The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration." In this case, Ward's criterion is the total within-cluster error sum of squares, and at each step the method joins the two clusters whose merger increases that total the least. The height of a join therefore reflects the sum-of-squares cost of that particular merger, which increases as you go up the tree and the clusters get bigger.
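To make the sum-of-squares idea concrete, here is a small sketch in plain Python rather than R. It assumes the common convention (which, as far as I know, is what ward.D2 reports) that the height of a join is the square root of twice the increase in the error sum of squares caused by that join. The data values and the function names `ess` and `ward_height` are made up for illustration and are not part of hclust.

```python
from math import sqrt


def ess(cluster):
    """Error sum of squares: squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((v - mean) ** 2 for v in cluster)


def ward_height(a, b):
    """Height of joining clusters a and b.

    Assumed convention: sqrt(2 * increase in ESS caused by the merge).
    """
    increase = ess(a + b) - ess(a) - ess(b)
    return sqrt(2 * increase)


# Hypothetical one-dimensional data, chosen only for illustration.
left, right = [0.0, 1.0], [5.0, 6.0]

h_pair = ward_height([0.0], [1.0])  # joining two single points
h_top = ward_height(left, right)    # joining the two pairs at the top

print(h_pair, h_top)
```

Under this convention, two single points join at a height equal to their ordinary Euclidean distance, while joins higher up the tree are inflated by the sizes of the clusters involved. That is why the top of a Ward dendrogram can sit well above any raw distance in the data.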
