How to convert a dendrogram back into a distance matrix


Example code:

# Average-linkage clustering of the first four USArrests states
our_dist <- dist(USArrests[1:4, ])
dend <- as.dendrogram(hclust(our_dist, method = "average"))
plot(dend)

I would now like a "dend2dist" function that turns dend back into our_dist.

Of course, this cannot be done fully (AFAIK), since turning the dist into the dendrogram aggregated much of the data, and that information cannot be retrieved. Still, I would like our "best guess" (assuming we know which aggregation method was used to build the dendrogram) of a distance matrix that would reproduce the original dendrogram.

Tips on how best to implement such a function in R would also be welcome.

Any suggestions are most welcome. Thanks.

Best Answer

Have a look at the cophenetic distance. According to the R help page (?cophenetic): "The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster. Note that this distance has many ties and restrictions. It can be argued that a dendrogram is an appropriate summary of some data if the correlation between the original distances and the cophenetic distances is high. Otherwise, it should simply be viewed as the description of the output of the clustering algorithm."
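To see that definition in action on the four-state example from the question, here is a small check. It assumes average linkage and that no two merge heights are exactly tied. Under those assumptions, the distinct cophenetic values should be exactly the n - 1 merge heights stored in the hclust object:

```r
# Four-state example from the question, average linkage
our_dist <- dist(USArrests[1:4, ])
hc4 <- hclust(our_dist, method = "average")

# Cophenetic distance: the height at which each pair is first merged
coph <- cophenetic(hc4)

# Each cophenetic value is one of the merge heights, so (barring tied
# heights) the distinct values coincide with hc4$height
coph_values <- sort(unique(as.vector(coph)))
merge_heights <- sort(hc4$height)
print(coph_values)
print(merge_heights)
```

This is also why the help page warns about "many ties": n(n - 1)/2 pairwise distances collapse onto only n - 1 distinct values.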

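# Average-linkage clustering of all 50 states
d1 <- dist(USArrests)
hc <- hclust(d1, method = "average")
# Cophenetic distances: the height at which each pair is first merged
d2 <- cophenetic(hc)
# Cophenetic correlation: how faithfully the tree preserves d1
cor(d1, d2) # 0.7659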
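Since the question starts from a dendrogram rather than an hclust object, here is a minimal sketch of the requested "dend2dist". Note that "dend2dist" and its order_labels argument are hypothetical names, not part of base R. The sketch relies on stats' cophenetic() method for dendrograms. That method may return its labels in leaf (plotting) order, so the optional order_labels argument lets the caller restore the original observation order:

```r
# Hypothetical helper named after the question; not a base R function.
dend2dist <- function(dend, order_labels = NULL) {
  stopifnot(inherits(dend, "dendrogram"))
  # stats provides a cophenetic() method for dendrogram objects
  m <- as.matrix(cophenetic(dend))
  # Optionally reorder rows/columns to match the caller's observation order
  if (!is.null(order_labels)) m <- m[order_labels, order_labels]
  as.dist(m)
}

our_dist <- dist(USArrests[1:4, ])
hc <- hclust(our_dist, method = "average")
dend <- as.dendrogram(hc)

# "Best guess" distance matrix recovered from the dendrogram alone
d_hat <- dend2dist(dend, order_labels = attr(our_dist, "Labels"))
print(d_hat)

# Re-clustering the recovered distances should reproduce the merge heights
hc_hat <- hclust(d_hat, method = "average")
print(all.equal(sort(hc_hat$height), sort(hc$height)))
```

Cophenetic distances form an ultrametric. For an ultrametric, every pair of points split across a given tree node is at the same distance (that node's height), so single, complete and average linkage should all rebuild the same tree. The result is only one of many distance matrices that yield this dendrogram, which is the "best guess" caveat raised in the question.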