Solved – How to convert a dendrogram back into a distance matrix

dendrogram, distance, r

Example code:

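# Euclidean distances between the first four states
our_dist <- dist(USArrests[1:4, ])
# Average-linkage clustering, converted to a dendrogram
dend <- as.dendrogram(hclust(our_dist, "ave"))
plot(dend)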

I would now like a "dend2dist" function that turns dend back into our_dist.

Of course, this cannot be done in full (AFAIK), since turning the dist into a dendrogram aggregates much of the data, and that information cannot be retrieved. Still, I would like a "best guess" (given an assumption about which aggregation method was used to build the dendrogram): some distance matrix that would reproduce the original dendrogram.

And of course, tips on how to best implement such a function in R would also be nice.

Any suggestions will be most welcomed, thanks.

Best Answer

Have a look at the cophenetic distance. According to the R help page (?cophenetic): "The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster. Note that this distance has many ties and restrictions. It can be argued that a dendrogram is an appropriate summary of some data if the correlation between the original distances and the cophenetic distances is high. Otherwise, it should simply be viewed as the description of the output of the clustering algorithm."

d1 <- dist(USArrests)      # original distances
hc <- hclust(d1, "ave")    # average-linkage clustering
d2 <- cophenetic(hc)       # distances implied by the tree
cor(d1, d2) # 0.7659
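As a rough sketch of the "dend2dist" function asked for in the question: cophenetic() also has a method for dendrogram objects, so a thin wrapper may be all that is needed. The name dend2dist is the asker's hypothetical, not an existing function, and the check below assumes average linkage was used to build the tree. It re-clusters the guessed distances and compares the merge heights with those of the original tree.

```r
# Hypothetical helper (not part of base R): a thin wrapper around cophenetic().
dend2dist <- function(dend) {
  stopifnot(inherits(dend, "dendrogram"))
  stats::cophenetic(dend)
}

our_dist <- dist(USArrests[1:4, ])
hc <- hclust(our_dist, "ave")
dend <- as.dendrogram(hc)

guess <- dend2dist(dend)

# Re-cluster the guess with the assumed linkage and compare merge heights.
hc2 <- hclust(guess, "ave")
all.equal(sort(hc$height), sort(hc2$height))

# The guess comes back in leaf order, so reorder it before comparing
# it with the original distances.
labs <- attr(our_dist, "Labels")
guess_reordered <- as.dist(as.matrix(guess)[labs, labs])
cor(our_dist, guess_reordered)
```

The guess is only one of many distance matrices consistent with the tree: every pair of observations that first meets at the same merge gets the same distance, so the within-cluster detail of the original matrix is not recovered.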

Related Solutions

Solved – Given a 10D MCMC chain, how can I determine its posterior mode(s) in R

Have you considered using a nearest-neighbour approach?

For example, build a list of the k nearest neighbours for each of the 100,000 points, then take as a mode the data point whose distance to its kth neighbour is smallest. In other words: find the point with the 'smallest bubble' containing k other points around it.

I'm not sure how robust this is, and the choice of k obviously influences the results.
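A minimal base-R sketch of the 'smallest bubble' idea, under stated assumptions: the chain is simulated (1,000 draws from a 10-dimensional standard normal stand in for real MCMC output), k = 20 is an arbitrary choice, and the full pairwise distance matrix is only feasible for a few thousand points. A chain of 100,000 points would need a proper nearest-neighbour search instead.

```r
set.seed(1)

# Stand-in for an MCMC chain (assumption): 1000 draws in 10 dimensions.
chain <- matrix(rnorm(1000 * 10), ncol = 10)
k <- 20  # arbitrary choice; the result depends on it

# Full pairwise distance matrix; too large for very long chains.
D <- as.matrix(dist(chain))

# Distance from each point to its k-th nearest neighbour. Position k + 1
# is used because position 1 is the point itself at distance 0.
kth_dist <- apply(D, 1, function(row) sort(row, partial = k + 1)[k + 1])

# The point with the smallest k-th neighbour distance is the mode estimate.
mode_index <- which.min(kth_dist)
mode_estimate <- chain[mode_index, ]
mode_estimate
```

Running this for several values of k and checking whether the selected point moves much is one way to get a feel for how sensitive the estimate is.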

Solved – How to plot a fan (Polar) Dendrogram in R

In phylogenetics, this is called a fan phylogram, so you can convert the hclust object to class phylo and plot it with ape:

library(ape)
data(mtcars)
# Cluster the cars, convert the tree to class "phylo", and draw it as a fan
plot(as.phylo(hclust(dist(mtcars))), type = "fan")

