Solved – On cophenetic correlation for dendrogram clustering

classification, clustering

Consider the context of dendrogram clustering. Let us call the distances between the individuals the original dissimilarities. After constructing the dendrogram, we define the cophenetic dissimilarity between two individuals as the distance between the two clusters that are merged at the step where these individuals first end up in the same cluster, i.e. the height of the dendrogram node that joins them.
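In symbols, as a standard formulation with notation introduced here for illustration: let d(a,b) be the original dissimilarity, and let the dendrogram on n individuals be built by merges m = 1, ..., n-1, where merge m joins clusters G_m and H_m at height h_m. Each pair of individuals is split across exactly one merge, and its cophenetic dissimilarity is that merge's height:

$$
t(i,j) = h_m \quad \text{for the unique } m \text{ with } i \in G_m,\ j \in H_m \ \text{ or } \ i \in H_m,\ j \in G_m .
$$

The height depends on the linkage; for complete linkage, for example,

$$
h_m = \max_{a \in G_m,\ b \in H_m} d(a,b).
$$

For linkages whose merge heights never decrease (single, complete and average linkage, for instance), t is an ultrametric: $t(i,j) \le \max\{t(i,l),\ t(l,j)\}$ for all individuals i, j, l.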

Some people consider that the correlation between the original dissimilarities and the cophenetic dissimilarities (called cophenetic correlation) is a "suitability index" of the classification. This sounds totally puzzling to me. My objection does not rely on the particular choice of the Pearson correlation, but on the general idea that any link between the original dissimilarities and the cophenetic dissimilarities could be related to the suitability of the classification.
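For reference, the usual Pearson form of the cophenetic correlation, with d_{ij} the original and t_{ij} the cophenetic dissimilarity of individuals i and j (notation introduced here), and with the means $\bar d$ and $\bar t$ taken over all n(n-1)/2 pairs i < j, is:

$$
c = \frac{\sum_{i<j} (d_{ij} - \bar d)(t_{ij} - \bar t)}{\sqrt{\sum_{i<j} (d_{ij} - \bar d)^2 \ \sum_{i<j} (t_{ij} - \bar t)^2}}
$$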

Do you agree with me, or could you present some argument supporting the use of the cophenetic correlation as a suitability index for the dendrogram classification?

Best Answer

... is a "suitability index" of the classification

To me it is not quite clear what is meant by that. The way I understand it is that

the correlation between the original dissimilarities and the cophenetic dissimilarities (called cophenetic correlation)

is a measure of how hierarchical the structure among the observations (i.e. among their distances) is. That is, the dissimilarities from an observation to the observations in another cluster should preferably be similar to one another. Consider two datasets, A and B, both clustered using Euclidean distance and complete linkage:

[Figure: datasets A and B]

Even without looking at the cophenetic distance map or computing the cophenetic correlation, one can see that the cophenetic correlation of A is higher than that of B. A hierarchy has levels, and the cophenetic dissimilarity assigns one single value to all pairs of observations split across two clusters at a given level. So the CC tells us whether the distances to observations on the same level (cluster) are similar.

For the sake of completeness, the cophenetic correlations are CC(A) = 0.936 and CC(B) = 0.691.
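The same effect can be reproduced in miniature. The sketch below is a naive complete-linkage implementation in plain Python, written only for this illustration. It assumes 1-D points with the absolute difference as dissimilarity to keep the arithmetic small. The two datasets are hypothetical and are not the answer's A and B: `clustered` has two tight, well-separated groups, while `chain` has its points strung out with no clear groups.

```python
from itertools import combinations
from math import sqrt


def complete_linkage_cophenetic(points):
    """Naive O(n^3) complete-linkage clustering of 1-D points (illustrative only).

    Returns (d, coph): dicts mapping each index pair (i, j), i < j, to the
    original dissimilarity and to the cophenetic dissimilarity, i.e. the
    height at which i and j first end up in the same cluster.
    """
    n = len(points)
    d = {(i, j): abs(points[i] - points[j]) for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the largest
        # pairwise distance between their members; merge the closest pair.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            h = max(d[min(p, q), max(p, q)] for p in clusters[a] for q in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # Every pair split across the two merged clusters gets height h.
        for p in clusters[a]:
            for q in clusters[b]:
                coph[min(p, q), max(p, q)] = h
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return d, coph


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(points):
    """Correlation between original and cophenetic dissimilarities over all pairs."""
    d, coph = complete_linkage_cophenetic(points)
    pairs = sorted(d)
    return pearson([d[p] for p in pairs], [coph[p] for p in pairs])


clustered = [0.0, 1.0, 10.0, 11.0]  # two tight, well-separated groups
chain = [0.0, 1.0, 3.0, 7.0]        # points strung out, no clear groups

print(round(cophenetic_correlation(clustered), 3))  # prints 0.991
print(round(cophenetic_correlation(chain), 3))      # prints 0.899
```

In `clustered`, every cross-group distance (9, 10, 10, 11) is close to the single cophenetic value 11 assigned to those pairs, so the correlation is high. In `chain`, the pairs joined at height 7 have original distances 4, 6 and 7, which the dendrogram flattens to one value, so the correlation drops. For real data, SciPy's `scipy.cluster.hierarchy.cophenet(Z, Y)` returns both the cophenetic correlation and the condensed cophenetic distances for a linkage matrix `Z` built by `scipy.cluster.hierarchy.linkage` from the condensed distances `Y`.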
