See, even hierarchical clustering needs parameters if you want to get a partitioning out of it. In fact, hierarchical clustering has (roughly) four parameters:

1. the actual algorithm (divisive vs. agglomerative),
2. the distance function,
3. the linkage criterion (single-link, Ward, etc.), and
4. the distance threshold at which you cut the tree (or any other extraction method).
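To make those four knobs concrete, here is a minimal sketch using SciPy's agglomerative implementation; the toy data and the cut threshold of 5.0 are made-up values for illustration, not anything from the question:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))              # toy data, purely illustrative

D = pdist(X, metric="euclidean")          # 2. the distance function
Z = linkage(D, method="ward")             # 1. agglomerative algorithm + 3. linkage criterion
labels = fcluster(Z, t=5.0, criterion="distance")  # 4. the threshold where the tree is cut
print(np.unique(labels))                  # cluster ids produced by this particular cut
```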
The fact is that there is no good "push button" solution to cluster analysis. It is an exploratory technique: you have to try different methods and parameters and analyze the results.
I found DBSCAN to be very usable in most cases. Yes, it has two parameters (the distance threshold $\varepsilon$, i.e. the neighborhood predicate, and minPts, i.e. the core-point predicate). I'm not counting the distance function separately this time, because what is really needed is a binary "is neighbor of" predicate; see GDBSCAN.
The reason is that in many applications you can choose these values intuitively if you have understood your data well enough. E.g. when working with geo data, distance is literally in kilometers, which allows me to specify the spatial resolution intuitively.
Similarly, minPts gives me intuitive control over how "significant" a subset of observations needs to be before it counts as a cluster.
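As an illustration of that intuition, here is a hedged sketch of DBSCAN on geographic coordinates using scikit-learn's haversine metric; the coordinates, the 1.5 km radius, and minPts = 3 are invented for the example:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# A handful of made-up (lat, lon) points in degrees: two small blobs.
coords_deg = np.array([
    [48.8566, 2.3522], [48.8570, 2.3530], [48.8580, 2.3510],
    [52.5200, 13.4050], [52.5205, 13.4060], [52.5210, 13.4040],
])

kms_per_radian = 6371.0088   # mean Earth radius, converts radians to km
eps_km = 1.5                 # "points within 1.5 km are neighbors" (assumption)
min_pts = 3                  # a cluster needs at least 3 points (assumption)

db = DBSCAN(eps=eps_km / kms_per_radian, min_samples=min_pts,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(coords_deg))  # haversine expects radians
print(labels)                # -1 marks noise, i.e. points in no cluster
```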
Usually, when you find DBSCAN hard to use, it is because you have not yet understood what "distance" means for your data. You then first need to figure out how to measure distance and what the resulting numbers mean to you; then you will know the threshold to use.
And in the end, go and try things out. It's data exploration, not `return(truth);`. There is no "true" clustering. There are only "obvious", "useless" and "interesting" clusterings, and these qualities cannot be measured mathematically; they are subjective to the user.
> The clustering itself is done using the Euclidean Distance - however the dendrogram is depicted using the squared Euclidean Distance. They don't explain ...
From the look of the dendrogram, one might suppose they used Ward's linkage or something similar. Ward's method optimizes the within-cluster sum of squares ($SS_{within}$), and traditionally the Y axis of a Ward dendrogram shows the pooled $SS_{within}$ or the squared distance; see the link. That link also warns against relying on the look of Ward's dendrogram in this way.
> Does that mean that the subjects in C1b have a different range (bigger) of distances between each other than those in C1a?
Vertical branch length is the leap of "decompression" a group experiences when it gets merged with some other group. The specific meaning of "decompression", however, depends on the linkage method.
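For the record, here is a small sketch of where those merge heights live numerically, assuming SciPy was used to build the tree (the data below is synthetic, purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))      # toy data

Z = linkage(X, method="ward")
# Z[i, 2] is the height (merge criterion value) of the i-th merge;
# the vertical branch length of a group is the gap between the height
# at which it forms and the height at which it is merged away.
print(Z[:, 2])
```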
> Since the CHC is more and more decreasing here, does it make any sense to choose more than 2 clusters?
To me, no. In this particular example it is pretty obvious that the 2-cluster solution is the best, according to the CH criterion. We don't know, however, whether it is any better than the 1-cluster solution (i.e. no clusters); to check for that, I would recommend plotting the data for visual inspection. You might also want to use the Gap criterion which, by means of simulations, can test for the 1-cluster solution.
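As a sketch of that kind of check, assuming scikit-learn and synthetic data (nothing here comes from the study in question), one can scan $k$ and watch the CH score:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)  # toy data

for k in range(2, 8):             # CH is undefined for k = 1
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    print(k, calinski_harabasz_score(X, labels))
# The score typically peaks at k = 2 here, which mirrors the point above:
# CH alone cannot compare against the 1-cluster "no clusters" case.
```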
On the other hand, it is true that, in the general case, one should also pay attention to potential sharp elbows on such plots, not only to the peaks (or canyons); see the link. This is because clustering criteria (like CH) are difficult to "standardize": they have their own biases, including biases towards particular numbers of clusters $k$. CH, for example, often prefers more clusters than, for instance, the BIC criterion, which penalizes for $k$.
Still, in your current example I cannot come up with a justification for the authors' statement that "we defined stability (i.e. minimal change from one cluster number to the next) as our goal in deciding where to cut the dendrogram" without knowing their context, and the word "stability" looks strange to me here. I see nothing ragged, no elbows, on the plot except the one between 2 and 3.
Internal clustering criteria such as CH are only one of several ways to select $k$ or to validate clustering results.
The cophenetic correlation coefficient is defined as the linear correlation between the dissimilarities $d_{ij}$ between each pair of observations $(i,j)$ and their corresponding cophenetic distances $d_{ij}^{coph}$, where the cophenetic distance is the intergroup dissimilarity at which observations $i$ and $j$ are first merged into the same cluster.
So you get the cophenetic correlation coefficient $CCC$ by calculating the correlation between those two sets of values. Let $D$ be the distance matrix according to $d$ and $Z$ the distance matrix according to $d^{coph}$, and let $\bar{D}$ and $\bar{Z}$ denote the means of the $D_{ij}$ and the $Z_{ij}$, respectively. Then
$CCC(D,Z) = Cor(D,Z) = \frac{\sum\limits_{i<j} (D_{ij} - \bar{D})(Z_{ij} - \bar{Z}) }{\sqrt{\sum\limits_{i<j} (D_{ij} - \bar{D})^2 \sum\limits_{i<j} (Z_{ij} - \bar{Z})^2 }}$
(see: Mathworks Documentation: cophenetic correlation coefficient)
This should be equal to what you have computed.
So, I think your assumption is correct.
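For completeness, a minimal sketch of the same computation with SciPy's built-in `cophenet`, on random data; the manual Pearson correlation at the end mirrors the formula above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))      # toy data

D = pdist(X)                      # condensed d_ij
Z = linkage(D, method="average")
ccc, D_coph = cophenet(Z, D)      # CCC and condensed cophenetic distances

# Equivalent by hand: Pearson correlation of the two condensed matrices.
ccc_manual = np.corrcoef(D, D_coph)[0, 1]
print(ccc, ccc_manual)            # the two values agree
```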