Only single-linkage is optimal. Complete-linkage fails, so your intuition was wrong there. ;-)
As a simple example, consider the one-dimensional data set 1,2,3,4,5,6.
The (unique) optimum solution for complete linkage with two clusters is (1,2,3) and (4,5,6) (complete linkage height 2). The (unique) optimum solution with three clusters is (1,2), (3,4), (5,6), at linkage height 1.
There exists no agglomerative hierarchy that contains both, obviously, because these partitions do not nest. Hence, no hierarchical clustering contains all optimal solutions for complete linkage. I would even expect complete linkage to exhibit among the worst gap between the optimum solution and the agglomerative solution at high levels, based on that example... An this example likely is a counter example for most other schemes (except single linkage, where this data set degenerates).
You may be interested in studying this article:
Gunnar E. Carlsson, Facundo Mémoli:
Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research 11: 1425-1470 (2010)
and the references contained therein, in particular also:
Kleinberg, Jon M. "An impossibility theorem for clustering." Advances in neural information processing systems. 2003.
These seem to indicate that given some mild stability requirements (which can be interpreted as uncertainty of our distance measurements), only single-linkage clustering remains. Which, unfortunately, is all but usable in most cases... so IMHO for practical purposes the authors overshoot a bit, neglecting usefulness of the result over pretty theory.
I think you are right to be suspicious of Ward's linkage with DTW dissimilarity based on the paper you referenced. In that paper the authors simply present the Ward's linkage criterion in terms of centroids without considering that the geometric notion of a centroid in a vector space will not coincide with the Frechet mean without a metric. To further complicate matters, consider that DTW allows for a similarity measurement between two vectors of different dimension (ie, time series of different lengths)! Long story short, the meaning of "centroid" in this context is not at all clear.
That said, Ward's linkage using a matrix of DTW dissimilarities is algorithmically possible and may give reasonable results. To answer your specific questions:
It is likely that the authors in the paper you referenced are not directly computing the centroids (for reasons previously stated). Instead, algorithmic implementations of Ward's linkage criterion can work directly from a matrix of dissimilarities. In this paper (see section 2.2), for example, a method for computing the sample variance of the merging of two clusters from a distance matrix (in the case of the L2 metric) is presented. The authors further reference Legendre and Legendre (2012) for additional examples of how to cluster with Ward's linkage without directly computing centroids.
Single, complete, average, and median linkage are all certainly reasonable, as these criteria theoretically depend on the the distance/dissimilarity measure itself rather than variance from a cluster centroid. Ward linkage also may give reasonable results in practice, although I would be cautious of relying on it exclusively because of the ambiguity surrounding the meaning of a centroid in the context of DTW similarity measures. Check out the answers and links at this question for more info on Ward's linkage in non-Euclidean spaces.
Best Answer
About your first question, it seems that the
mcquitty
option corresponds to WPGMA clustering, whileaverage
is for UPGMA. It is just by looking at the source code, so it is worth to double check it. But it also referred as is in theupgma()
function from the phangorn package.About your second question, I think you just have to subset your genes by the group labels found after
cutree
, and then plot expression profiles as usual.