Solved – Hierarchical clustering, linkage methods and dynamic time warping

clusteringdistance-functionsmachine learningr

My goal is to cluster time series based on their DTW distance. Therefore I've calculated full distance matrices as input for several clustering algorithms. I first had a look at hierarchical methods, since the number of clusters don't have to be specified at the beginning (moreover k-means is problematic because of the problem of averaging time series under DTW and k-medoids is expensive).

Single linkage (not really useful), complete linkage and average linkage (UPGMA/WPGMA) are unproblematic methods, another criterion which seems to be often used is the Ward method (in R: ward.D2 for the hclust-function). I've seen at least one paper which uses the Ward method with DTW distances, however I am bit skeptical about the usage of Ward in this context. The ward distance of two clusters $A,B$ is defined, according to that paper, as:

$$D_{AB}=\frac{||c_A-c_B||^2}{1/|A|+1/|B|}$$

Where $c_A,c_B$ is the centroid of A, B.

My questions are:

  1. How the centroids are calculated if only a distance matrix is given? (I'm guessing this calculation can't be applied when DTW distances are used.)

  2. Since the application of the Ward method with DTW distances seems to be questionable, are there any other alternative (hierarchical) clustering methods, which can be used with a DTW distance matrix?

Best Answer

I think you are right to be suspicious of Ward's linkage with DTW dissimilarity based on the paper you referenced. In that paper the authors simply present the Ward's linkage criterion in terms of centroids without considering that the geometric notion of a centroid in a vector space will not coincide with the Frechet mean without a metric. To further complicate matters, consider that DTW allows for a similarity measurement between two vectors of different dimension (ie, time series of different lengths)! Long story short, the meaning of "centroid" in this context is not at all clear.

That said, Ward's linkage using a matrix of DTW dissimilarities is algorithmically possible and may give reasonable results. To answer your specific questions:

  1. It is likely that the authors in the paper you referenced are not directly computing the centroids (for reasons previously stated). Instead, algorithmic implementations of Ward's linkage criterion can work directly from a matrix of dissimilarities. In this paper (see section 2.2), for example, a method for computing the sample variance of the merging of two clusters from a distance matrix (in the case of the L2 metric) is presented. The authors further reference Legendre and Legendre (2012) for additional examples of how to cluster with Ward's linkage without directly computing centroids.

  2. Single, complete, average, and median linkage are all certainly reasonable, as these criteria theoretically depend on the the distance/dissimilarity measure itself rather than variance from a cluster centroid. Ward linkage also may give reasonable results in practice, although I would be cautious of relying on it exclusively because of the ambiguity surrounding the meaning of a centroid in the context of DTW similarity measures. Check out the answers and links at this question for more info on Ward's linkage in non-Euclidean spaces.