Solved – Dynamic Time Warping Clustering

clusteringtime series

What would be the approach to use Dynamic Time Warping (DTW) to perform clustering of time series?

I have read about DTW as a way to find similarity between two time series, while they could be shifted in time. Can I use this method as a similarity measure for clustering algorithm like k-means?

Best Answer

Do not use k-means for timeseries.

DTW is not minimized by the mean; k-means may not converge and even if it converges it will not yield a very good result. The mean is an least-squares estimator on the coordinates. It minimizes variance, not arbitrary distances, and k-means is designed for minimizing variance, not arbitrary distances.

Assume you have two time series. Two sine waves, of the same frequency, and a rather long sampling period; but they are offset by $\pi$. Since DTW does time warping, it can align them so they perfectly match, except for the beginning and end. DTW will assign a rather small distance to these two series. However, if you compute the mean of the two series, it will be a flat 0 - they cancel out. The mean does not do dynamic time warping, and loses all the value that DTW got. On such data, k-means may fail to converge, and the results will be meaningless. K-means really should only be used with variance (= squared Euclidean), or some cases that are equivalent (like cosine, on L2 normalized data, where cosine similarity is the same as $2 -$ squared Euclidean distance)

Instead, compute a distance matrix using DTW, then run hierarchical clustering such as single-link. In contrast to k-means, the series may even have different length.

Related Question