Dynamic Time Warping and Normalization in Time Series

functional-data-analysis, normalization, time-series

I'm using Dynamic Time Warping to match a "query" and a "template" curve and having reasonable success thus far, but I have some basic questions:

  1. I'm assessing a "match" by assessing whether the DTW result is less than some threshold value that I come up with heuristically. Is this the general approach to determining a "match" using DTW? If not, please explain…

    Assuming the answer to (1) is "yes", I'm confused, since the DTW result is quite sensitive to a) the difference in amplitudes of the curves and b) the lengths of the query and "template" vectors.

    I am using a symmetric step function, so for (b) I'm normalizing my DTW result by dividing by M+N (the width plus the height of the DTW matrix, i.e., the combined length of the two vectors). This seems to be somewhat effective, but it would penalize DTW matches that stray further from the diagonal (i.e., that have a longer path through the DTW matrix), which seems rather arbitrary for a "normalization" approach. Dividing by the number of steps through the matrix makes more intuitive sense to me, but that doesn't appear to be the way to do it according to the literature. (I sketch both normalizations in code at the end of this question.)

  2. So is there a better way to adjust the DTW result for the size of the query and template vectors?

  3. Finally, how do I normalize the DTW result for the difference in amplitudes between the query and the template vectors?

As it is, given the lack of reliable normalization techniques (or my lack of understanding), there seems to be a lot of manual effort involved in working with the sample data to identify the best threshold level for defining a "match". Am I missing something?
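
For reference, here is a minimal pure-NumPy sketch of the computation I'm describing: DTW with the diagonal step weighted twice (my reading of the "symmetric" step pattern), followed by the two normalizations discussed above. The function name is mine.

```python
import numpy as np

def dtw_symmetric(query, template):
    """DTW with a symmetric step pattern (diagonal steps weighted twice);
    returns the total cost and the length of the optimal warping path."""
    n, m = len(query), len(template)
    cost = np.abs(np.subtract.outer(query, template))  # local distances
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost[i - 1, j - 1]
            D[i, j] = min(D[i - 1, j - 1] + 2 * c,  # diagonal (match)
                          D[i - 1, j] + c,          # advance the query only
                          D[i, j - 1] + c)          # advance the template only
    # Backtrack to count the number of steps on the optimal path.
    i, j, steps = n, m, 0
    while (i, j) != (0, 0):
        c = cost[i - 1, j - 1]
        moves = [(i - 1, j - 1, 2 * c), (i - 1, j, c), (i, j - 1, c)]
        i, j, _ = min(moves, key=lambda mv: D[mv[0], mv[1]] + mv[2])
        steps += 1
    return D[n, m], steps

query = np.sin(np.linspace(0, 2 * np.pi, 80))
template = np.sin(np.linspace(0, 2 * np.pi, 100))
total, steps = dtw_symmetric(query, template)
print(total / (len(query) + len(template)))  # the M+N normalization
print(total / steps)                         # per-step normalization
```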

Best Answer

No "general approach" exists for this at least to my knowledge. Besides you are trying to minimize a distance metric anyway. For example in the granddaddy of DTW papers Sakoe & Chiba (1978) use $|| a_i - b_i||$ as the measurement of difference between two feature vectors.

As you correctly identified, you (usually) need the same number of points in both curves for this to work out of the box. I would propose using a lowess() smoother/interpolator over your curves to make them of equal size first. It's pretty standard stuff for "curve statistics". You can see an example application in Chiou et al. (2003); the authors don't care about DTW as such in that work, but it is a good exemplar of how to deal with unequal-sized readings.
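
For instance, a minimal sketch of that preprocessing step (assuming statsmodels for the LOWESS smoother; the smoothing fraction and grid size here are arbitrary choices):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def to_common_grid(x, y, grid, frac=0.2):
    """Smooth (x, y) with LOWESS, then linearly interpolate onto grid."""
    smoothed = lowess(y, x, frac=frac, return_sorted=True)  # (x, yhat) pairs
    return np.interp(grid, smoothed[:, 0], smoothed[:, 1])

# Two noisy curves observed at different numbers of points.
x_query = np.sort(np.random.rand(83))
y_query = np.sin(2 * np.pi * x_query) + 0.1 * np.random.randn(83)
x_templ = np.sort(np.random.rand(120))
y_templ = np.sin(2 * np.pi * x_templ) + 0.1 * np.random.randn(120)

grid = np.linspace(0, 1, 100)  # common evaluation grid
q = to_common_grid(x_query, y_query, grid)
t = to_common_grid(x_templ, y_templ, grid)  # now len(q) == len(t)
```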

Additionally, as you say, "amplitude" is an issue. This is a bit more open-ended, to be honest. You can try an Area-Under-the-Curve approach like the one proposed by Zhang and Mueller (2011) to take care of this, but really, for the purposes of time warping, even sup-norm normalization (i.e., replacing $f(x)$ with $\frac{f(x)}{\sup_x |f(x)|}$) could do, as in this paper by Tang and Mueller (2009). I would follow the second, but in any case, as you also noticed, normalization of the samples is a necessity.
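
For concreteness, a sketch of both rescalings (the AUC variant is only in the spirit of Zhang and Mueller (2011), not their exact procedure):

```python
import numpy as np

def sup_norm(f):
    """Divide by sup|f| so the curve has maximum absolute value 1."""
    return f / np.max(np.abs(f))

def auc_norm(f, x):
    """Divide by the area under |f| (trapezoid rule), an AUC-style rescaling."""
    area = np.sum((np.abs(f[1:]) + np.abs(f[:-1])) / 2.0 * np.diff(x))
    return f / area
```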

Depending on the nature of your data, you can find more application-specific literature. I personally find the approach of minimizing with respect to a target pairwise warping function $g$ the most intuitive of all. The target function to minimize is $C_\lambda(Y_i, Y_k, g) = E\left\{ \int_T \left(Y_i(g(t)) - Y_k(t)\right)^2 + \lambda \left(g(t) - t\right)^2 \, dt \mid Y_i, Y_k \right\}$, where the whole thing, despite its uncanniness, is actually quite straightforward: you try to find the warping function $g$ that minimizes the expected mismatch between the warped query curve $Y_i(g(t))$ and the reference curve $Y_k(t)$ (the term $Y_i(g(t)) - Y_k(t)$), subject to a penalty on the time distortion you impose by that warping (the term $g(t) - t$). This is what the MATLAB package PACE implements. I know that there is also an R package fda by J. O. Ramsay et al. that might be of help, but I have not personally used it. (A bit annoyingly, the standard reference for that package's methods is in many cases Ramsay and Silverman's excellent book, Functional Data Analysis (2006, 2nd ed.), and you have to scour a 400-page book to find what you are looking for; at least it's a good read anyway.)
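
A rough discretized sketch of that criterion for a single observed pair: I parameterize $g$ as a monotone piecewise-linear function on a handful of knots and drop the expectation; the names, knot count, and optimizer choice are mine, not what PACE actually does.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 200)        # common time grid, T = [0, 1]
Y_k = np.sin(2 * np.pi * t)           # reference curve
Y_i = np.sin(2 * np.pi * t ** 1.3)    # query curve with time distortion
Y_i_interp = interp1d(t, Y_i)         # evaluate Y_i at warped times

knots = np.linspace(0.0, 1.0, 8)      # knots for the piecewise-linear g

def criterion(params, lam=1.0):
    # Positive increments guarantee a monotone g with g(0)=0 and g(1)=1.
    increments = np.exp(params)
    g_knots = np.concatenate(([0.0], np.cumsum(increments))) / increments.sum()
    g = np.interp(t, knots, g_knots)
    mismatch = np.mean((Y_i_interp(g) - Y_k) ** 2)  # fit term
    penalty = np.mean((g - t) ** 2)                 # time-distortion penalty
    return mismatch + lam * penalty

result = minimize(criterion, x0=np.zeros(len(knots) - 1), method="Nelder-Mead")
```

The $\lambda$ knob trades off fidelity against how aggressively you are willing to warp time, which is exactly the tension behind the normalization questions above.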

The problem you are describing is widely known in the Statistics literature as "curve registration" (see, for example, Gasser and Kneip (1995) for an early treatment of the issue) and falls under the general umbrella of Functional Data Analysis techniques.

(Where I could find the original paper available online, the link points there; otherwise the link points to a general digital library. Draft versions of almost all the papers mentioned can be found for free. I deleted my original comment, as it is superseded by this post.)