Dynamic Time Warping vs Cross Correlation

Tags: correlation, cross correlation, interpretation, r, similarities

Consider the following data frames:

> df1
  value
1     6
2     2
3     3
4     1
> df2
  value
1     3
2     8
3     4
4     5

I want to find the correlation between the two data frames. When I use the cross-correlation function, I can easily interpret the results:

> ccf(df1, df2, lag.max = 0, plot = F)

Autocorrelations of series 'X', by lag

     0 
-0.643 

For example, -0.643 indicates a strong negative correlation between df1 and df2. But when I use DTW, it gives me a distance of 16, and I do not know how to interpret it:

> align <- dtw(df1, df2, keep = T)
> align$distance
[1] 16

For example, I cannot say whether there is a positive/negative or weak/strong relationship between df1 and df2. So, here are my questions:

1) How can I interpret the results of DTW (e.g. as a negative/positive or weak/strong correlation)?

2) We can use the cross-correlation function to compute the correlation at different time lags, but I could not find any parameter like a time lag in DTW. For example, with ccf:

> ccf(df1, df2, lag.max = 3, plot = F)

Autocorrelations of series 'X', by lag

    -3     -2     -1      0      1      2      3 
 0.000 -0.214  0.714 -0.643  0.286 -0.429  0.286 

Is there a built-in time-lag parameter in DTW?
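The lag convention of ccf is easy to get backwards, so here is a minimal Python sketch (Python rather than R; `ccf` below is a hypothetical re-implementation, not R's function) of the convention R documents for ccf(x, y): the value at lag k estimates the correlation between x[t+k] and y[t], with the full-sample means and variances used at every lag. Run on the df1 and df2 values above, it reproduces the printed table.

```python
import math


def ccf(x, y, lag_max):
    """Cross-correlation following R's ccf() convention:
    the value at lag k estimates cor(x[t + k], y[t])."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    # Full-sample normalisation at every lag: the 1/n factors of the covariance and
    # the variances cancel, and short overlaps at large lags are shrunk towards zero.
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    result = {}
    for k in range(-lag_max, lag_max + 1):
        # Pair x[t + k] with y[t] only where both indices are inside the series.
        overlap = range(max(0, -k), min(n, n - k))
        result[k] = sum(dx[t + k] * dy[t] for t in overlap) / denom
    return result


df1 = [6, 2, 3, 1]
df2 = [3, 8, 4, 5]
for lag, r in ccf(df1, df2, lag_max=3).items():
    print(f"{lag:>3}  {r:6.3f}")
```

At lag 0 this is just the Pearson correlation, -9/14 ≈ -0.643. With only four points, though, every value in the table rests on very few products, so it should be read with caution.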

Best Answer

Jbowman is spot on. In one of the first (if not the first) applications of DTW, a reference speech signal was compared against a bank of other speech signals, and the pairing with the lowest distance was taken as the match. This made for a fairly accurate early speech recognition algorithm. So, in this application at least, DTW can only really tell you how well one pair of signals matches compared with another pair.
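To see where the 16 in the question comes from, and how the distance is meant to be used (to rank candidates rather than as a correlation), here is a minimal Python sketch. It assumes the "symmetric2" step pattern (diagonal steps count the local cost twice) with the absolute difference as the local cost, which I believe are the defaults of R's dtw package. Under those assumptions it reproduces align$distance = 16. The function names and the reference bank are hypothetical.

```python
def dtw_distance(x, y):
    """DTW distance with |x[i] - y[j]| as local cost and the symmetric2
    step pattern (diagonal steps weighted 2, horizontal/vertical steps 1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # g[i][j]: cheapest cumulative cost of aligning x[:i+1] with y[:j+1]
    g = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])
            if i == 0 and j == 0:
                g[i][j] = d
                continue
            best = inf
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step
            if i > 0:
                best = min(best, g[i - 1][j] + d)  # advance in x only
            if j > 0:
                best = min(best, g[i][j - 1] + d)  # advance in y only
            g[i][j] = best
    return g[n - 1][m - 1]


def best_match(query, templates):
    """Template matching: name of the reference signal with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


df1 = [6, 2, 3, 1]
df2 = [3, 8, 4, 5]
# Hypothetical bank of reference signals; shifted_copy is simply df1 + 1.
bank = {"df2": df2, "shifted_copy": [7, 3, 4, 2]}
for name, ref in bank.items():
    print(name, dtw_distance(df1, ref))  # df2 16, then shifted_copy 6
print(best_match(df1, bank))  # shifted_copy
```

Note that shifted_copy has a correlation of exactly +1 with df1, yet its DTW distance is 6, not 0. The distance measures how far apart the values remain after the best alignment. It says nothing about the sign or strength of a linear relationship, and it is only meaningful relative to other distances computed the same way.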

As an aside: I use 2D DTW and cross-correlation time delay estimation (CCTDE) for velocimetry, which is why I'm familiar with both. If velocimetry is what you're interested in, then the warp path found by DTW is the important thing, not the distance. As a further aside, with CCTDE, velocities or distances are measured by finding which delay or separation between the signals produces the highest correlation.
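Here is a minimal Python sketch of what "the warp path is the important thing" means in practice. It assumes DTW with the absolute difference as the local cost and the symmetric2 step pattern (diagonal steps counted twice). It records which move produced each cell's cost, then walks back from the end to recover the matched index pairs (i, j). The offset j - i along the path is the local displacement between the two signals, which is the quantity of interest in velocimetry. The function name and test signals are hypothetical. Ties are broken in favour of diagonal moves, so other implementations may return a different path of equal cost.

```python
def dtw_path(x, y):
    """DTW with |x[i] - y[j]| as local cost and symmetric2 weights.
    Returns (distance, path), path being the matched (i, j) pairs
    from (0, 0) to (len(x) - 1, len(y) - 1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    g = [[inf] * m for _ in range(n)]
    step = [[None] * m for _ in range(n)]  # predecessor cell of each cell
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])
            if i == 0 and j == 0:
                g[0][0] = d
                continue
            # Candidate moves in order of preference: diagonal, x only, y only.
            candidates = []
            if i > 0 and j > 0:
                candidates.append((g[i - 1][j - 1] + 2 * d, (i - 1, j - 1)))
            if i > 0:
                candidates.append((g[i - 1][j] + d, (i - 1, j)))
            if j > 0:
                candidates.append((g[i][j - 1] + d, (i, j - 1)))
            # min() keeps the first of equally cheap candidates.
            g[i][j], step[i][j] = min(candidates, key=lambda c: c[0])
    # Walk back from the end to recover the warp path.
    path = [(n - 1, m - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(step[i][j])
    return g[n - 1][m - 1], path[::-1]


x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 3, 1, 0]  # the same pulse, arriving two samples later
dist, path = dtw_path(x, y)
print(dist)  # 0: the warp absorbs the delay completely
# The offset j - i at the pulse peak is how far the pulse moved.
print([j - i for i, j in path if x[i] == 3])  # [2]
```

A distance of 0 here says only that the two signals can be aligned perfectly. The useful measurement is the offset of 2 samples read off the path.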


Edit: Patris asked for an example of my second paragraph, so I've edited the answer to try to provide one. The original paper showing the use of DTW in speech recognition (I say original; there may have been an earlier one) is by H. Sakoe and S. Chiba (Sakoe, H. & Chiba, S., IEEE Trans. Acoust., Speech, Signal Process. (1978) 26: 1). This was for 1D speech signals, and G. M. Quénot et al. extended it to 2D images in 1998 (Quénot, G., Pakleza, J. & Kowalewski, T., Experiments in Fluids (1998) 25: 177). They also published a couple of papers in 1992 and 1996 that are referenced in this one, but this one is more complete. They later published another paper in 2000 extending the method to 3D, but I am less familiar with it as it doesn't relate to my work. With 2D images we actually obtain a spatial displacement from one image to the other, and if we know the time difference between the two frames, we also know the velocity.

I am less familiar with the CCTDE literature. I know how it works and can implement it, but my colleague uses the technique more than I do, so I don't need to know its origins as intimately. I will still try to explain it. Imagine you have two detectors that measure light intensity at two different spatial locations, separated by a distance $x$. If we move a light source, which we assume does not change in time, with velocity $v$ through the view of detector 1 and then through the view of detector 2, we get two different signals, one from each detector. By computing the cross-correlation of the two signals and normalising it correctly, we obtain a value between -1 and +1.
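Here is a minimal Python sketch of the "correctly normalised" cross-correlation at zero shift. It assumes the usual choice of subtracting each signal's mean and dividing by the product of their norms (the Pearson coefficient, which is what bounds the result to [-1, +1]). The function name and the detector readings are hypothetical. Two detectors seeing the same pulse at different times can give a low or even negative value at zero shift, which is why the shift is then scanned.

```python
import math


def normalised_xcorr(a, b):
    """Zero-shift normalised cross-correlation (Pearson coefficient) of two
    equal-length signals; the result always lies in [-1, +1]."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


pulse_at_1 = [0, 1, 4, 1, 0, 0]  # hypothetical detector 1 reading
pulse_at_2 = [0, 0, 0, 1, 4, 1]  # same pulse, seen two samples later
print(normalised_xcorr(pulse_at_1, [2 * v for v in pulse_at_1]))  # 1.0
print(round(normalised_xcorr(pulse_at_1, pulse_at_2), 3))  # -0.417
```

The normalisation makes the coefficient insensitive to overall gain (the doubled signal still scores 1.0), so only the shape and timing of the signals matter.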

Now if we shift signal 2 in time by a small amount $\Delta t$ (often one sampling period of our detectors) and cross-correlate signal 1 with the shifted signal 2, we get another value between -1 and +1. We repeat this for every shift $\pm \Delta t$ (in practice, every multiple of the sampling period) that still leaves some overlap between the two signals. This gives the cross-correlation coefficient as a function of $\Delta t$. The value of $\Delta t$ at which this function reaches its maximum is our estimate of the transit time $t$. Since we know the distance $x$ between the detectors and have an estimate of the time $t$ the light source takes to travel from one detector to the other, we estimate the velocity as $v = \frac{x}{t}$.
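The procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's or the colleague's implementation. Shifts are whole samples. The correlation at each shift is normalised with the full-signal means and norms (the convention R's ccf uses); a per-overlap normalisation is another common choice, but it gives spurious values of ±1 when the overlap shrinks to two samples. The detector readings, sampling period (1 ms) and separation (5 cm) are made-up values, and the helper names `shifted_corr`, `cctde_velocity` and `gaussian_pulse` are hypothetical.

```python
import math


def shifted_corr(s1, s2, k):
    """Correlation of s1[t] with s2[t + k] over the overlapping samples,
    normalised with full-signal means and norms so that shifts with little
    overlap are shrunk towards zero instead of being inflated."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    norm = math.sqrt(sum((v - m1) ** 2 for v in s1) * sum((v - m2) ** 2 for v in s2))
    overlap = range(max(0, -k), min(n, n - k))
    return sum((s1[t] - m1) * (s2[t + k] - m2) for t in overlap) / norm


def cctde_velocity(s1, s2, sample_period, separation, max_shift):
    """Scan shifts of -max_shift..max_shift samples, take the best-correlated
    shift as the transit time t, and return (t, separation / t)."""
    if len(s1) != len(s2) or max_shift >= len(s1):
        raise ValueError("need equal-length signals and max_shift < len(s1)")
    best = max(range(-max_shift, max_shift + 1), key=lambda k: shifted_corr(s1, s2, k))
    t = best * sample_period
    if t == 0:
        raise ValueError("no measurable delay between the detectors")
    return t, separation / t


def gaussian_pulse(n, centre, width=3.0):
    """Hypothetical detector reading: a Gaussian blip centred on sample `centre`."""
    return [math.exp(-((i - centre) / width) ** 2) for i in range(n)]


dt = 1e-3  # assumed sampling period: 1 ms
x = 0.05  # assumed detector separation: 5 cm
detector1 = gaussian_pulse(100, centre=30)
detector2 = gaussian_pulse(100, centre=55)  # same blip, 25 samples later
t, v = cctde_velocity(detector1, detector2, dt, x, max_shift=60)
print(f"t = {t * 1e3:.1f} ms, v = {v:.2f} m/s")  # t = 25.0 ms, v = 2.00 m/s
```

A negative t (and v) means the light source reached detector 2 first. The time resolution is limited to one sampling period unless the correlation peak is interpolated.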

[Figure: CCTDE example]

I've included an image from Google to help show what I mean by CCTDE. The image also includes a 'scaling factor', which would equal 1 if the light signal is constant in time, as I assumed at the beginning. This answer may have strayed somewhat from the original question, but I think it's interesting nonetheless :)