Time-Series – Calculating Euclidean Distance for Multivariate Time Series: A Step-by-Step Guide

distance, distance-functions, euclidean, time series

I would like to know how to use Euclidean distance to find similarity between two multivariate time series.

Suppose I have two $N$-variate time series $u$ and $v$, with $u_i(t)$ denoting the $i$-th component of time series $u$ at time $t$.

Would it make sense to calculate the distance between the values of the same parameter and then sum those distances over all parameters to get the final distance?

$$ d_i = \sqrt{(u_i(0)-v_i(0))^2 + (u_i(1)-v_i(1))^2 + \dots + (u_i(T)-v_i(T))^2}$$
$$ d = d_1 + d_2 + \dots + d_N$$

Or would it make sense to treat all the values equally, no matter which parameter is considered?

$$\check{d} = \sqrt{
\hphantom{+\,}(u_1(0)-v_1(0))^2 + (u_1(1)-v_1(1))^2 + … + (u_1(T)-v_1(T))^2 \\
+ (u_2(0)-v_2(0))^2 + (u_2(1)-v_2(1))^2 + … + (u_2(T)-v_2(T))^2 \\
+ … \\
+ (u_N(0)-v_N(0))^2 + (u_N(1)-v_N(1))^2 + … + (u_N(T)-v_N(T))^2
}$$

Or should I consider the distance between records (the values of all parameters at a given point in time)?

$$ \hat{d}_0 = \sqrt{(u_1(0)-v_1(0))^2 + (u_2(0)-v_2(0))^2 + \dots + (u_N(0)-v_N(0))^2}$$

$$ \hat{d}_1 = \sqrt{(u_1(1)-v_1(1))^2 + (u_2(1)-v_2(1))^2 + \dots + (u_N(1)-v_N(1))^2}$$

$$ \vdots $$

$$ \hat{d}_T = \sqrt{(u_1(T)-v_1(T))^2 + (u_2(T)-v_2(T))^2 + \dots + (u_N(T)-v_N(T))^2}$$

$$ \hat{d} = \hat{d}_0 + \hat{d}_1 + \dots + \hat{d}_T$$

Or something else?

Example

Multivariate series 1:

power, current, voltage

100, 10, 10

400, 20, 20

900, 30, 30

Multivariate series 2:

power, current, voltage

600, 20, 30

1000, 50, 20

450, 15, 30

What would the Euclidean distance look like?

Best Answer

When focussing on it as a comparative measure, the difference between the three ways boils down to whether you square the component-wise or record-wise distances before summing them (second method) or not (first and third method):

$$\begin{aligned}
&\sqrt{d_1^2 + d_2^2 + \dots + d_N^2} \\
&= \sqrt{\hat{d}_0^2 + \hat{d}_1^2 + \dots + \hat{d}_T^2} \\
&= \check{d} \\
&= \Big( \hphantom{+\,}(u_1(0)-v_1(0))^2 + (u_1(1)-v_1(1))^2 + \dots + (u_1(T)-v_1(T))^2 \\
&\qquad + (u_2(0)-v_2(0))^2 + (u_2(1)-v_2(1))^2 + \dots + (u_2(T)-v_2(T))^2 \\
&\qquad + \dots \\
&\qquad + (u_N(0)-v_N(0))^2 + (u_N(1)-v_N(1))^2 + \dots + (u_N(T)-v_N(T))^2 \Big)^{1/2}
\end{aligned}$$
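One way to see why these expressions agree is to write them with sums over components $i$ and time points $t$ (the sum notation is shorthand introduced here, not part of the question). Swapping the order of summation is all that is needed:

$$\sum_i d_i^2 = \sum_i \sum_t (u_i(t)-v_i(t))^2 = \sum_t \sum_i (u_i(t)-v_i(t))^2 = \sum_t \hat{d}_t^2 = \check{d}^2$$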

Thus, the first approach attenuates the effect of large component-wise distances, while the third approach attenuates the effect of large record-wise distances (both in comparison to the second approach). The reason is that squaring before adding increases the relative impact of large summands. Whether this is desirable depends on your application.
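As a rough illustration, the three variants can be computed for the example data from the question. This is a minimal sketch assuming NumPy, with rows as records and columns as power, current and voltage; the variable names are made up for this example.

```python
import numpy as np

# Rows are records (points in time); columns are power, current, voltage.
u = np.array([[100, 10, 10], [400, 20, 20], [900, 30, 30]], dtype=float)
v = np.array([[600, 20, 30], [1000, 50, 20], [450, 15, 30]], dtype=float)

sq = (u - v) ** 2  # squared differences, one entry per (record, component)

d_components = np.sqrt(sq.sum(axis=0))  # d_i: one distance per component
d_records = np.sqrt(sq.sum(axis=1))     # d-hat_t: one distance per record

d_first = d_components.sum()   # first approach: sum of component-wise distances
d_second = np.sqrt(sq.sum())   # second approach: all values treated equally
d_third = d_records.sum()      # third approach: sum of record-wise distances

print(d_first, d_second, d_third)
```

For this data the three values come out at roughly 956.4, 902.3 and 1551.5. The squared power differences make up more than 99 % of the sum under the square root in the second approach, so power dominates all three results.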

Either way, you almost certainly want to normalise your component time series; otherwise, components with comparatively high values will dominate your measure. For instance, in your example, differences in power would dominate differences in the other quantities. To illustrate why this is bad: your results would depend on your choice of units.

If the measure you use for each component scales linearly with what you are actually interested in (e.g., there is no general offset to a component), a reasonable normalisation is to give all your component time series the same mean. Instead of $u_i$, you use $\tilde{u}_i$, where

$$\tilde{u}_i(t) := \frac{u_i(t)}{\mu_i}$$

and $\mu_i$ is the mean over $u_i$ and $v_i$ (and possibly other comparable time series at your disposal).
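A minimal sketch of this mean normalisation on the example data, assuming NumPy and the second (flattened) distance. The function names and the watt/kilowatt interpretation of the power column are assumptions made for illustration only.

```python
import numpy as np

# Rows are records; columns are power, current, voltage.
u = np.array([[100, 10, 10], [400, 20, 20], [900, 30, 30]], dtype=float)
v = np.array([[600, 20, 30], [1000, 50, 20], [450, 15, 30]], dtype=float)

def raw_distance(u, v):
    """Flattened Euclidean distance without any normalisation."""
    return np.sqrt(((u - v) ** 2).sum())

def mean_normalised_distance(u, v):
    """Flattened Euclidean distance after dividing each component by its mean."""
    mu = np.concatenate([u, v]).mean(axis=0)  # mean per component over both series
    return np.sqrt((((u - v) / mu) ** 2).sum())

# Change the unit of the power column (say, watts to kilowatts).
scale = np.array([1e-3, 1.0, 1.0])
u_kw, v_kw = u * scale, v * scale

raw_w, raw_kw = raw_distance(u, v), raw_distance(u_kw, v_kw)
norm_w = mean_normalised_distance(u, v)
norm_kw = mean_normalised_distance(u_kw, v_kw)

print(raw_w, raw_kw)    # changes with the unit
print(norm_w, norm_kw)  # unchanged by the unit
```

The unnormalised distance changes when the power unit changes, whereas the mean-normalised one does not, because the unit factor cancels in $u_i(t)/\mu_i$.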

Note that the standard procedure of subtracting the mean or dividing by the standard deviation may have undesired effects. For example, if a component is constant save for measurement errors, this procedure would blow up the error and let the differences of the measurement error play a considerable part in your analysis.
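To make this concrete, here is a small sketch, assuming NumPy, that compares z-scoring with division by the mean. Both series are hypothetical: one is nominally constant at 230 with small measurement errors, the other varies substantially.

```python
import numpy as np

# Hypothetical component that is constant save for measurement errors.
noisy_constant = np.array([230.1, 229.9, 230.0, 230.2, 229.8])
# Hypothetical component with genuine variation.
varying = np.array([100.0, 400.0, 900.0, 600.0, 1000.0])

def zscore(x):
    """Subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def mean_scaled(x):
    """Divide by the mean only."""
    return x / x.mean()

for name, x in [("noisy constant", noisy_constant), ("varying", varying)]:
    # Spread of the normalised series under each procedure.
    print(name, zscore(x).std(), mean_scaled(x).std())
```

After z-scoring, both series have a spread of 1, so the pure measurement noise carries as much weight as the genuine variation. After division by the mean, the noise stays small (a spread below 0.001 here) compared with roughly 0.55 for the varying series.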

At the end of the day, what is a reasonable normalisation depends on your application and what you want to extract with your analysis.


Note that there are other ways to determine the similarity of time series that may be better suited to your application. For example, the cross-correlation would be a reasonable approach if you are not interested in differences arising due to linear transformations of an entire time series, i.e., you are only interested in a similar (in the geometric sense) temporal evolution. Again, what is best for you depends on your application.
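As a hedged illustration of this point, the sketch below uses the Pearson correlation coefficient (the normalised cross-correlation at zero lag, computed with NumPy's `corrcoef`) on a single component. The second series is a made-up, positively scaled and shifted copy of the first.

```python
import numpy as np

a = np.array([100.0, 400.0, 900.0])
b = 3.0 * a + 50.0  # hypothetical: same temporal evolution, other scale and offset

euclid = np.sqrt(((a - b) ** 2).sum())  # large, despite the identical shape
r = np.corrcoef(a, b)[0, 1]             # correlation coefficient of a and b

print(euclid, r)
```

The Euclidean distance is large, while the correlation is 1 (up to rounding), since it ignores the scale and offset of each series.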

And finally, note that it does not make much sense to use a component that is derived from other components (I mention this because in your example, power is always the product of current and voltage).
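A short sketch, assuming NumPy, that checks this for the example data and then computes the flattened distance on current and voltage only. Dropping the power column is just one possible way to act on the remark, and the remaining components would still need normalising.

```python
import numpy as np

# Rows are records; columns are power, current, voltage.
u = np.array([[100, 10, 10], [400, 20, 20], [900, 30, 30]], dtype=float)
v = np.array([[600, 20, 30], [1000, 50, 20], [450, 15, 30]], dtype=float)

# Check that power equals current times voltage in every record.
power_is_derived = (np.allclose(u[:, 0], u[:, 1] * u[:, 2])
                    and np.allclose(v[:, 0], v[:, 1] * v[:, 2]))

# Keep only the independent components (current, voltage).
u_reduced, v_reduced = u[:, 1:], v[:, 1:]
d_reduced = np.sqrt(((u_reduced - v_reduced) ** 2).sum())

print(power_is_derived, d_reduced)
```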
