Multidimensional time series clustering

clustering, distance, time series

I have unemployment rates and interest rates per country over time. I want to cluster the countries that have similar dynamics and levels in both dimensions together.

What could be a reasonable approach to measure the similarities between the countries?

Best Answer

If I understand your problem correctly, you have $N$ countries, each described by 2 time series, and you want to cluster these $N$ countries into $K$ groups.

For that, you need a distance between these bivariate time series, such as an extended notion of correlation.

In the multivariate time series literature, some people use a PCA-based approach: for each of your $N$ bivariate time series (of $2 \times T$ values), you compute the principal components, and then you use a distance between the principal components of different countries, cf. for instance this paper.
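One simple way to turn that idea into a number, as a rough sketch only: take the first principal axis of each country's $T \times 2$ matrix and use $1 - \cos^2$ of the angle between two such axes as the distance. This particular choice, and the names `first_principal_axis` and `pca_distance`, are my assumptions for illustration; the paper may well use a different distance, and in practice you would probably standardise the two variables first.

```python
import numpy as np

def first_principal_axis(series):
    """Unit vector of the first principal component of a (T, d) array."""
    centered = series - series.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def pca_distance(series_a, series_b):
    """1 - cos^2 of the angle between the two first principal axes.

    0 means the axes coincide, 1 means they are orthogonal.
    Squaring the cosine makes the result independent of the axis sign.
    """
    cos = float(first_principal_axis(series_a) @ first_principal_axis(series_b))
    return 1.0 - cos ** 2

# Toy data (not real unemployment or interest rates).
t = np.arange(50.0)
a = np.column_stack([t, 2 * t])       # varies along the direction (1, 2)
b = np.column_stack([3 * t, 6 * t])   # same direction, different scale
c = np.column_stack([t, -2 * t])      # varies along the direction (1, -2)

print(pca_distance(a, b))  # close to 0
print(pca_distance(a, c))  # clearly larger
```

Note that this distance only looks at the direction of co-movement, not at levels or at the time ordering.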

I would recommend a more "correlation"-based approach for your economic-flavoured time series: the copula approach.

First, difference your time series, i.e. consider the time series of variations $S_t = P_{t} - P_{t-1}$, where $P_t$ is the value observed at time $t$. Then, for each of your $N$ bivariate time series (of $2 \times T$ values), you build an empirical copula, cf. for example this paper. You end up with $N$ bivariate empirical copulas $C_i$, $1 \leq i \leq N$.
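As a minimal sketch of this step, assuming each country is stored as a NumPy array of shape $(T, 2)$: difference the series, then replace each column by its normalised ranks. The resulting points in $(0, 1)^2$ are a sample from the empirical copula. The function name `empirical_copula_sample` and the division by (number of observations + 1) are my choices, not something prescribed by the paper.

```python
import numpy as np
from scipy.stats import rankdata

def empirical_copula_sample(series):
    """Map a (T, 2) series to (T-1, 2) pseudo-observations in (0, 1).

    Step 1: first differences, S_t = P_t - P_{t-1}.
    Step 2: rank each column separately and rescale the ranks,
            which removes the marginal distributions and keeps
            only the dependence structure.
    """
    diffs = np.diff(np.asarray(series, dtype=float), axis=0)
    n = diffs.shape[0]
    ranks = np.column_stack(
        [rankdata(diffs[:, j]) for j in range(diffs.shape[1])]
    )
    return ranks / (n + 1)

# Toy series for one hypothetical country: columns are the two rates.
country = np.array([
    [1.0, 5.0],
    [2.0, 4.0],
    [4.0, 4.5],
    [3.0, 6.0],
])
u = empirical_copula_sample(country)
print(u)
```

Dividing by `n + 1` rather than `n` keeps the pseudo-observations strictly inside the unit square, which is convenient if you later feed them into a parametric copula.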

Then, you need a way to compare these bivariate empirical copulas pairwise ($C_i$ and $C_j$) in order to perform a clustering. This can be done in many ways; one of them is to use the Randomized Dependence Coefficient, which extracts a "correlation" coefficient between the two copula representations of your time series.
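Below is a rough sketch of a coefficient of this kind: rank-transform each variable, push the ranks through a few random sine features, and take the largest canonical correlation between the two feature sets. It is my own compact rendering of the idea, not a reference implementation; the defaults `k=5` and `s=1/6`, the fixed `seed`, and the helper names are assumptions. Each argument can be a $(T, 2)$ array, so `1 - rdc(X_i, X_j)` can serve as a distance between countries $i$ and $j$.

```python
import numpy as np
from scipy.stats import rankdata

def _copula_features(z, k, s, rng):
    """Rank-transform each column, then apply k random sine projections."""
    z = np.asarray(z, dtype=float)
    z = z.reshape(len(z), -1)                      # accept 1-D or 2-D input
    n, d = z.shape
    u = np.column_stack([rankdata(z[:, j]) / n for j in range(d)])
    u = np.column_stack([u, np.ones(n)])           # bias column
    w = rng.normal(size=(d + 1, k))                # random projection weights
    return np.sin((s / (d + 1)) * (u @ w))

def _orthonormal_basis(f, tol=1e-10):
    """Orthonormal basis of the centered column space of f."""
    f = f - f.mean(axis=0)
    u, sv, _ = np.linalg.svd(f, full_matrices=False)
    return u[:, sv > tol * sv.max()]               # drop numerically null directions

def rdc(x, y, k=5, s=1 / 6, seed=0):
    """Largest canonical correlation between random features of x and y."""
    rng = np.random.default_rng(seed)
    qx = _orthonormal_basis(_copula_features(x, k, s, rng))
    qy = _orthonormal_basis(_copula_features(y, k, s, rng))
    # Singular values of qx^T qy are the canonical correlations.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho.max(), 0.0, 1.0))

# Toy usage: two hypothetical countries with T = 200 observations each.
rng = np.random.default_rng(42)
x_i = rng.normal(size=(200, 2))
x_j = rng.normal(size=(200, 2))
print(1.0 - rdc(x_i, x_j))   # a distance in [0, 1]
```

With short series and many random features the canonical correlation tends to be inflated, so `k` should stay small relative to $T$. Because the two feature sets use different random draws, `rdc(x, y)` and `rdc(y, x)` can differ slightly; computing each pair once and mirroring the value keeps the distance matrix symmetric.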

Now that you can derive a distance between your $N$ bivariate time series of length $T$, you can run your favourite clustering algorithms.
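For instance, once the pairwise distances are collected in a symmetric $N \times N$ matrix, hierarchical clustering from SciPy can be run on it directly. The sketch below uses average linkage and a made-up distance matrix for four hypothetical countries; both are illustrative choices, and any algorithm that accepts precomputed distances would do.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_from_distances(dist, n_clusters):
    """Average-linkage clustering on a symmetric (N, N) distance matrix."""
    # linkage expects the condensed form (upper triangle as a flat vector).
    condensed = squareform(dist, checks=True)
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Made-up distances: countries 0 and 1 are close, as are 2 and 3.
dist = np.array([
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
])
labels = cluster_from_distances(dist, n_clusters=2)
print(labels)
```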

Good luck!
