Solved – Time series clustering

clustering, r, time series

I have many time series in the following format: one column holds the date (d/m/yr), and each of the remaining columns holds a different time series, like this:

DATE         TS1     TS2     TS3 ...
24/03/2003   0.00    0.00    ...
17/04/2003  -0.05    1.46
11/05/2003   0.46   -3.86
04/06/2003  -2.21   -1.08
28/06/2003  -1.18   -2.16
22/07/2003   0.00    0.23

In R, how can I group the time series that show similar trends?

Best Answer

Step 1

Perform a fast Fourier transform on the time series data. This decomposes each time series into a mean and a set of frequency components, which gives you clustering variables that are not as heavily autocorrelated as raw time series values often are.

Step 2

If the time series is real-valued, discard the second half of the fast Fourier transform elements because they are redundant (they are the complex conjugates of the first half).

Step 3

Separate the real and imaginary parts of each fast Fourier transform element.
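Steps 1 to 3 can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the toy matrix `ts_mat` and the names `spec_half` and `features` are made up here, and the series are assumed to be equally long, regularly spaced, and free of missing values.

```r
# Toy data (made up): rows are observation times, columns are series.
set.seed(1)
n_obs <- 16
t_idx <- seq_len(n_obs)
ts_mat <- cbind(
  TS1 = sin(2 * pi * t_idx / 8) + rnorm(n_obs, sd = 0.1),
  TS2 = sin(2 * pi * t_idx / 8) + rnorm(n_obs, sd = 0.1),
  TS3 = cos(2 * pi * t_idx / 4) + rnorm(n_obs, sd = 0.1)
)

# Step 1: FFT of every column; mvfft() applies the transform column-wise.
spec <- mvfft(ts_mat)

# Step 2: keep the DC term and the frequencies up to Nyquist.
# For real-valued input the remaining elements are complex conjugates.
keep <- seq_len(floor(n_obs / 2) + 1)
spec_half <- spec[keep, , drop = FALSE]

# Step 3: one row per series, real parts followed by imaginary parts.
features <- cbind(t(Re(spec_half)), t(Im(spec_half)))
rownames(features) <- colnames(ts_mat)
colnames(features) <- c(paste0("re", keep - 1), paste0("im", keep - 1))

dim(features)  # series x (real + imaginary) features
```

Note that the imaginary part of the DC term is always zero for real-valued input (as is the Nyquist term when the length is even), so those columns carry no information.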

Step 4

Perform model-based clustering on the real and imaginary parts of each frequency element.
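A possible sketch of this step uses `Mclust()` from the mclust package, which is assumed to be installed. The simulated series, the helper `make_group()`, and the choice `G = 1:5` are illustrative assumptions, not part of the original answer.

```r
library(mclust)  # assumed installed: install.packages("mclust")

# Simulated data (made up): two groups of 20 series with different periods.
set.seed(42)
n_obs <- 8
n_per_group <- 20
t_idx <- seq_len(n_obs)
make_group <- function(period) {
  replicate(n_per_group, sin(2 * pi * t_idx / period) + rnorm(n_obs, sd = 0.1))
}
ts_mat <- cbind(make_group(8), make_group(4))  # n_obs x 40 matrix
colnames(ts_mat) <- paste0("TS", seq_len(ncol(ts_mat)))

# Real and imaginary parts of the first half of each spectrum.
spec_half <- mvfft(ts_mat)[seq_len(floor(n_obs / 2) + 1), , drop = FALSE]
features <- cbind(t(Re(spec_half)), t(Im(spec_half)))
rownames(features) <- colnames(ts_mat)

# Drop constant columns (e.g. the always-zero imaginary part of the DC term),
# since they contribute nothing and make covariance estimates singular.
features <- features[, apply(features, 2, sd) > 1e-8, drop = FALSE]

# Model-based clustering; BIC selects the covariance model and the
# number of clusters among the candidates in G.
fit <- Mclust(features, G = 1:5)
clusters <- fit$classification
table(clusters)
```

With many frequencies and few series, the number of features can approach the number of observations; in that case it may help to keep only the lowest few frequencies before clustering.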

Step 5

Plot the percentiles of the time series by cluster to examine their shape.
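One way to draw such a plot in base R is sketched below. The data and the cluster labels are made up so the snippet runs on its own; in practice `clusters` would come from the clustering step.

```r
# Made-up data: 20 series of length 12 in two known groups.
set.seed(7)
n_obs <- 12
t_idx <- seq_len(n_obs)
ts_mat <- cbind(
  replicate(10, sin(2 * pi * t_idx / 12) + rnorm(n_obs, sd = 0.2)),
  replicate(10, cos(2 * pi * t_idx / 6) + rnorm(n_obs, sd = 0.2))
)
clusters <- rep(1:2, each = 10)  # hypothetical labels

# Percentiles across series at each time point, computed per cluster.
probs <- c(0.01, 0.25, 0.50, 0.75, 0.99)
pct_by_cluster <- lapply(split(seq_len(ncol(ts_mat)), clusters), function(cols) {
  t(apply(ts_mat[, cols, drop = FALSE], 1, quantile, probs = probs))
})

# One panel per cluster, one line per percentile.
op <- par(mfrow = c(1, length(pct_by_cluster)))
for (k in names(pct_by_cluster)) {
  matplot(pct_by_cluster[[k]], type = "l", lty = 1,
          xlab = "Time index", ylab = "Value", main = paste("Cluster", k))
}
par(op)
```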

Alternatively, you could omit the DC component of the fast Fourier transform. Your clusters would then no longer depend on the mean of each series, only on the remaining Fourier components, which represent the shape of the time series.
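A small check of why dropping the DC component removes the mean: below, `y` is simply `x` shifted by a constant (the shift of 10 is an arbitrary choice for illustration), and only the first FFT element differs between the two.

```r
x <- c(0.00, -0.05, 0.46, -2.21, -1.18, 0.00)  # TS1 values from the question
y <- x + 10                                     # same shape, different mean

fx <- fft(x)
fy <- fft(y)
half <- seq_len(floor(length(x) / 2) + 1)

# The DC term (first element) equals the sum of the series, so it differs.
c(Re(fx[1]), Re(fy[1]))

# Every other retained coefficient is unaffected by the shift.
all.equal(fx[half][-1], fy[half][-1])
```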

You will also want to calculate the amplitudes and phase angles from the fast Fourier transform so that you can explore the distribution of time series spectra within clusters. See this StackOverflow answer on how to do that for real-valued data.
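As a rough sketch, amplitudes and phase angles can be obtained in base R with `Mod()` and `Arg()`. The test signal and the scaling convention (dividing by the series length and doubling the frequencies that have a mirrored twin) are assumptions chosen so that the recovered amplitude matches the one used to build the signal.

```r
n_obs <- 16
t_idx <- 0:(n_obs - 1)

# Made-up signal: mean 1.5, amplitude 2, 3 cycles per record, phase pi/4.
x <- 1.5 + 2 * cos(2 * pi * 3 * t_idx / n_obs + pi / 4)

half <- fft(x)[seq_len(floor(n_obs / 2) + 1)]

# Amplitudes: scale by the series length, then double every frequency
# whose conjugate twin was discarded (all except DC and, for even
# lengths, the Nyquist frequency).
amp <- Mod(half) / n_obs
inner <- seq(2, ceiling(n_obs / 2))
amp[inner] <- 2 * amp[inner]

# Phase angles in radians; only meaningful where the amplitude is not ~0.
phase <- Arg(half)

round(amp, 3)
```

Here `amp[1]` holds the absolute value of the mean and `amp[4]` the amplitude of the component with 3 cycles per record.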

You could also plot the percentiles of time series shape by cluster by computing the Fourier series from the amplitudes and phase angles (the resulting time series estimate will not perfectly match the original time series). You could also plot the percentiles of the raw time series data by cluster. Here is an example of such a plot, which came about from a harmonic analysis of NDVI data I just did today:

Figure: 1st, 25th, 50th, 75th, and 99th percentiles of period-level NDVI measures by cluster, with clusters derived from model-based clustering using the Mclust package in R.
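The Fourier series reconstruction mentioned above can be sketched as follows. The helper `fourier_series()` is a hypothetical name, and the amplitude scaling is the same assumed convention as before (divide by the length, double the frequencies with a discarded twin). Using only the leading harmonics gives a smoothed estimate that does not match the original exactly.

```r
x <- c(0.00, -0.05, 0.46, -2.21, -1.18, 0.00)  # TS1 values from the question
n_obs <- length(x)
t_idx <- 0:(n_obs - 1)

half <- fft(x)[seq_len(floor(n_obs / 2) + 1)]
amp <- Mod(half) / n_obs
inner <- seq(2, ceiling(n_obs / 2))
amp[inner] <- 2 * amp[inner]
phase <- Arg(half)

# Sum of cosines for harmonics 0..n_harmonics (0 is the mean term).
fourier_series <- function(n_harmonics) {
  k <- 0:n_harmonics
  rowSums(sapply(k, function(kk) {
    amp[kk + 1] * cos(2 * pi * kk * t_idx / n_obs + phase[kk + 1])
  }))
}

x_full <- fourier_series(length(half) - 1)  # all harmonics
x_smooth <- fourier_series(1)               # mean + first harmonic only

all.equal(x_full, x)  # keeping every harmonic recovers the series
```

Applying `fourier_series()` per series and then taking percentiles by cluster would give the shape plot described above.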

Finally, if your time series are not stationary (i.e., their mean and variance shift over time), a wavelet transform may be more appropriate than a Fourier transform. The trade-off is that you give up some information about frequencies in exchange for information about location in time.
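To illustrate what "information about location" means, here is a hand-rolled Haar discrete wavelet transform in base R. It is only a sketch: `haar_dwt()` is a hypothetical helper, it assumes the series length is a power of two, and a dedicated wavelet package would be the more practical choice for real data.

```r
# Haar DWT: at each level, pairwise differences become detail
# coefficients and pairwise sums are passed on to the next level.
haar_dwt <- function(x) {
  n <- length(x)
  stopifnot(n >= 2, bitwAnd(n, n - 1L) == 0)  # length must be a power of two
  coefs <- list()
  level <- 1
  while (length(x) > 1) {
    odd <- x[seq(1, length(x), by = 2)]
    even <- x[seq(2, length(x), by = 2)]
    coefs[[paste0("d", level)]] <- (odd - even) / sqrt(2)
    x <- (odd + even) / sqrt(2)
    level <- level + 1
  }
  coefs[["approx"]] <- x
  coefs
}

# Made-up series whose mean jumps partway through (non-stationary).
x <- c(0, 0, 0, 0, 0, 3, 3, 3)
w <- haar_dwt(x)

w$d1                     # the non-zero entry marks where the jump occurs
features <- unlist(w)    # coefficients usable as clustering variables
```

Unlike a Fourier coefficient, each detail coefficient depends only on a short stretch of the series, so a local change shows up in a few coefficients rather than in all of them.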