Solved – time series dimensionality reduction

Tags: dimensionality reduction, forecasting, svd, time series, wavelet

I have call center data (such as the sample below) with call volumes collected every 15 minutes. For a day the periodicity is 96, and for a week it is 7 × 96 = 672. To forecast this data I can use models that handle multiple seasonality, such as DSHW, TBATS and BATS in the R forecast package. As you can imagine, periodicities of 96 and 672 require a tremendous amount of computational resources, and not many techniques can handle such long periodicities.

I recently came across an excellent article by Shen that uses Singular Value Decomposition (SVD) to reduce the dimension/periodicity of the time series, so that we get a parsimonious model that is more efficient and requires less computational resources. I'm also open to other methods such as the discrete wavelet transform (DWT), the discrete Fourier transform (DFT), etc.

    Time in mins
Date      0:15 0:30 0:45 1:00 1:15 1:30
1/1/2012    41  12  33  42  38  27
2/1/2012    50  12  20  14  27  46
3/1/2012    30  17  21  44  47  45
4/1/2012    26  15  10  13  50  38
5/1/2012    39  17  22  28  48  12
6/1/2012    30  10  25  30  21  21
7/1/2012    47  46  18  10  36  37
8/1/2012    26  37  41  36  37  12
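To sketch what the SVD-based reduction would look like on data of this shape, here is a rough attempt in R on simulated counts (the matrix `Y` below is only a stand-in for my actual days-by-intervals call data; the dimensions and the choice of `k` are illustrative):

```r
set.seed(1)
# Simulated stand-in: 28 days x 96 fifteen-minute intervals,
# a smooth intraday profile plus noise (purely illustrative)
daily <- 20 + 15 * sin(2 * pi * (1:96) / 96)
Y <- matrix(rep(daily, 28), nrow = 28, byrow = TRUE) +
  matrix(rnorm(28 * 96, sd = 3), nrow = 28)

# SVD: Y = U D V'; keep only the first k singular vectors
s <- svd(Y)
k <- 3
scores <- s$u[, 1:k] %*% diag(s$d[1:k])  # 28 x k daily scores
basis  <- s$v[, 1:k]                     # 96 x k intraday basis patterns

# Low-rank reconstruction from the k-dimensional representation
Y_hat <- scores %*% t(basis)
```

The idea would then be to forecast the k score series (instead of 96 intraday series) and map the forecasts back through `basis` onto the original 15-minute grid.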

Below are my questions:

  1. How can I implement SVD-, DWT- or DFT-based dimensionality reduction for the
     aforementioned time series data using tools like R or MATLAB?
  2. Are there any automated procedures available for time series
     dimensionality reduction?

Best Answer

You might want to consider forecastable component analysis (ForeCA), which is a dimension reduction technique for time series, specifically designed to obtain a lower-dimensional space that is easier to forecast than the original time series.

Let's look at an example of monthly sunspot numbers; for computational efficiency, let's just look at the 20th century.

yy <- window(sunspot.month, 1901, 2000)  # monthly sunspots, 20th century
plot(yy)

[Figure: monthly sunspot numbers, 1901–2000]

Sunspot numbers are only a univariate time series $(y_1, \ldots, y_T)$, but we can turn them into a multivariate time series by embedding them in the lagged $(p+1)$-dimensional feature space $X_t = \left(y_t, y_{t-1}, \ldots, y_{t-p}\right)$. This is a common technique in non-linear time series analysis.

XX <- embed(yy, 24)                     # 24-dimensional lag embedding
XX <- ts(XX, end = end(yy), freq = 12)
dim(XX)

## [1] 1166   24

In R you can use the ForeCA package to do the estimation. Note that this requires the multivariate spectrum of a $K$-dimensional time series with $T$ observations, which is stored in a $T \times K \times K$ array (one can use the symmetry/Hermitian property to halve the size). Hence it takes considerably longer to compute than iid dimension reduction techniques (such as PCA or ICA).
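To get a sense of the scale, here is a quick back-of-the-envelope computation for the embedded series above (T = 1166, K = 24; the 16 bytes per complex double is the usual figure):

```r
T_len <- 1166; K <- 24          # dimensions of the embedded sunspot series
entries <- T_len * K * K        # complex entries in the T x K x K spectrum array
mb <- entries * 16 / 2^20       # a complex double takes 16 bytes
c(entries = entries, MB = round(mb, 1))
```

So even this moderate example needs on the order of 10 MB just to hold the full spectrum array, before the actual optimization over it begins.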

So here we take the 24-dimensional time series of embedded sunspot numbers and try to find a 4-dimensional subspace that has interesting patterns that can be easily forecast.

library(ForeCA)
# this can take several seconds
mod.foreca <- foreca(XX, n.comp = 4, 
                     spectrum.control = list(method = "wosa"))

mod.foreca

## ForeCA found the top 4 ForeCs of 'XX' (24 time series).
## Out of the top 4 ForeCs, 0 are white noise.
## 
## Omega(ForeC 1) = 53% vs. maximum Omega(XX) = 43%.
## This is an absolute increase of 9.9 percentage points (relative: 23%) in forecastability.
## 
## * * * * * * * * * * 
## Use plot(), biplot(), and summary() for more details.


plot(mod.foreca)

[Figure: ForeCA biplot and forecastability barplots for the embedded sunspot series]

The biplot shows that all loadings of the first component point in the same direction, which tells us that this component captures the overall/average pattern. The barplots on the right show that the forecastable components (ForeCs) indeed have decreasing forecastability, and that the first component is more forecastable than any of the original series. In this example all original series have the same forecastability, because we used the embedding; for a general multivariate time series this is not the case.

Now what do these series look like?

mod.foreca$scores <- ts(mod.foreca$scores, start = start(XX), 
                        freq = frequency(XX))
plot(mod.foreca$scores)

[Figure: time series plots of the four ForeCs]

Indeed, the first component is more forecastable than the original series, since it is less noisy. The remaining series also show very interesting patterns that are not visible in the original series. Note that all ForeCs are orthogonal to each other, i.e., they are uncorrelated.

round(cor(mod.foreca$scores), 3)

##        ForeC1 ForeC2 ForeC3 ForeC4
## ForeC1      1      0      0      0
## ForeC2      0      1      0      0
## ForeC3      0      0      1      0
## ForeC4      0      0      0      1

The spectrum of each series also gives a good idea of the different ForeCs [sic!] in sunspot activity.

spec <- mvspectrum(mod.foreca$scores, "wosa")
plot(spec)

[Figure: spectra of the four ForeCs]