Solved – Principal component analysis on time series: meaning

pca, time series

I understand how PCA works; I have practiced it in order to understand the ranking of some objects with respect to some variables. My question is about extending PCA to the analysis of time series of correlated variables.

After looking at Can PCA be applied for time series data?, I did some research on what is called Singular Spectrum Analysis (SSA), and more specifically Multivariate SSA (M-SSA).

This is basically what I understood from all of that. Take, for example, a set of P variables observed on N different objects. Classical PCA allows you to extract, through the first eigenvectors of the covariance matrix, which of the P variables are actually relevant for representing your set of N different objects.
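To fix ideas, here is a minimal sketch of that classical setting, assuming a hypothetical NumPy data matrix X of shape (N, P) (N objects described by P variables):

```python
import numpy as np

# Hypothetical data: N objects described by P variables.
rng = np.random.default_rng(0)
N, P = 100, 5
X = rng.standard_normal((N, P))

# Classical PCA: eigendecomposition of the P x P covariance matrix.
Xc = X - X.mean(axis=0)               # center each variable
C = Xc.T @ Xc / (N - 1)               # covariance matrix, shape (P, P)
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]     # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first eigenvector (length P) weights the variables that explain
# most of the variance across the N objects.
print(eigvals / eigvals.sum())        # explained variance ratios
print(eigvecs[:, 0])                  # loadings of the first component
```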

Using SSA (or M-SSA) for time series of P variables, you replace the N "objects" by (N?) lagged versions of your time series, in order to extract, through the first eigenvectors, characteristic signals (of the length of the time series) that represent most of your P variables' time series.
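To make the lagging step concrete, here is a small sketch (with an assumed window length L) of the trajectory matrix used by SSA: its columns are the lagged subseries of length L, and there are K = N - L + 1 of them:

```python
import numpy as np

def trajectory_matrix(x, L):
    """Embed a length-N series into its L x K trajectory (Hankel) matrix,
    whose K = N - L + 1 columns are the lagged subseries of length L."""
    N = len(x)
    K = N - L + 1
    return np.column_stack([x[i:i + L] for i in range(K)])

x = np.arange(10.0)           # toy series of length N = 10
T = trajectory_matrix(x, L=4)
print(T.shape)                # (4, 7): L rows, K = N - L + 1 columns
```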

My question is the following: what kind of information do I extract if I replace the N "objects" directly by the time series? Meaning that every observation in time would be an "object". In that case, I can directly apply a "classical PCA", which will give me eigenvectors of length P. What kind of result is this?

For example, do the coefficients of the first eigenvector, associated with the P variables, tell me how relevant these variables are to the study of these time series?

I hope I have made this clear…

Best Answer

If I understand correctly, your question is about the reason to use MSSA for a system of time series when one can instead apply PCA (or SVD) to this system.

The general answer is that the result of PCA is mostly an unstructured approximation (unstructured from the viewpoint of the temporal structure), while SSA takes the temporal structure into consideration. Note that SSA is related to so-called structured low-rank approximation (SLRA).

The other answer (although there is little point in this) is that if you have m time series of length N, with m < N, then PCA provides only m components. For m = 1 it is senseless to apply PCA; for m = 2, two components can be insufficient even to attempt a decomposition into trend, oscillations and noise.
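One can see this limit directly: stacking m series of length N as the columns of an N x m data matrix, PCA returns at most m components. A quick illustration, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
N, m = 500, 2
X = rng.standard_normal((N, m))  # m = 2 time series as columns

pca = PCA().fit(X)
print(pca.components_.shape)     # (2, 2): only m = 2 components available
```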

A more clever example is related to the decomposition into signal and noise when the signal is described by a few SSA components (this holds if the time series is well approximated by a finite sum of products of polynomials, exponentials and sinusoids).

For example, let the time series of the system consist of noisy sinusoids with a small signal-to-noise ratio (SNR). PCA does not help to extract the signal, for any time series length N. SSA applies the SVD (PCA without centering/standardizing) to the trajectory matrix, which consists of lagged subseries of length L. For sufficiently large L and N, SSA is able to approximately extract the signal, for any SNR!
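As an illustration of that claim, here is a minimal SSA sketch under assumed parameters (N = 500, window length L = 100, a unit sinusoid in Gaussian noise of standard deviation 2): the SVD of the trajectory matrix is truncated to the leading pair of components (a sinusoid occupies a pair), and the rank-2 approximation is Hankelized (anti-diagonal averaging) back into a series:

```python
import numpy as np

rng = np.random.default_rng(42)
N, L = 500, 100                             # series length and window length
t = np.arange(N)
signal = np.sin(2 * np.pi * t / 20)         # pure sinusoid, period 20
x = signal + 2.0 * rng.standard_normal(N)   # low SNR: noise std = 2

# Embedding: L x K trajectory matrix of lagged subseries.
K = N - L + 1
T = np.column_stack([x[i:i + L] for i in range(K)])

# SVD without centering (the SSA decomposition step).
U, s, Vt = np.linalg.svd(T, full_matrices=False)

# Keep the leading pair of components (rank-2 approximation).
T2 = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])

# Hankelization: average T2 over its anti-diagonals to recover a series.
recon = np.zeros(N)
counts = np.zeros(N)
for i in range(L):
    for j in range(K):
        recon[i + j] += T2[i, j]
        counts[i + j] += 1
recon /= counts

rmse = np.sqrt(np.mean((recon - signal) ** 2))
print(f"RMSE of reconstruction vs. true signal: {rmse:.3f}")
```

Applying PCA directly to such a system of short, noisy series gives nothing comparable, since it has no access to the lagged (temporal) structure that the trajectory matrix encodes.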

The same effect holds when the time series consists of temporal components such as trend and oscillations. Direct approximation by PCA does not help to extract one of the components; SSA is able to do it due to the bi-orthogonality of the SVD. See the SSA literature for a description of the notion of 'separability'.

Thus, for time series, PCA usually does not work. The other important question is whether it is better to apply MSSA to the system of time series or to apply SSA to each time series separately.