Dimensionality Reduction – How to Use SVD for Time Series of Different Lengths

data-transformation, machine-learning, multivariate-analysis, pca, time-series

I am using Singular Value Decomposition as a dimensionality reduction technique.

Given N vectors of dimension D, the idea is to represent the features in a transformed space of uncorrelated dimensions, which concentrates most of the information in the data along the leading eigenvectors of that space, in decreasing order of importance.

Now I am trying to apply this procedure to time-series data. The problem is that the sequences do not all have the same length, so I can't really build the num-by-dim matrix and apply SVD. My first thought was to zero-pad: build a num-by-maxDim matrix and fill the empty entries with zeros, but I'm not sure that is the correct approach.
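To make the zero-padding idea concrete, here is a minimal sketch in Python/NumPy (since the question mentions reading Python is fine); the three random series and their lengths are made up purely for illustration:

```python
import numpy as np

# Hypothetical example: three univariate time series of different lengths
series = [np.random.randn(80), np.random.randn(100), np.random.randn(60)]

max_len = max(len(s) for s in series)          # maxDim in the question's notation
X = np.zeros((len(series), max_len))           # num-by-maxDim matrix of zeros
for i, s in enumerate(series):
    X[i, :len(s)] = s                          # each row: the series, then zero padding
```

Note that after padding, the trailing zeros are treated as real observations by the SVD, which is exactly the concern raised above.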

My question is: how do you apply the SVD approach to dimensionality reduction when the time series have different lengths? Alternatively, are there other eigenspace-representation methods commonly used with time series?

Below is a piece of MATLAB code to illustrate the idea:

X = randn(100,4);                       % data matrix of size N-by-dim

X0 = bsxfun(@minus, X, mean(X));        % center (subtract the column means)
[U, S, V] = svd(X0, 0);                 % economy-size SVD
variances = diag(S).^2 / (size(X,1)-1); % variances along the principal directions

KEEP = 2;                               % number of dimensions to keep
newX = U(:,1:KEEP)*S(1:KEEP,1:KEEP);    % reduced, transformed data (equals X0*V(:,1:KEEP))

(I am coding mostly in MATLAB, but I'm comfortable enough to read R/Python/.. as well)

Best Answer

There is a reasonably new area of research called matrix completion that probably does what you want. A really nice introduction is given in this lecture by Emmanuel Candès.
