Solved – How to transform test data in Functional Principal Component Analysis in R

functional-data-analysis, pca, r

I am doing a functional principal component analysis (FPCA) on time series data. I have finished the FPCA on the training data and extracted the PCs. Next, I need to project the test data onto those PCs, and I cannot work out how to carry out this step in R.

Here are my steps to deal with the time series data with fda package in R:

  1. Construct the basis functions with create.bspline.basis

  2. Smooth the data onto that basis with smooth.basis

  3. Run the FPCA with pca.fd on the training dataset

Up to step 3 I have obtained the scores and varprop, but I have no idea how to transform the test dataset onto the same PCs as the training data.

Thanks for your help in advance.

Best Answer

Projecting new functional data using an existing FPCA analysis is very similar to what we would do with standard PCA (for multivariate data). The main difference is that, because each curve is observed only at a finite, irregular and possibly sparse set of noisy points, we cannot reliably compute the score as the inner product $\int \{y_i(s) - \mu(s)\}\phi_k(s)\,ds$ by standard numerical integration, as we would with densely observed curves. Instead we use a probabilistic approximation of it, the conditional expectation of the score given the observations (PACE, see the reference below).

For the rest of the post I will refer to $\phi$ as the functional PCs, $\xi$ as the associated FPC scores, $\lambda$ as their associated eigenvalues, $\mu$ as the sample mean and $C$ as the sample covariance. I also assume we are dealing with irregularly spaced data across a continuum $s$, and I refer to the test data at hand as $y_{test}$. In short, the prediction for the trajectory $y_i(s)$ using the first $K$ eigenfunctions is: $\hat{y}_i^K(s) = \hat{\mu}(s) + \sum_{k=1}^{K} \hat{\xi}_{i,k}\hat{\phi}_k(s)$.
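As a small illustration of the truncated expansion above, here is a sketch in base R. The mean, eigenfunctions and scores are made-up toy values (they are not estimated from any data), and `reconstruct` is a hypothetical helper name.

```r
# Toy ingredients on a dense grid (assumed for illustration only)
s   <- seq(0, 1, length.out = 101)
mu  <- 2 * s                                    # stand-in for mu-hat(s)
phi <- cbind(sqrt(2) * sin(2 * pi * s),         # stand-in for phi_1(s)
             sqrt(2) * cos(2 * pi * s))         # stand-in for phi_2(s)
xi  <- c(1.5, -0.4)                             # scores of one curve

# Hypothetical helper: mu(s) + sum_{k <= K} xi_k * phi_k(s)
reconstruct <- function(mu, phi, xi, K = length(xi)) {
  keep <- seq_len(K)
  as.vector(mu + phi[, keep, drop = FALSE] %*% xi[keep])
}

y_hat_1 <- reconstruct(mu, phi, xi, K = 1)      # first component only
y_hat_2 <- reconstruct(mu, phi, xi, K = 2)      # both components
```

Increasing `K` simply adds further score-weighted eigenfunctions to the same mean curve.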

To project new test data onto the results of an existing FPCA, we need the following steps:

  1. Ensure that $\mu$, $C$ and $\phi$ are evaluated at the same points of $s$ at which we have $y_{test}$ readings. If necessary, estimate these values by interpolation.
  2. Centre the test data by subtracting the $\hat\mu(s)$ we calculated during the original FPCA, so that the centred curves satisfy $E\{y(s) - \mu(s)\} = 0$.
  3. Predict the $\xi$ for the test data, under the assumption that the scores and the measurement errors are jointly Gaussian, through: $\hat{\xi}_{ik} = \hat{\lambda}_k \hat{\phi}_{ik}^T\hat{\Sigma}^{-1}_{y_i}(y_i^{obs} - \hat{\mu}_i)$. Here $\hat{\Sigma}_{y_i}$ is the estimated covariance matrix of the observations of the $i$-th curve, i.e. $\hat{C}$ evaluated at that curve's observation points plus the estimated measurement-error variance on the diagonal. Notice that all estimates (aside from $\hat{\lambda}_k$) are evaluated at the points where we have observations from the $i$-th curve, i.e. they might even be just scalars in the odd case where a particular sample has a single measurement. This whole procedure is what the FDA literature refers to as the "PACE step/procedure" (PACE: Principal components Analysis through Conditional Expectation); the canonical reference on the matter is: Yao, et al. (2005) Functional Data Analysis for Sparse Longitudinal Data (Sect. 2.3 to be exact).
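The three steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the fdapace implementation: the training estimates (`mu_hat`, `phi_hat`, `lambda_hat`, `sigma2_hat`) are toy values on a dense grid, the covariance is approximated by its truncated expansion $\hat{\Phi}\hat{\Lambda}\hat{\Phi}^T$ rather than by a smoothed covariance surface, and `pace_scores` is a hypothetical helper name.

```r
# Toy "training" estimates on a dense grid (assumed, not fitted to real data)
grid       <- seq(0, 1, length.out = 101)
mu_hat     <- 1 + grid
phi_hat    <- cbind(sqrt(2) * sin(2 * pi * grid),
                    sqrt(2) * cos(2 * pi * grid))   # grid x K eigenfunctions
lambda_hat <- c(2, 0.5)                             # eigenvalues
sigma2_hat <- 0.01                                  # measurement-error variance

# Hypothetical helper: conditional-expectation scores for ONE test curve
pace_scores <- function(s_obs, y_obs, grid, mu_hat, phi_hat,
                        lambda_hat, sigma2_hat) {
  m <- length(s_obs)
  # Step 1: interpolate mean and eigenfunctions to the observation points
  mu_i  <- approx(grid, mu_hat, xout = s_obs)$y
  phi_i <- apply(phi_hat, 2, function(p) approx(grid, p, xout = s_obs)$y)
  phi_i <- matrix(phi_i, nrow = m)      # stays a matrix even when m == 1
  # Step 2: centre with the training mean
  resid <- y_obs - mu_i
  # Step 3: xi_k = lambda_k * phi_ik' Sigma_i^{-1} (y_i - mu_i)
  K       <- length(lambda_hat)
  Sigma_i <- phi_i %*% diag(lambda_hat, nrow = K) %*% t(phi_i) +
    diag(sigma2_hat, m)
  as.vector(lambda_hat * (t(phi_i) %*% solve(Sigma_i, resid)))
}

# One irregularly sampled, noisy test curve with known scores
set.seed(1)
s_obs   <- grid[c(3, 10, 18, 27, 35, 44, 52, 63, 71, 80, 88, 97)]
xi_true <- c(1.2, -0.7)
y_obs   <- approx(grid, mu_hat, xout = s_obs)$y +
  as.vector(phi_hat[match(s_obs, grid), ] %*% xi_true) +
  rnorm(length(s_obs), sd = sqrt(sigma2_hat))

xi_hat <- pace_scores(s_obs, y_obs, grid, mu_hat, phi_hat,
                      lambda_hat, sigma2_hat)
```

The estimated scores are shrunk towards zero relative to a least-squares fit. The shrinkage grows with the noise variance and as the number of observations per curve falls, which is what keeps the procedure stable for sparse curves.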

The package fdapace implements this methodology through the function predict.FPCA. The package fda might offer something similar through the function project.basis, but I have not used it and cannot confirm that.
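For the fda workflow described in the question, a minimal sketch follows. It assumes densely observed curves on a common grid, so it uses the plain inner product of the centred test curves with the training harmonics rather than the PACE step. The simulated data and the helper name `project_onto_fpcs` are made up for illustration, and the approach relies on the test data being smoothed with the same basis as the training data.

```r
library(fda)

set.seed(1)
tt <- seq(0, 1, length.out = 51)               # common sampling grid
make_curves <- function(n) {
  # rows = time points, columns = curves (the layout smooth.basis expects)
  sapply(seq_len(n), function(i)
    rnorm(1) * sin(2 * pi * tt) + rnorm(1, sd = 0.5) * cos(2 * pi * tt) +
      rnorm(length(tt), sd = 0.05))
}
y_train <- make_curves(40)
y_test  <- make_curves(10)

# Steps 1-3 from the question, on the training data
basis    <- create.bspline.basis(rangeval = c(0, 1), nbasis = 15)
train_fd <- smooth.basis(tt, y_train, basis)$fd
pca      <- pca.fd(train_fd, nharm = 2)        # centres the curves by default

# Hypothetical helper: scores of new curves on the TRAINING harmonics
project_onto_fpcs <- function(argvals, y, basis, pca) {
  new_fd <- smooth.basis(argvals, y, basis)$fd
  # subtract the training mean in coefficient space (same basis assumed)
  centred_coefs <- sweep(new_fd$coefs, 1, as.vector(pca$meanfd$coefs))
  centred_fd    <- fd(centred_coefs, basis)
  # score_ik = integral of centred curve i times harmonic k
  inprod(centred_fd, pca$harmonics)
}

test_scores <- project_onto_fpcs(tt, y_test, basis, pca)  # 10 x 2 matrix
```

A quick sanity check is to pass the training data through the same helper: the result should agree with `pca$scores` up to numerical integration error.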