Exactly: as you state in the question and as @tdc puts it in his answer, with extremely high dimensions, even if the geometric properties of PCA remain valid, the covariance matrix is no longer a good estimate of the true population covariance.
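A quick toy sketch in R (my own illustration, not from the question or the linked answer) shows the effect: with many more dimensions than observations, the sample covariance eigenvalues are badly distorted even though the true covariance is the identity.

```r
set.seed(1)
n <- 50; p <- 500                  # far more dimensions than observations
X <- matrix(rnorm(n * p), n, p)    # true covariance is the identity (all eigenvalues = 1)
ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
head(ev)                           # leading eigenvalues inflated well above 1
sum(ev > 1e-10)                    # at most n - 1 = 49 nonzero eigenvalues; the rest collapse to 0
```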
There's a very interesting paper, "Functional Principal Component Analysis of fMRI Data" (pdf), where they use functional PCA to visualize the variance:
...As in other explorative techniques, the objective is that of providing an initial assessment that will give the data a chance “to speak for themselves” before an appropriate model is chosen. [...]
In the paper they explain exactly how they did it and also provide the theoretical reasoning:
The decisive advantage of this approach consists in the possibility of specifying a set of assumptions in the choice of the basis function set and in the error functional minimized by the fit. These assumptions will be weaker than the specification of a predefined hemodynamic function and a set of events or conditions as in F-masking, thus preserving the exploratory character of the procedure; however, the assumptions might remain stringent enough to overcome the difficulties of ordinary PCA.
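To make the basis-function idea concrete, here is a minimal R sketch of the general approach (my own toy example using the fda package, not the paper's code): smooth noisy curves onto a B-spline basis, then run functional PCA on the smoothed representation.

```r
library(fda)  # assumes the 'fda' package is installed
# Toy data: 20 noisy realizations of a sine curve, observed on a common grid
argvals <- seq(0, 1, length.out = 100)
Y <- replicate(20, sin(2 * pi * argvals) + rnorm(100, sd = 0.2))

basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 15)  # choice of basis = choice of assumptions
fdobj <- Data2fd(argvals = argvals, y = Y, basisobj = basis)    # least-squares fit onto the basis
pc    <- pca.fd(fdobj, nharm = 3)                               # first 3 functional PCs
pc$varprop                                                      # proportion of variance explained
```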
Neither PCA nor FDA is configured to answer that question: both transform the full data set into another set of the same dimension.
Intuitively, we imagine that the data depend on a small number of vectors, and that the rest of the variation in the sample is noise. However, if you attempt to formulate this intuition and solve it, what you get is factor analysis, not PCA.
Therefore, using PCA to reduce the dimension of the problem always relies on rules of thumb and ad hoc reasoning. Personally, I would look at the proportion of total variance explained. I would also look at the coefficients to see whether they have an obvious and meaningful interpretation, and stop when the eigenvectors stop making sense.
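For instance, with base R's prcomp (a generic illustration on a built-in data set, not tied to any particular problem):

```r
pca <- prcomp(USArrests, scale. = TRUE)    # built-in example data
summary(pca)                               # see the "Cumulative Proportion" row

# Keep enough components to pass a chosen threshold, e.g. 90% of total variance
cumvar <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
which(cumvar >= 0.90)[1]
```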
There is a useful function in the psych package, fa.parallel, that uses parallel analysis (a graphical comparison of the observed eigenvalues against those of simulated random data) to suggest the number of components for PCA and factors for FA. Again, it's a rule of thumb, but it seems to produce sensible results most of the time.
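A minimal usage sketch, assuming the psych package is installed:

```r
library(psych)
# Parallel analysis: the scree of the observed data is compared with that of random data;
# components above the random-data line are retained
fa.parallel(USArrests, fa = "both")   # suggests numbers of components (PCA) and factors (FA)
```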
I would expect the number of components selected for PCA to be the same as, or similar to, the number selected for FDA. FDA is somewhat like working with an oblique transformation of the basis of the data space, which shouldn't affect the underlying dimensionality of the problem.
Best Answer
The estimation of the mean function depends on the underlying FPCA implementation. refund uses spline smoothing (as it makes extensive use of the package mgcv), while other packages like fdapace use locally weighted linear smoothers. A simple Python-based FPCA implementation I have found here also uses local-linear smoothing. Both approaches (i.e. splines and locally weighted linear smoothers) are equally valid, as they provide a non-parametric estimate of the mean trend.

fdapace comes with a vignette that might come in handy as a general blueprint for how to build an overall FPCA routine. As you hint at MATLAB/Python in particular, I would suggest looking at the MATLAB package PACE, which is developed by the same people behind fdapace. The canonical PACE reference is "Functional Data Analysis for Sparse Longitudinal Data" by Yao, Mueller, and Wang (2005). I am unaware of any widely used Python FDA packages.

In general, Functional Data Analysis involves a lot of smoothing; in many cases, the core practical differences between methodologies are actually different smoothing approaches.
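As a minimal sketch of the fdapace workflow (toy data of my own making; the package estimates the mean trend via a local smoother, as described above):

```r
library(fdapace)  # assumes fdapace is installed from CRAN
set.seed(42)

# Toy data: 30 sparsely and irregularly observed noisy sine curves
n  <- 30
Lt <- lapply(1:n, function(i) sort(runif(6)))   # 6 random time points per curve
Ly <- lapply(Lt, function(tt) sin(2 * pi * tt) + rnorm(length(tt), sd = 0.2))

fit <- FPCA(Ly, Lt)                             # mean estimated internally by smoothing
plot(fit$workGrid, fit$mu, type = "l",
     xlab = "t", ylab = "estimated mean function")
```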