Solved – How to select the optimal number of principal components in functional principal components analysis

functional-data-analysis, multivariate-analysis, r

I am interested in selecting the optimal number of principal components in functional principal component analysis (FPCA). There are various techniques for doing that, for example AIC, BIC, etc. (I am not very familiar with all of them). One approach is to choose the number so that 84% of the variance is explained. This can be done in the R package fda using pca.fd(fdobj, nharm = .......)
Here the smallest value of nharm is selected such that the cumulative proportion of variance, computed from varprop, is greater than or equal to 0.84. I know little about the other methods. I have also gone through this link, but it does not serve my objective.
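
The rule described above can be sketched in R as follows. This is a minimal illustration, not a definitive recipe: the helper name choose_nharm, the made-up proportions, and the simulated curves are all assumptions introduced here, and the fda part only runs if that package is installed.

```r
# Hypothetical helper: smallest number of components whose cumulative
# variance proportion reaches `threshold`. `varprop` is a vector of
# per-component proportions, such as the varprop element of a pca.fd fit.
choose_nharm <- function(varprop, threshold = 0.84) {
  cum <- cumsum(varprop)
  hit <- which(cum >= threshold)
  if (length(hit) == 0) return(NA_integer_)  # threshold never reached
  hit[1]
}

# Made-up proportions, for illustration only
example_varprop <- c(0.55, 0.22, 0.10, 0.06, 0.04)
k_example <- choose_nharm(example_varprop)

# Same rule applied to an actual FPCA fit on simulated curves
if (requireNamespace("fda", quietly = TRUE)) {
  set.seed(1)
  tt <- seq(0, 1, length.out = 51)
  # 30 simulated curves: two smooth modes of variation plus noise
  y <- sapply(1:30, function(i) {
    rnorm(1, sd = 2) * sin(2 * pi * tt) +
      rnorm(1) * cos(2 * pi * tt) +
      rnorm(length(tt), sd = 0.1)
  })
  basis <- fda::create.bspline.basis(rangeval = c(0, 1), nbasis = 10)
  fdobj <- fda::smooth.basis(tt, y, basis)$fd
  # Fit with a generous nharm, then keep only as many as the rule requires
  fit <- fda::pca.fd(fdobj, nharm = 8)
  k_fda <- choose_nharm(fit$varprop)
}
```

Fitting once with a generous nharm and truncating afterwards avoids refitting for each candidate value, since varprop for the leading components does not depend on how many are requested.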

Your suggestions would be very helpful in this direction. I would be happy to use R for this problem.

Best Answer

Neither PCA nor FDA is designed to answer that question. Both transform the full data set into another set of the same dimension, so neither prescribes how many components to keep.

Intuitively, we imagine that the data depend on a small number of vectors, and that the rest of the variation in the sample is noise. However, if you attempt to formulate this intuition and solve it, what you get is factor analysis, not PCA.

Therefore, using PCA to reduce the dimension of the problem always relies on rules of thumb and ad hoc thinking. Personally, I would look at the proportion of total variance explained. I would also look at the coefficients to see whether they have an obvious and meaningful interpretation, and stop when the eigenvectors stop making sense.

There is a useful function in the psych package, fa.parallel, that uses a graphical method to determine the number of components for PCA and FA. Again, it's a rule of thumb, but it seems to produce sensible results most of the time.
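
To show the idea behind this kind of rule, here is a simplified parallel-analysis sketch in base R. It is not the psych implementation: the function name parallel_ncomp, its arguments, and the simulated data are assumptions made for illustration. Components are retained while their observed eigenvalues exceed a reference quantile of eigenvalues obtained from pure-noise data of the same size.

```r
# Simplified Horn-style parallel analysis (illustrative, not psych's code).
# x: numeric matrix, rows = observations, columns = variables.
parallel_ncomp <- function(x, n_iter = 100, quant = 0.95) {
  n <- nrow(x)
  p <- ncol(x)
  # Eigenvalues of the observed correlation matrix
  obs <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  # Eigenvalues from uncorrelated normal data of the same dimensions
  sim <- replicate(n_iter, {
    z <- matrix(rnorm(n * p), n, p)
    eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
  })
  ref <- apply(sim, 1, quantile, probs = quant)
  # Keep components up to the first one that fails to beat the reference
  below <- which(obs <= ref)
  if (length(below) == 0) p else below[1] - 1
}

# Simulated data: six variables driven by two latent factors plus noise
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(c(1, 1, 1, 0, 0, 0,
                     0, 0, 0, 1, 1, 1), nrow = 2, byrow = TRUE)
x <- f %*% loadings + matrix(rnorm(n * 6, sd = 0.5), n, 6)
k_pa <- parallel_ncomp(x)
```

With the psych package installed, psych::fa.parallel(x, fa = "pc") gives the packaged version of this comparison together with its scree-style plot.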

I would expect the number of components selected for PCA to be the same as, or similar to, the number selected for FDA. FDA is somewhat like working with an oblique transformation of the basis of the data space, which shouldn't affect the underlying dimensionality of the problem.