I have been trying to carry out Principal Component Analysis (PCA) in R using the function psych::principal
. However the returned $p$-value has been very low in several attempts using different sets of variables.
Does it mean that the test of the hypothesis that the number of components is sufficient is rejected? (The number of components was obtained using the scree plot and the criterion of eigenvalue larger than 1.)
Is it unimportant for the purpose of obtaining components using PCA? If not, what can be done?
RC1 RC2 RC4 RC3 RC5
SS loadings 5.19 4.24 3.01 2.63 1.16
Proportion Var 0.27 0.22 0.16 0.14 0.06
Cumulative Var 0.27 0.50 0.66 0.79 0.85
Proportion Explained 0.32 0.26 0.19 0.16 0.07
Cumulative Proportion 0.32 0.58 0.77 0.93 1.00
Test of the hypothesis that 5 components are sufficient.
The degrees of freedom for the null model are 171 and the objective function was 23.32
The degrees of freedom for the model are 86 and the objective function was 2.64
The total number of observations was 103 with MLE Chi Square = 241.11 with prob < 1.2e-16
Thank you very much for the answers and input! All help are very much appreciated! I look forward to more detailed documentation of the package to be available.
Best Answer
Goodness of fit cannot be meaningfully computed for PCA. The
principal
function is doing something strange here.Function
principal()
is part of thepsych
package that implements factor analysis, and is designed to implement PCA in a manner mimicking factor analysis. In factor analysis, goodness of fit can be meaningfully computed as follows.Given a $p\times p$ covariance matrix $\mathbf S$, factor analysis fits a $p\times k$ matrix of loadings $\mathbf W$ where $k$ is the number of factors and a diagonal $p\times p$ matrix of uniquenesses $\boldsymbol \Psi$ such that $$\mathbf S \approx \mathbf C = \mathbf W \mathbf W^\top + \boldsymbol \Psi.$$ There are different methods of "factor extraction", differing in how exactly the distance between $\mathbf C$ ans $\mathbf S$ is measured. Maximum likelihood method finds loadings and uniquenesses maximizing log-likelihood of the observed data. The log-likelihood of the data is given by $$L_k=-\frac{n}{2}\left[p\ln(2\pi) + \ln|\mathbf C| + \mathrm{tr}(\mathbf C^{-1}\mathbf S)\right],$$ where $n$ is the number of data points (observations). Given two models with different number of factors $k_1$ and $k_2$, they can be compared via a likelihood ratio test that tells us that $$2(L_{k_1} - L_{k_2}) \sim \chi^2_{\mathrm{df}_1-\mathrm{df}_2}.$$ In particular, a model with $k$ factors can be compared to the "full" model with $k=p$ and $\mathbf C=\mathbf S$.
In PCA, there is no $\boldsymbol \Psi$ and so $\mathbf C_\mathrm{PCA}=\mathbf W\mathbf W^\top$. This makes $\mathbf C$ low-rank, with zero determinant, and non-invertible.
Consequently, in PCA likelihood cannot be computed and likelihood ratio is undefined (ultimately because PCA is not a probabilistic model).
What exactly
principal()
is doing to "compute" it, is impossible to understand from its manual page. Having looked into its source code I saw that all the goodness of fit computations are done by thefactor.stats()
function, which again has an unclear manual page. The source code offactor.stats()
reveals that after computing model covariance matrix $\mathbf C = \mathbf W\mathbf W^\top$, the algorithm makes its diagonal equal to the diagonal of $\mathbf S$ (making it full rank). This is the correct computation in factor analysis because there the diagonals of $\mathbf C$ and $\mathbf S$ always coincide (due to the matrix of uniquenesses), but makes no sense in PCA.The resulting goodness of fit computation is essentially a factor analysis computation, but with PCA loadings. This is weird. I am not sure how this could be justified. My advice would be to ignore it and to use some other approach instead.
(Having said that, I guess that a very low $p$-value still does indicate that the number of components is "not sufficient": after all, such "diagonal-adjusted" $\widetilde{\mathbf C}$ is closer to $\mathbf S$ than the original $\mathbf C$, so if $\widetilde{\mathbf C}$ does not provide a sufficient fit, then $\mathbf C$ does not either.)