I'm trying to make sense of a principal component analysis in R (using either princomp or prcomp; I get similar results) based on the correlation matrix. In particular, I'm having trouble understanding the factor loadings output.
The data are in a data frame called ds. Here is the eigenvalue/eigenvector decomposition of the correlation matrix.
```
> eigen(cor(ds))
$values
[1] 3.9831253 0.5483985 0.4520555 0.4191041 0.3506337 0.2466829

$vectors
           [,1]       [,2]         [,3]       [,4]       [,5]        [,6]
[1,] -0.4292649  0.2561060 -0.298728176  0.2474691  0.3939538  0.66668007
[2,] -0.3938608  0.7037562 -0.066953299  0.1803110 -0.4130921 -0.37677712
[3,] -0.4060290  0.1537541  0.365977815 -0.7055394  0.4027306 -0.13259964
[4,] -0.4166136 -0.4038936 -0.004352731  0.4760061  0.3984587 -0.52719349
[5,] -0.3951931 -0.4041873 -0.621151418 -0.3939178 -0.3732825 -0.01071668
[6,] -0.4074327 -0.2983273  0.621683956  0.1634262 -0.4624437  0.34343296
```
Now, when I ask for a principal component analysis, I get the following initial output.
```
> pca.out <- princomp(ds, cor=TRUE)
> summary(pca.out)
Importance of components:
                          Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
Standard deviation     1.9957769 0.74053931 0.67235070 0.64738253 0.59214334 0.49667185
Proportion of Variance 0.6638542 0.09139974 0.07534258 0.06985069 0.05843896 0.04111382
Cumulative Proportion  0.6638542 0.75525396 0.83059653 0.90044722 0.95888618 1.00000000
```
which all makes sense, given the output of the eigen() function.
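For a correlation matrix the eigenvalues sum to the number of variables (here 6), each component's standard deviation is the square root of its eigenvalue, and its proportion of variance is its eigenvalue divided by that total. A short sketch that rebuilds the summary rows from the eigenvalues printed above:

```r
# Eigenvalues as printed by eigen(cor(ds)) above
ev <- c(3.9831253, 0.5483985, 0.4520555, 0.4191041, 0.3506337, 0.2466829)

sdev     <- sqrt(ev)          # "Standard deviation" row of summary()
prop_var <- ev / sum(ev)      # "Proportion of Variance" row; sum(ev) is 6, the number of variables
cum_prop <- cumsum(prop_var)  # "Cumulative Proportion" row

round(rbind(sdev, prop_var, cum_prop), 4)
```

The rows agree with the `summary(pca.out)` table up to the rounding of the printed eigenvalues.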
However, my understanding is that loadings are computed as the product of the eigenvector and the square root of the eigenvalue. When I ask for the loadings from the pca, I get
```
Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
A -0.429  0.256 -0.299  0.247 -0.394  0.667
B -0.394  0.704         0.180  0.413 -0.377
C -0.406  0.154  0.366 -0.706 -0.403 -0.133
D -0.417 -0.404         0.476 -0.398 -0.527
E -0.395 -0.404 -0.621 -0.394  0.373
F -0.407 -0.298  0.622  0.163  0.462  0.343

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000
```
Not only do the proportion and cumulative variance figures fail to match the initial output, but the loadings simply match the eigenvectors (apart from the sign of Comp.5), with no scaling by the eigenvalues. Furthermore, the claim that the first component captures 66% of the variance is impossible with these loading values, because every single variable in the data set (A-F) has a later component with a higher (absolute) loading.
Can someone please straighten out my confusion or error? For the record, I'm running R version 3.3.3.
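The behaviour described above can be reproduced on simulated data. In this sketch, `ds_sim` and its one-common-factor generating model are hypothetical stand-ins for `ds`. It checks that the columns returned by `loadings()` are the unit-length eigenvectors of the correlation matrix (the sign of each column is arbitrary), that each column's sum of squares is 1, which is all the `SS loadings` row reports, and that multiplying each column by the component's standard deviation `pca$sdev` gives the correlations between the variables and the component scores. Blank cells in a printed loadings table are entries whose absolute value falls below the `cutoff` argument of the print method for loadings (0.1 by default).

```r
# Hypothetical stand-in for ds: six variables sharing one common factor plus noise
set.seed(1)
n <- 200
common <- rnorm(n)
ds_sim <- as.data.frame(sapply(1:6, function(j) common + rnorm(n)))
names(ds_sim) <- LETTERS[1:6]

pca <- princomp(ds_sim, cor = TRUE)
e <- eigen(cor(ds_sim))
L <- unclass(loadings(pca))                 # plain matrix: one eigenvector per column

colSums(L^2)                                # each is 1: the "SS loadings" row
pca$sdev^2                                  # component variances = eigenvalues of cor(ds_sim)
all.equal(abs(unname(L)), abs(e$vectors))   # compares columns, ignoring their signs

# Scale each column by its component's standard deviation (sqrt of its eigenvalue)
scaled <- sweep(L, 2, pca$sdev, `*`)

# The scaled loadings are the variable-component correlations
all.equal(unname(scaled), unname(cor(ds_sim, pca$scores)))

print(loadings(pca), cutoff = 0)            # shows the small entries hidden by default
```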
Best Answer
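It depends on the definition of loading you use. In `princomp`, loadings are simply the coefficients of the principal components (recall that principal components are linear combinations of the original variables), and these coefficients are the entries of the eigenvectors. This has one inconvenience: since the variance of each PC equals the corresponding eigenvalue, loadings defined this way are not correlations between the PCs and the original variables. Multiplying each eigenvector by the square root of its eigenvalue corrects this: it amounts to standardizing the PC scores to variance 1, and for a correlation-matrix PCA the resulting loadings are the correlations between the PCs and the original variables. These standardized loadings are sometimes called loadings as well. See, for example, the `PCA` function from the FactoMineR package. It never uses the word "loadings"; it uses the word "coordinates" for standardized loadings.

The `loadings` function doesn't give you the proportion and cumulative variance. It just gives you the sum of squares of each PC's loadings, and this, by definition, is 1, because each eigenvector has unit length; the "Proportion Var" row is that sum divided by the number of variables (1/6 here). So you'll always see this kind of output. It sounds ridiculous, but it works well when you apply the `loadings` function to exploratory factor analysis. In PCA, the second part of the `loadings` output is simply useless.

As for the 66% claim: it is actually possible, since the loadings here are just eigenvectors, not standardized loadings. Each eigenvector has unit length, so with six variables and a roughly equal-weight first component its entries are all near 1/√6 ≈ 0.41, no matter how much variance that component carries. The component's share of the variance is carried by its eigenvalue, not by the size of the eigenvector entries.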
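To see why the 66% figure is consistent with these eigenvectors, a sketch can rescale them into standardized loadings and square them, using the values posted in the question. Each squared loading is the share of a variable's (unit) variance accounted for by a component; each row of squared loadings sums to 1, and each column sums to that component's eigenvalue. After scaling, every variable's largest squared loading is on Comp.1 (between about 0.62 and 0.73), so the later components no longer look stronger.

```r
# Values copied from the eigen(cor(ds)) output in the question
ev <- c(3.9831253, 0.5483985, 0.4520555, 0.4191041, 0.3506337, 0.2466829)
vec <- matrix(c(
  -0.4292649,  0.2561060, -0.298728176,  0.2474691,  0.3939538,  0.66668007,
  -0.3938608,  0.7037562, -0.066953299,  0.1803110, -0.4130921, -0.37677712,
  -0.4060290,  0.1537541,  0.365977815, -0.7055394,  0.4027306, -0.13259964,
  -0.4166136, -0.4038936, -0.004352731,  0.4760061,  0.3984587, -0.52719349,
  -0.3951931, -0.4041873, -0.621151418, -0.3939178, -0.3732825, -0.01071668,
  -0.4074327, -0.2983273,  0.621683956,  0.1634262, -0.4624437,  0.34343296),
  nrow = 6, byrow = TRUE,
  dimnames = list(LETTERS[1:6], paste0("Comp.", 1:6)))

# Standardized loadings: eigenvector column k times sqrt(eigenvalue k)
scaled <- sweep(vec, 2, sqrt(ev), `*`)

# Squared loadings: share of each variable's variance explained by each component
explained <- scaled^2
round(explained, 3)

rowSums(explained)       # about 1 for every variable
colSums(explained)       # reproduces the eigenvalues (up to rounding)
colSums(explained) / 6   # reproduces the "Proportion of Variance" row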