Solved – factor loadings = eigenvectors in R output

I'm trying to make sense of a principal component analysis using R (either princomp or prcomp, I get similar results) with a correlation matrix analysis. In particular, I'm having trouble understanding the factor loadings output.

The data are in a data frame called ds. Here is the eigenvalue/vector analysis of the correlation matrix.

> eigen(cor(ds))
$values
[1] 3.9831253 0.5483985 0.4520555 0.4191041 0.3506337 0.2466829

$vectors
           [,1]       [,2]         [,3]       [,4]       [,5]        [,6]
[1,] -0.4292649  0.2561060 -0.298728176  0.2474691  0.3939538  0.66668007
[2,] -0.3938608  0.7037562 -0.066953299  0.1803110 -0.4130921 -0.37677712
[3,] -0.4060290  0.1537541  0.365977815 -0.7055394  0.4027306 -0.13259964
[4,] -0.4166136 -0.4038936 -0.004352731  0.4760061  0.3984587 -0.52719349
[5,] -0.3951931 -0.4041873 -0.621151418 -0.3939178 -0.3732825 -0.01071668
[6,] -0.4074327 -0.2983273  0.621683956  0.1634262 -0.4624437  0.34343296

Now, when I ask for a principal component analysis, I get the following initial output.

> pca.out <- princomp(ds, cor=TRUE)
> summary(pca.out)
Importance of components:
                          Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
Standard deviation     1.9957769 0.74053931 0.67235070 0.64738253 0.59214334 0.49667185
Proportion of Variance 0.6638542 0.09139974 0.07534258 0.06985069 0.05843896 0.04111382
Cumulative Proportion  0.6638542 0.75525396 0.83059653 0.90044722 0.95888618 1.00000000

which all makes sense, given the output of the eigen() function.
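As a quick check of that agreement, here is a minimal sketch that recomputes the summary rows from the eigenvalues. The vector name `ev` is only for illustration; its values are copied from the eigen(cor(ds)) output above.

```r
# Eigenvalues copied from the eigen(cor(ds)) output above.
ev <- c(3.9831253, 0.5483985, 0.4520555, 0.4191041, 0.3506337, 0.2466829)

# Square roots of the eigenvalues: compare with the "Standard deviation" row.
print(sqrt(ev))

# Share of the total: compare with the "Proportion of Variance" row.
print(ev / sum(ev))

# Running total: compare with the "Cumulative Proportion" row.
print(cumsum(ev) / sum(ev))
```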

However, my understanding is that loadings are computed as the product of the eigenvector and the square root of the eigenvalue. When I ask for the loadings from the pca, I get

Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
A -0.429  0.256 -0.299  0.247 -0.394  0.667
B -0.394  0.704         0.180  0.413 -0.377
C -0.406  0.154  0.366 -0.706 -0.403 -0.133
D -0.417 -0.404         0.476 -0.398 -0.527
E -0.395 -0.404 -0.621 -0.394  0.373       
F -0.407 -0.298  0.622  0.163  0.462  0.343

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000

Not only do the cumulative and proportion variance not match the initial output, but the loadings simply match the eigenvectors, with no adjustment for eigenvalue. Furthermore, the claim that the first component captures 66% of the variance is impossible with these loading values, because every single variable in the data set (A-F) has a later component with a higher (absolute) loading.

Can someone please straighten out my confusion/error? For the record, I'm running R version 3.3.3.

Best Answer

However, my understanding is that loadings are computed as the product of the eigenvector and the square root of the eigenvalue.

It depends on the definition of loading you use. In princomp, loadings are simply the coefficients of the principal components (recall that principal components are linear combinations of the original variables), and these are equal to the eigenvector entries. This has one inconvenience: since the variance of each PC equals the corresponding eigenvalue, loadings defined this way are not correlations between the PCs and the original variables. Multiplying by the square root of the eigenvalue standardizes the variance of the PC scores to 1 and therefore allows the loadings to be interpreted as correlations (for a PCA on the correlation matrix, as here). These standardized loadings are sometimes called loadings as well. See, for example, the PCA function from the FactoMineR package: it never uses the word loadings; it uses the word coordinates for standardized loadings.
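To see the two definitions side by side, here is a minimal sketch. Since the poster's ds is not available, it assumes the built-in USArrests data set as a stand-in; the names `raw`, `std` and `cors` are just for illustration.

```r
# Illustration on a built-in data set (USArrests), not the poster's `ds`.
pca <- princomp(USArrests, cor = TRUE)

# princomp loadings: the eigenvectors of the correlation matrix.
raw <- unclass(pca$loadings)

# Standardized loadings: scale each column by that component's standard
# deviation, i.e. the square root of the corresponding eigenvalue.
std <- sweep(raw, 2, pca$sdev, `*`)

# For a correlation-matrix PCA, the standardized loadings should equal the
# correlations between the original variables and the component scores.
cors <- cor(USArrests, pca$scores)

print(round(std, 3))
print(all.equal(std, cors, check.attributes = FALSE))
```

The last line compares the two matrices up to numerical tolerance, ignoring row and column names.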

Not only do the cumulative and proportion variance not match the initial output

The loadings function doesn't give you the cumulative and proportion of variance explained by the components. It just gives you the sum of squares of each PC's loadings, and this, by definition, is 1, because each eigenvector has unit length. So you'll always see this kind of output. It sounds ridiculous, but it works well when you apply the loadings function to Exploratory Factor Analysis. In PCA, the second part of the loadings output is simply useless.
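A small sketch of why the "SS loadings" row is always 1 for princomp output, again assuming the built-in USArrests data in place of ds (`ss_raw` and `ss_std` are illustrative names):

```r
# Illustration on a built-in data set (USArrests), not the poster's `ds`.
pca <- princomp(USArrests, cor = TRUE)
raw <- unclass(pca$loadings)

# "SS loadings": each eigenvector has unit length, so every column sums to 1.
ss_raw <- colSums(raw^2)

# After scaling by sdev, the column sums of squares are the eigenvalues;
# dividing by their total gives the proportions reported by summary().
std <- sweep(raw, 2, pca$sdev, `*`)
ss_std <- colSums(std^2)

print(ss_raw)
print(ss_std / sum(ss_std))
```

Only the second quantity varies across components, which is why the variance table printed with the loadings carries no information in the PCA case.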

the claim that the first component captures 66% of the variance is impossible with these loading values, because every single variable in the data set (A-F) has a later component with a higher (absolute) loading

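Actually it is possible, since the loadings here are just eigenvectors, not standardized loadings. Each eigenvector has unit length regardless of how much variance its component explains, so the size of its entries says nothing about the variance captured; that information is in the eigenvalues.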
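This can be checked with the numbers from the question. The sketch below takes the eigenvector entries for variable A and the component standard deviations from summary(pca.out), both copied from the output above; the names `a_raw`, `sdev` and `a_std` are only for illustration.

```r
# Eigenvector entries for variable A (first row of $vectors in the question).
a_raw <- c(-0.4292649, 0.2561060, -0.298728176, 0.2474691, 0.3939538, 0.66668007)

# Component standard deviations from summary(pca.out) in the question.
sdev <- c(1.9957769, 0.74053931, 0.67235070, 0.64738253, 0.59214334, 0.49667185)

# Standardized loadings of A: eigenvector entry times sqrt(eigenvalue).
a_std <- a_raw * sdev

print(round(a_std, 3))
print(which.max(abs(a_raw)))  # component with the largest raw coefficient
print(which.max(abs(a_std)))  # component with the largest standardized loading
print(sum(a_std^2))           # squared standardized loadings of A, summed
```

The raw coefficient of A is largest on the sixth component, but after scaling the first component dominates, and the squared standardized loadings of A add up to roughly 1, the variance of a standardized variable.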
