Solved – Which variables explain which PCA components, and vice versa

Tags: dimensionality-reduction, pca, r, regression-strategies

Using this data:

head(USArrests)
nrow(USArrests)

I can run a PCA like this:

plot(USArrests)
otherPCA <- princomp(USArrests)

I can get the component scores with

otherPCA$scores

and the proportion of variance explained by each component with

summary(otherPCA)
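The proportions reported by summary() can also be reproduced by hand, which makes it explicit what "variance explained" means here. A minimal sketch, assuming the same princomp fit as above; the names eigenvalues and prop_var are only illustrative:

```r
# Refit so the snippet is self-contained
otherPCA <- princomp(USArrests)

# princomp stores the standard deviation of each component;
# squaring gives the component variances (eigenvalues)
eigenvalues <- otherPCA$sdev^2

# Share of total variance carried by each component, and the running total
prop_var <- eigenvalues / sum(eigenvalues)
round(prop_var, 4)
round(cumsum(prop_var), 4)
```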

But what if I want to know which variables are mostly explained by which principal components? And vice versa: is e.g. PC1 or PC2 mostly explained by murder? How can I do this?

Can I say for instance that PC1 is 80% explained by murder or assault?

I think the loadings help me here, but as I understand it they show the direction, not the variance explained, e.g.

otherPCA$loadings

Loadings:
         Comp.1 Comp.2 Comp.3 Comp.4
Murder                         0.995
Assault  -0.995                     
UrbanPop        -0.977 -0.201       
Rape            -0.201  0.974   

Best Answer

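You are right, the loadings can help you here. They can be used to compute the correlation between the variables and the principal components. Because the loadings are orthonormal eigenvectors, the squared loadings of one variable sum to 1 over all principal components, and the squared loadings of one component sum to 1 over all variables. The squared loadings therefore tell you how strongly each variable contributes to the direction of each component. To express this as the proportion of a variable's variance explained by a component, weight each squared loading by that component's variance (its eigenvalue) and divide by the variable's variance; this ratio equals the squared correlation between the variable and the component.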
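These relations can be checked numerically. The following is a minimal sketch for a covariance-based PCA of USArrests; the names X, V, lambda, var_share and scores are illustrative and not part of any package. It relies on the decomposition var(x_j) = sum over k of v_jk^2 * lambda_k, so a variance share that accounts for the size of each component weights the squared loading by its eigenvalue.

```r
X <- as.matrix(USArrests)
e <- eigen(cov(X))
V <- e$vectors            # columns are the eigenvectors (loadings)
lambda <- e$values        # variances of the components
rownames(V) <- colnames(X)

# Orthonormality: squared loadings sum to 1 along rows and along columns
rowSums(V^2)
colSums(V^2)

# Share of each variable's variance carried by each component:
# v_jk^2 * lambda_k / var(x_j); each row sums to 1
var_share <- sweep(V^2, 2, lambda, "*") / diag(cov(X))
round(var_share, 3)

# The same numbers are the squared correlations between variables and scores
scores <- scale(X, center = TRUE, scale = FALSE) %*% V
all.equal(unname(var_share), unname(cor(X, scores)^2))
```

Reading var_share by row answers "which components explain this variable"; the unweighted V^2 read by column describes which variables make up each component's direction.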

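One inconvenience with princomp is that printing the loadings hides small entries (by default, those below 0.1 in absolute value), although the full matrix is still stored. Since the loadings are just the eigenvectors of the covariance matrix, one can also get all of them using the eigen command in R (eigenvector signs are arbitrary, so columns may be flipped relative to princomp):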

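 # eigenvectors of the covariance matrix = complete, unrounded loadings
 loadings <- eigen(cov(USArrests))$vectors
 rownames(loadings) <- colnames(USArrests)
 explvar <- loadings^2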

Now the matrix explvar holds the desired information: the squared loadings, with variables in rows and principal components in columns.
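As a cross-check, the same matrix can be taken directly from the princomp fit, which also keeps the variable and component names. A minimal sketch, assuming the fit from the question; full and ev are illustrative names:

```r
otherPCA <- princomp(USArrests)

# Show every loading; print() only blanks entries below the default cutoff of 0.1
print(otherPCA$loadings, cutoff = 0)
full <- unclass(otherPCA$loadings)   # plain numeric matrix with dimnames

# Agrees with the eigenvectors of cov(USArrests) up to the sign of each column
ev <- eigen(cov(USArrests))$vectors
max(abs(abs(full) - abs(ev)))

# Squared loadings: each column sums to 1 and describes how the
# variables make up that component's direction
explvar <- full^2
round(explvar, 3)
round(explvar["Assault", "Comp.1"], 3)
```

Read column by column, this addresses the "vice versa" part of the question: Comp.1 is dominated by Assault, whose squared loading is roughly 0.99 (the square of the -0.995 printed in the question). Because the PCA is run on the unscaled data, this largely reflects that Assault has by far the largest variance of the four variables.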
