Solved – How to interpret PCA loadings

pca

While reading about PCA, I came across the following explanation:

Suppose we have a data set where each data point represents a single student's scores on a math test, a physics test, a reading comprehension test, and a vocabulary test.

We find the first two principal components, which capture 90% of the variability in the data, and interpret their loadings. We conclude
that the first principal component represents overall academic
ability, and the second represents a contrast between quantitative
ability and verbal ability.

The text states that PC1 and PC2 loadings are $(0.5, 0.5, 0.5, 0.5)$ for PC1 and $(0.5, 0.5, -0.5, -0.5)$ for PC2, and offers the following explanation:

[T]he first component is proportional to average score, and the second component measures the difference between the first pair of scores and the second pair of scores.

I am not able to understand what this explanation means.

Best Answer

Loadings (which should not be confused with eigenvectors) have the following properties:

1. Their sums of squares within each component are the eigenvalues (the components' variances).
2. Loadings are the coefficients in the linear combination that predicts a variable from the (standardized) components.
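
As a quick check of the sum-of-squares property, here is a minimal R sketch. It assumes a correlation-based PCA of the built-in USArrests data (an illustrative choice, not the data from the question) and builds the loadings by scaling each eigenvector by the square root of its eigenvalue.

```r
# Correlation-based PCA on the built-in USArrests data (illustrative assumption).
R <- cor(USArrests)
e <- eigen(R)

# Eigenvectors have unit length; loadings are eigenvectors scaled by sqrt(eigenvalue).
A <- e$vectors %*% diag(sqrt(e$values))
rownames(A) <- colnames(USArrests)

# Column sums of squares: 1 for the eigenvectors, the eigenvalues for the loadings.
ss_vectors  <- colSums(e$vectors^2)
ss_loadings <- colSums(A^2)

print(ss_vectors)
print(cbind(ss_loadings, eigenvalue = e$values))
```

The two columns printed last should agree, which is what distinguishes loadings from plain eigenvectors.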

You extracted the first 2 PCs out of 4. The matrix of loadings $\bf A$ and the eigenvalues are:

A (loadings)
         PC1           PC2
X1   .5000000000   .5000000000 
X2   .5000000000   .5000000000 
X3   .5000000000  -.5000000000 
X4   .5000000000  -.5000000000
Eigenvalues:
    1.0000000000  1.0000000000

In this instance, both eigenvalues are equal. This is rare in real-world data; it means that PC1 and PC2 have equal explanatory "strength".

Suppose you also computed the component values, an Nx2 matrix $\bf C$, and z-standardized them (mean = 0, st. dev. = 1) within each column. Then (as point 2 above says), $\bf \hat {X}=CA'$. But because you kept only 2 PCs out of 4 (2 columns of $\bf A$ are missing), the restored data values $\bf \hat {X}$ are not exact: there is an error (unless eigenvalues 3 and 4 are zero).
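
The reconstruction can be sketched in R as follows. This assumes a correlation-based PCA of the built-in USArrests data as a stand-in for the test scores; the names `full` and `approx2` are mine, chosen for illustration.

```r
# Standardized variables, so the PCA is based on correlations (illustrative data).
X <- scale(USArrests)
p <- prcomp(X)

A <- p$rotation %*% diag(p$sdev)   # loadings for all 4 components
C <- p$x %*% diag(1 / p$sdev)      # component scores standardized to st. dev. 1

full    <- C %*% t(A)                  # all 4 PCs: reproduces X exactly
approx2 <- C[, 1:2] %*% t(A[, 1:2])    # first 2 PCs only: an approximation

err_full    <- max(abs(X - full))
err_approx2 <- max(abs(X - approx2))
print(c(full = err_full, two_pcs = err_approx2))
```

With all four components the error is zero up to rounding; with two components it is not, because eigenvalues 3 and 4 of these data are nonzero.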

OK. What are the coefficients to predict the components from the variables? Clearly, if $\bf A$ were the full 4x4 matrix, these would be $\bf B=(A^{-1})'$. With a non-square loading matrix, we may compute them as $\bf B= A \cdot diag(eigenvalues)^{-1}=(A^+)'$, where diag(eigenvalues) is the square diagonal matrix with the eigenvalues on its diagonal, and the + superscript denotes the pseudoinverse. In your case:

diag(eigenvalues):
1 0
0 1

B (coefficients to predict components by original variables):
    PC1           PC2
X1 .5000000000   .5000000000 
X2 .5000000000   .5000000000 
X3 .5000000000  -.5000000000 
X4 .5000000000  -.5000000000
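
The same computation as a small R sketch, using the loading matrix from the question. The pseudoinverse is written out as $(A'A)^{-1}A'$, which is valid here because $\bf A$ has full column rank; the variable names are illustrative.

```r
# Loading matrix from the example (filled column by column).
A <- matrix(c(0.5, 0.5,  0.5,  0.5,
              0.5, 0.5, -0.5, -0.5),
            ncol = 2,
            dimnames = list(paste0("X", 1:4), c("PC1", "PC2")))

eigenvalues <- colSums(A^2)            # both equal 1 in this example

B <- A %*% diag(1 / eigenvalues)       # B = A diag(eigenvalues)^{-1}
A_pinv <- solve(t(A) %*% A) %*% t(A)   # pseudoinverse of a full-column-rank matrix

print(B)
print(all.equal(unname(B), unname(t(A_pinv))))
```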

So, if $\bf X$ is the Nx4 matrix of original centered variables (or standardized variables, if you are doing PCA based on correlations rather than covariances), then $\bf C=XB$, where $\bf C$ holds the standardized principal component scores. In your example this gives:

PC1 = 0.5*X1 + 0.5*X2 + 0.5*X3 + 0.5*X4 ~ (X1+X2+X3+X4)/4

"the first component is proportional to the average score"

PC2 = 0.5*X1 + 0.5*X2 - 0.5*X3 - 0.5*X4 = (0.5*X1 + 0.5*X2) - (0.5*X3 + 0.5*X4)

"the second component measures the difference between the first pair of scores and the second pair of scores"

In this example it turned out that $\bf B=A$, but in the general case they differ.
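
To make the two interpretations concrete, here is a small R sketch that applies $\bf C=XB$ to three hypothetical students. Their standardized scores are invented purely for illustration.

```r
# Score coefficients from the example (equal to the loadings here).
B <- matrix(c(0.5, 0.5,  0.5,  0.5,
              0.5, 0.5, -0.5, -0.5),
            ncol = 2,
            dimnames = list(paste0("X", 1:4), c("PC1", "PC2")))

# Hypothetical standardized scores: X1 = math, X2 = physics, X3 = reading, X4 = vocabulary.
X <- rbind(strong_all_round = c( 1,  1,  1,  1),
           quantitative     = c( 1,  1, -1, -1),
           verbal           = c(-1, -1,  1,  1))
colnames(X) <- paste0("X", 1:4)

scores <- X %*% B
print(scores)
#                  PC1 PC2
# strong_all_round   2   0
# quantitative       0   2
# verbal             0  -2
```

The all-round student scores high on PC1 and zero on PC2, while the other two have an average of zero (PC1 = 0) and differ only in the sign of the quantitative-minus-verbal contrast.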

Note: The above formula for the coefficients to compute component scores, $\bf B= A \cdot diag(eigenvalues)^{-1}$, is equivalent to $\bf B=R^{-1}A$, with $\bf R$ being the covariance (or correlation) matrix of the variables. The latter formula comes directly from linear regression theory. The two formulas are equivalent within the PCA context only. In factor analysis they are not, and to compute factor scores (which are always approximate in FA) one should rely on the second formula.
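
A minimal numerical check of this equivalence in R, assuming a correlation-based PCA of the built-in USArrests data with the first two components kept (both are illustrative assumptions):

```r
R <- cor(USArrests)   # correlation matrix of the variables
e <- eigen(R)
k <- 2                # number of components kept

A <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))   # loadings

B1 <- A %*% diag(1 / e$values[1:k])   # B = A diag(eigenvalues)^{-1}
B2 <- solve(R) %*% A                  # B = R^{-1} A

print(all.equal(unname(B1), unname(B2)))

# Scores computed with B are standardized: each column has st. dev. 1.
C <- scale(USArrests) %*% B1
print(apply(C, 2, sd))
```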

Related Solutions

Solved – What are the four axes on PCA biplot

Do you mean, e.g., in the plot that the following command returns?

biplot(prcomp(USArrests, scale = TRUE))

[Figure: biplot of the USArrests data]

If yes, then the top and the right axes are meant to be used for interpreting the red arrows (points depicting the variables) in the plot.

If you know how the principal component analysis works, and you can read R code, the code below shows you how the results from prcomp() are initially treated by biplot.prcomp() before the final plotting by biplot.default(). These two functions are called in the background when you plot with biplot(), and the following modified code excerpt is from biplot.prcomp().

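x <- prcomp(USArrests, scale = TRUE)
choices <- 1L:2L          # plot the first two components
scale <- 1                # value used in this example
pc.biplot <- FALSE        # kept from the original excerpt; not used below
scores <- x$x             # component scores
lam <- x$sdev[choices]    # standard deviations of the chosen components
n <- NROW(scores)         # number of observations
lam <- lam * sqrt(n)
lam <- lam^scale
yy <- t(t(x$rotation[, choices]) * lam)   # variable loadings multiplied by lam (the arrows)
xx <- t(t(scores[, choices]) / lam)       # observation scores divided by lam (the points)
biplot(xx, yy)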

In short, in the example above the matrix of variable loadings (x$rotation) is scaled by the standard deviations of the principal components (x$sdev) times the square root of the number of observations. This sets the scale of the top and right axes seen on the plot.
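
One consequence of this scaling, sketched in R: the observation scores are divided by the same factor that multiplies the loadings, so the factor cancels in their product, which is the rank-2 approximation of the standardized data. The name `approx2` is illustrative.

```r
x <- prcomp(USArrests, scale. = TRUE)
choices <- 1:2
lam <- x$sdev[choices] * sqrt(NROW(x$x))

xx <- t(t(x$x[, choices]) / lam)          # points (bottom and left axes)
yy <- t(t(x$rotation[, choices]) * lam)   # arrows (top and right axes)

# The scaling cancels: xx %*% t(yy) equals scores %*% t(rotation) for the two PCs.
approx2 <- x$x[, choices] %*% t(x$rotation[, choices])
print(all.equal(xx %*% t(yy), approx2))

# Axis ranges differ between the two sets of coordinates.
print(range(xx))
print(range(yy))
```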

There are also other ways to scale the variable loadings, offered for example by the R package vegan.

Solved – How to interpret the loadings of the second principal component

First of all, you should make a scatter plot of the projection of your individuals on the first two PCs. If, instead of forming a single ellipse, they cluster in different groups, you'll find an easy interpretation of your data.

If they fall in a single ellipse, you can interpret the PCA as giving you low-dimensional (approximate) models of your data.

If you decide to keep only the first PC, you consider that the individuals are roughly distributed along one axis (the long axis of the ellipse mentioned above), given by this PC. In your case, you can interpret this axis as a "good performer/bad performer" axis. As all your loadings have similar values, this means that you consider that a typical individual will have similar scores on all five tests, and the coordinate of an individual on this axis is approximately proportional to their mean score.

If you decide to keep the first two PCs, you consider that the individuals are distributed in a plane. The first axis is as before; the second axis, orthogonal to the first one, captures the differences between people with the same coordinate on the first axis. In your case, this means that among people with similar scores, some are more of the muscular type and others are more of the intellectual type.

The decision to keep one, two, or more PCs to give a good description of your data should rely in particular on the eigenvalues associated with the PCs (or on the proportion of explained variance).
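
A minimal R sketch of how the eigenvalues and the proportion of explained variance can be tabulated. It uses the built-in USArrests data as a stand-in, since the five-test data discussed here are not available.

```r
p <- prcomp(USArrests, scale. = TRUE)

eigenvalues <- p$sdev^2                     # variances of the components
prop <- eigenvalues / sum(eigenvalues)      # proportion of explained variance

tab <- cbind(eigenvalue = eigenvalues,
             proportion = prop,
             cumulative = cumsum(prop))
rownames(tab) <- colnames(p$x)
print(round(tab, 3))
```

One would then keep as many components as needed for the cumulative proportion to reach a level judged sufficient; `summary(p)` reports the same proportions.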
