PCA finds a variable to be the most important twice

Suppose that I have a data set of three variables: Calcium, Iron, and Uranium.

Suppose also that I run PCA and obtain the following principal components:

$$\begin{array}{cccc}&PC_1&PC_2&PC_3\\Calcium&0.6729&0.1021&-0.6771\\Iron&0.5331&0.2554&0.5402\\Uranium&0.1123&-0.8007&-0.0432\end{array}$$

The first PC shows Calcium as having the largest importance and Iron as having the second highest correlation. The second PC shows Uranium as having the largest correlation. But the third PC then again gives Calcium the largest correlation with the response (in absolute value), with Iron second.

My main question is how such a PCA outcome can be interpreted. It makes no sense to say that Calcium is the most explanatory of the variance, as well as also being the third most explanatory variable for the variance.

Best Answer

Your interpretation of PCA components is not correct.

PCA does not tell you which variables account for the most variation in the data, so a statement like

Calcium is the most explanatory of the variance, as well as also being the third most explanatory variable for the variance.

cannot be drawn from a PC analysis.
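A short derivation shows why a ranking like this does not follow. Write the covariance matrix of the centred variables as $\Sigma = V \Lambda V^\top$, where column $k$ of the orthonormal matrix $V$ holds the loadings of $PC_k$ and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$ holds the variances along the components. The symbol $v_{jk}$ is notation introduced here for the loading of variable $j$ on $PC_k$. Reading off the diagonal of $\Sigma$, and using $V V^\top = I$:

$$\operatorname{Var}(x_j) = \Sigma_{jj} = \sum_{k} \lambda_k \, v_{jk}^2, \qquad \sum_{k} v_{jk}^2 = 1 \quad \text{for every variable } j.$$

So each variable's squared loadings sum to one across the full set of components, every variable generally appears in several of them, and its variance is shared out in proportions $\lambda_k v_{jk}^2$. A large loading for Calcium on both $PC_1$ and $PC_3$ is therefore not a contradiction; it only says that the Calcium axis is tilted toward both of those directions.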

What it does say is that the direction determined by the vector

$$\begin{array}{cc}&PC_1\\Calcium&0.6729\\Iron&0.5331\\Uranium&0.1123\end{array}$$

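accounts for the most variation in the data: projecting the data onto this direction gives a larger variance than projecting onto any other unit direction. This direction is a weighted combination of the directions determined by the individual variables. This mixing of directions is fundamental to PCA, and it cannot be undone or ignored.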
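A minimal numerical sketch of this claim, assuming NumPy and synthetic data: the three columns below are hypothetical stand-ins for Calcium, Iron, and Uranium, not the data behind the question. PC1 is computed as the leading eigenvector of the sample covariance matrix, and the script checks that no other unit direction, including each original variable axis on its own, gives the projected data a larger variance.

```python
import numpy as np

# Synthetic data only: hypothetical columns standing in for Calcium, Iron,
# Uranium. These are NOT the values from the question.
rng = np.random.default_rng(0)
n = 500
calcium = rng.normal(size=n)
iron = 0.8 * calcium + 0.6 * rng.normal(size=n)
uranium = rng.normal(size=n)
X = np.column_stack([calcium, iron, uranium])
X = X - X.mean(axis=0)  # PCA works on centred data

# Eigen-decomposition of the sample covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-sort so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]  # one unit direction that mixes all three columns
print("PC1 loadings:", np.round(pc1, 4))

def projected_variance(direction):
    """Sample variance of the data projected onto a unit direction."""
    direction = direction / np.linalg.norm(direction)
    return np.var(X @ direction, ddof=1)

# Compare PC1 against many random unit directions and the original axes.
random_dirs = rng.normal(size=(1000, 3))
best_random = max(projected_variance(d) for d in random_dirs)
axis_vars = [projected_variance(e) for e in np.eye(3)]
print("variance along PC1:", projected_variance(pc1))
print("best of 1000 random directions:", best_random)
print("variance along each original axis:", axis_vars)
```

The comparison is between directions, not between variables: the variance along PC1 equals the largest eigenvalue, and a single variable's axis is just one more direction that PC1 beats.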

The remaining principal components are interpreted iteratively: each one accounts for the most variation in the data among the directions that are orthogonal to all of the previous PC directions.
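The iterative description can be checked with a small sketch, again assuming NumPy and hypothetical synthetic data rather than the question's numbers. It finds the direction of maximal variance, removes ("deflates") the part of the data along it, and repeats on what is left; the directions found this way should match the eigenvector solution up to sign, be mutually orthogonal, and have variances that add up to the total variance of the original columns.

```python
import numpy as np

# Synthetic, hypothetical data (not the question's values); the three
# columns stand in for Calcium, Iron, Uranium.
rng = np.random.default_rng(1)
n = 500
base = rng.normal(size=(n, 3))
mix = np.array([[1.0, 0.7, 0.1],
                [0.0, 0.5, 0.3],
                [0.0, 0.0, 0.4]])
X = base @ mix
X = X - X.mean(axis=0)  # centre the columns

def top_direction(data):
    """Unit direction of maximal variance: leading eigenvector of the covariance."""
    vals, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return vecs[:, np.argmax(vals)]

# Full PCA in one shot, components sorted by decreasing variance.
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Iterative view: take the best direction, remove it, repeat on the remainder.
residual = X.copy()
iterative = []
for _ in range(3):
    v = top_direction(residual)
    iterative.append(v)
    residual = residual - np.outer(residual @ v, v)  # drop the part along v

for k, v in enumerate(iterative):
    # Eigenvectors are defined only up to sign, so compare |cosine|.
    print(f"PC{k + 1}: |cos| with eigen-solution = {abs(v @ vecs[:, k]):.6f}")

# The component directions are mutually orthogonal (V^T V = I) ...
print(np.round(vecs.T @ vecs, 10))
# ... and their variances sum to the total variance of the original columns.
print(vals.sum(), np.var(X, axis=0, ddof=1).sum())
```

Deflation is used here only to make the "best direction among those orthogonal to the previous ones" reading visible; in practice all components come out of a single eigendecomposition or SVD.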