PCA – How to Interpret the Results of Principal Component Analysis?

pca

I have three variables: LAPOP05_10, LAPOP1_10, and LAPOP1_20. I have run a principal component analysis (PCA), and have obtained the following results:

Importance of components:
                         PC1     PC2     PC3
Standard deviation     1.649 0.51274 0.13824
Proportion of Variance 0.906 0.08763 0.00637
Cumulative Proportion  0.906 0.99363 1.00000
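
As a check on the table above, the proportions can be recovered from the standard deviations alone: each component's variance is its standard deviation squared, and its proportion is that variance divided by the total. The following is a minimal sketch in Python, used purely for illustration; the variable names are illustrative, and the inputs are the rounded figures printed above, so the last digits may differ slightly.

```python
# Standard deviations as printed in the summary above (already rounded there).
sdev = [1.649, 0.51274, 0.13824]

# Each component's variance is its standard deviation squared.
variances = [s ** 2 for s in sdev]
total = sum(variances)

# Proportion of variance for each component, and the running total.
proportion = [v / total for v in variances]
cumulative = [sum(proportion[: i + 1]) for i in range(len(proportion))]

for name, p, c in zip(["PC1", "PC2", "PC3"], proportion, cumulative):
    print(f"{name}: proportion {p:.3f}, cumulative {c:.3f}")

print(f"total variance: {total:.3f}")
```

The total variance comes out very close to 3, the number of variables. That would be consistent with the variables having been standardized before the PCA, although nothing above states this.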

PC1 explains about 91% of the variance in the data, and PC2 explains about 9%. Here is my loading plot:

I am new to PCA and do not understand how to interpret these results. Is this saying that LAPOP05_10 explains the majority of PC1? Any help would be much appreciated!


Best Answer

Your question about PCA may well be hiding your real question: why are you using PCA? If it's because you are trying to deal with redundancy among variables, that is a fair if fuzzy problem, but PCA is not the only game in town.

For a good answer we need to know that and much more about your data by way of context.

LAPOP05_10, LAPOP1_10, and LAPOP1_20 are not explained and we can only guess what they are.

But my guesses include wondering whether there are inbuilt relationships, say that 5 to 10, 1 to 10, and 1 to 20 are overlapping intervals. If so, there are likely to be inequalities: for example, the number of people aged 5 to 10 is necessarily included in the number of people aged 1 to 10, which is in turn included in the number of people aged 1 to 20. People and age are just an example, suggested by wondering whether POP means population.
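
To illustrate why such nesting matters, here is a small simulation, assuming NumPy is available. Everything in it is hypothetical: the age bands, the Poisson means, and the variable names are invented for the sketch and say nothing about what the LAPOP variables really are. It builds three nested totals from independent pieces and shows that the nesting alone produces positive correlations and a dominant first component.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000  # hypothetical number of areas

# Hypothetical, independent counts for three disjoint age bands.
aged_5_10 = rng.poisson(60, size=n)
aged_1_4 = rng.poisson(40, size=n)
aged_11_20 = rng.poisson(100, size=n)

# Nested totals: each interval contains the previous one.
pop_5_10 = aged_5_10
pop_1_10 = aged_1_4 + aged_5_10
pop_1_20 = pop_1_10 + aged_11_20

# The inequalities hold by construction, whatever the data happen to be.
assert np.all(pop_5_10 <= pop_1_10) and np.all(pop_1_10 <= pop_1_20)

data = np.column_stack([pop_5_10, pop_1_10, pop_1_20])
corr = np.corrcoef(data, rowvar=False)

# Share of total variance taken by the first PC of the correlation matrix.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
pc1_share = eigenvalues[0] / eigenvalues.sum()

print(np.round(corr, 2))
print(f"PC1 share of variance: {pc1_share:.2f}")
```

In this sketch the three underlying bands are independent, so all of the correlation among the totals comes from the overlap of the intervals. A large first component in such a case reflects the definitions more than any structure in the data.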

The scatter plot of PCA results (scores, I think, not loadings) shows sharp bounds whereby data lie within a polygon, which is also suggestive of inequalities, including bounds such as values all being positive.
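
On scores versus loadings: loadings describe the variables (one row per variable), while scores describe the observations (one row per data point), so a cloud with as many points as there are observations is a plot of scores. A minimal sketch with NumPy on made-up data follows; the data and names are hypothetical. Loadings here means the eigenvectors, as in the `rotation` component returned by R's `prcomp`; some authors instead scale them by the component standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 correlated variables.
shared = rng.normal(size=(200, 1))
X = shared + 0.3 * rng.normal(size=(200, 3))

# Standardize each column, so this is a PCA of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Singular value decomposition: Z = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

loadings = Vt.T                  # one row per variable, one column per PC
scores = Z @ loadings            # one row per observation, one column per PC
sdev = s / np.sqrt(len(Z) - 1)   # standard deviation of each PC

print("loadings shape:", loadings.shape)
print("scores shape:  ", scores.shape)
```

A plot of the first two columns of `scores` gives one point per observation; a loading plot would show only three points or arrows, one per variable.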

I would encourage:

  1. Thinking more about the definitions.

  2. A scatter plot matrix for the original variables, which is often much easier to interpret than PCA output.

  3. Plots of the PCs against the original variables, which can be helpful too.
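
As a numeric companion to suggestions 2 and 3, the sketch below computes the pairwise correlations that a scatter plot matrix would display, and the correlation of each PC with each original variable. It assumes NumPy and uses made-up data as a stand-in for the real variables, so the numbers themselves mean nothing; the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the three original variables.
shared = rng.normal(size=(300, 1))
X = shared + 0.5 * rng.normal(size=(300, 3))
n_vars = X.shape[1]

# Pairwise correlations: what a scatter plot matrix shows, panel by panel.
var_corr = np.corrcoef(X, rowvar=False)

# PCA of the correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(var_corr)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Scores: standardized data projected onto the eigenvectors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigenvectors

# Correlation of each original variable (rows) with each PC (columns).
joint = np.corrcoef(np.column_stack([X, scores]), rowvar=False)
pc_var_corr = joint[:n_vars, n_vars:]

print(np.round(var_corr, 2))
print(np.round(pc_var_corr, 2))
```

These numbers are only a summary. The plots themselves remain worth drawing, because bounds and polygons of the kind mentioned above show up in a scatter plot but not in a correlation. If matplotlib is available, something like `plt.scatter(X[:, 0], scores[:, 0])` would draw the first of those plots.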

PCA divides opinion. Those who think it is often of little or no use, or frequently the wrong choice of method, tend to write much less about it than those who find it useful (and who naturally show examples where it works well). It works best when there are bundles of variables, strongly correlated within each bundle, ideally with a relatively simple interpretation for the first few PCs.
