Solved – What to do in PCA when one variable has similar values in several principal component eigenvectors

pca

I'm performing a principal component analysis (PCA) using some economic variables of a region. I have six variables and I want to reduce them to two principal components. Most of the variables load mainly on one of the first two principal components' eigenvectors. For example, they either have a large value in the first principal component's eigenvector and a near-zero value in the second, or vice versa.

However, I have one variable that has almost the same value in the eigenvectors of the first and the second principal components.

Does this tell me anything about this variable? Should I keep it or should I remove it from the analysis?

Best Answer

The eigenvectors just give you the "directions" of the principal component axes; typically, they are unit vectors. In PCA, you order the eigenvectors by decreasing eigenvalue; the eigenvalues tell you how much "variance is explained" by the corresponding eigenvectors (your principal component axes). E.g., if you use PCA for dimensionality reduction on a linear task, you'd want to choose the top k eigenvectors that explain most of the variance (i.e., contain the most information).
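As a minimal sketch of this ordering step, assuming a toy data matrix `X` with samples in rows and variables in columns (the random data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))  # toy data: 100 samples, 6 variables

# Center the data and compute the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigendecomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder by decreasing eigenvalue so the first columns are the top PCs
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top k eigenvectors and project the data onto them
k = 2
X_reduced = Xc @ eigvecs[:, :k]
print(X_reduced.shape)  # (100, 2)
```

The columns of `eigvecs` are the principal component directions, so in your case the "values of a variable in the eigenvectors" are the entries of that variable's row across the first `k` columns.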

As mentioned above, you can calculate the "variance explained" based on the magnitude of the eigenvalues; I plotted the "variance explained" for the Iris dataset below:

[Plot: variance explained per principal component for the Iris dataset]

In this plot, you can see that the first two principal components (the eigenvectors that correspond to the two largest eigenvalues) explain almost all of the variance in this dataset (>95%).
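To reproduce the numbers behind such a plot, here is a sketch using scikit-learn's built-in Iris dataset, standardizing the variables before PCA (a common choice; the exact percentages depend on whether you standardize):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load and standardize the Iris features
X = load_iris().data
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and inspect the explained variance ratios
pca = PCA().fit(X_std)
var_exp = pca.explained_variance_ratio_

for i, v in enumerate(var_exp, start=1):
    print(f"PC{i}: {v:.1%}")
print(f"First two PCs together: {var_exp[:2].sum():.1%}")
```

On the standardized Iris data, the first two components together account for more than 95% of the variance, matching the plot.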

I have a short tutorial and code examples here if you want to reproduce the results.