Solved – Does PCA mean selecting the most important features and ignoring the others?

machine-learning, pca

Principal component analysis (PCA) is used to reduce the dimensionality of a data set. When explaining PCA, people say that it projects the data onto the directions of largest variance; is that the same as selecting only the most important features and ignoring the others?

Best Answer

Fingers crossed I can help. PCA, at its core, doesn't select the most "important" features. It is a linear transformation of your data into a new coordinate system in which the first component direction is the one with the largest variance (and likewise for the second, third, ...). The dimensionality reduction comes from choosing to keep only a subset of the components. So PCA doesn't "select" the most important features; rather, it finds linear combinations of the existing variables, and the user decides how many of those new combinations to keep. The sketch below illustrates this.
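
To make that concrete, here is a minimal sketch using scikit-learn; the toy data and the choice of keeping two components are just illustrative assumptions. Notice that each row of `components_` mixes all of the original features, which is exactly why PCA is not feature selection:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 200 samples, 5 features, with some correlation built in
# so that PCA has structure to find
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += 2 * X[:, 0]

pca = PCA(n_components=2)        # the user's choice: keep only 2 components
X_reduced = pca.fit_transform(X) # shape (200, 2)

# Each row of components_ is a linear combination of ALL original
# features, not a selection of individual features.
print(pca.components_)

# Fraction of the total variance captured by each kept component
print(pca.explained_variance_ratio_)
```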

If I can guess what you actually care about, I'd say the main issue with PCA and its ability to "select the most important features" is that the chosen components can be very complicated linear combinations of the variables in your data set. This often leads to the problem of component interpretability: because your components are strange combinations of real variables, it becomes very difficult (if not impossible) to identify a real-world quantity with a component. If your goal is to reduce the dimensionality of your data set while still (hopefully) being able to say that your new, smaller set of variables is a collection of observable quantities, then I'd look into sparse PCA. Sparse PCA tries to accomplish the same goal as PCA, but with the added constraint that each final linear combination should involve only a small subset of the original variables.
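
As a rough illustration, scikit-learn also ships a `SparsePCA` estimator; the `alpha` value below, which controls how strongly sparsity is encouraged, is chosen arbitrarily for this toy data. Compared with ordinary PCA, many loadings are driven to exactly zero, so each component involves only a few of the original variables:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# Same flavour of toy data as before: correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += 2 * X[:, 0]

# alpha tunes the sparsity penalty (illustrative value, not a recommendation)
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
X_sparse = spca.fit_transform(X)

# Many entries are exactly zero, so each component is a combination of
# only a small subset of the original variables -- easier to interpret.
print(spca.components_)
```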