Solved – Is each PCA or PLS component just one of the original variables?

dimensionality-reduction, feature-selection, partial-least-squares, pca, regression

I am confused about what a component is in PCA and PLS.

Are the components just the original variables but not necessarily in the same order?

For example, in PCA, if I had 8 variables in my data, would PC1 correspond to one of the 8 variables? And in PLSR (PLS regression), if I were to use 4 components, does this mean that I will be using 4 out of 8 variables to build a model?

Best Answer

The possible confusion here, as @amoeba points out in a comment, is the difference between variable selection and dimensionality reduction.

Both PCA and PLS are intended to reduce the dimensionality of the problem. If you have measured 8 variables on each of your cases (and you have more than 8 cases), then the original dimension is 8. PCA and PLS help you choose a smaller number of dimensions that will work well enough.

But these procedures do not work by selecting a subset of your original 8 variables. Rather, they construct linear combinations of the 8 variables to form a new set of 8 predictors, then decide how many of these new predictors need to be included in the final model. For both PCA and PLS, these new predictors are designed to be orthogonal (the multi-dimensional equivalent of perpendicular) to each other. If there are correlations among the predictors, all 8 of your original variables are thus likely to contribute to some extent even if you end up with a final dimension of, say, 4. So you are not performing all-or-none selection among your original variables; you are just discarding some less-important combinations of them.

PCA simply examines the predictors themselves, finding first the combination that captures the most variance in the predictors, then the (orthogonal) combination that captures the next most, and so on. Several superb explanations of how this works are on this highly rated page.
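To see this concretely, here is a minimal sketch using scikit-learn on synthetic data (the 100 cases, the seed, and the induced correlation are illustrative assumptions, not anything from the question). Each row of `components_` holds 8 weights, one per original variable, so PC1 is a blend of all 8 variables rather than one of them:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))       # 100 cases, 8 variables
X[:, 1] += 0.8 * X[:, 0]            # induce some correlation between two variables

X_std = StandardScaler().fit_transform(X)  # standardize first (see the note at the end)
pca = PCA(n_components=4).fit(X_std)

print(pca.components_.shape)             # (4, 8): 4 components, each mixing all 8 variables
print(pca.components_[0])                # PC1 puts a nonzero weight on every variable
print(pca.explained_variance_ratio_)     # variance captured, in decreasing order
```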

PLS brings the outcome variable into this same scheme. At each step it finds the combination of predictors, orthogonal to all prior combinations, that maximizes the product of the variance of that combination and its squared correlation with the outcome, $\operatorname{Var}(X\alpha)\,\operatorname{Corr}^2(y, X\alpha)$ (see ESLII, eq. 3.64, page 81). For the first step, this is a linear combination weighted by each variable's individual correlation with the outcome (unlike standard multiple regression, where all variables are fit jointly). PLS thus also gives a set of orthogonal predictors, made of linear combinations of the original variables, although different from those provided by PCA.
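Here is a parallel sketch for PLS, under the same kind of synthetic-data assumptions (an outcome built from two of the variables, purely for illustration). The first column of `x_weights_` again contains 8 weights, and for standardized predictors it points in the same direction as the vector of individual correlations with the outcome, possibly up to a sign flip:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)  # illustrative outcome

X_std = StandardScaler().fit_transform(X)
pls = PLSRegression(n_components=4).fit(X_std, y)

print(pls.x_weights_.shape)          # (8, 4): each component weights all 8 variables
print(pls.x_weights_[:, 0])          # first PLS weight vector

# For standardized X, the first weight vector is proportional to each
# variable's individual correlation with y.
corr = np.array([np.corrcoef(X_std[:, j], y)[0, 1] for j in range(8)])
print(corr / np.linalg.norm(corr))   # same direction as above, up to sign
```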

In either procedure, you then decide how many of these new predictors to include, which determines the final dimensionality. In either case, if you include all of the new predictors, you just get back the original multiple regression.
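That last point is easy to check numerically; the sketch below (same synthetic-data assumptions as above) fits OLS, principal-components regression with all 8 components, and PLS with all 8 components, and compares the fitted values:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=100)
X_std = StandardScaler().fit_transform(X)

ols_fit = LinearRegression().fit(X_std, y).predict(X_std)

# Principal-components regression: regress y on all 8 PCA scores.
scores = PCA(n_components=8).fit_transform(X_std)
pcr_fit = LinearRegression().fit(scores, y).predict(scores)

pls_fit = PLSRegression(n_components=8).fit(X_std, y).predict(X_std).ravel()

print(np.allclose(ols_fit, pcr_fit))  # expected True: scores span the same space as X
print(np.allclose(ols_fit, pls_fit))  # expected True: full-rank PLS recovers OLS
```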

Finally, note that the above assumes the predictor variables were standardized first, so that differences in scale among the variables do not matter.
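As a quick illustration of why standardization matters, here is one more sketch (the inflated scale on one variable is an artificial assumption): without standardization, PC1 is dominated by whichever variable happens to have the largest scale:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
X[:, 1] += X[:, 0]        # two genuinely correlated variables
X[:, 5] *= 100.0          # one variable on a much larger scale

raw_pc1 = PCA(n_components=1).fit(X).components_[0]
std_pc1 = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]

print(np.round(raw_pc1, 3))  # dominated by the large-scale variable (index 5)
print(np.round(std_pc1, 3))  # driven by the actual correlation structure instead
```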