Solved – How to use principal components as predictors in regression

factor-analysis, factor-rotation, logistic, pca, regression

I have a couple of questions about running a regression (logistic or linear) after principal component analysis (PCA).

  1. If I find principal components using PCA, can I use them like regular variables, as predictors in linear or logistic regression?

  2. If not, do I have to perform an extra step, such as "rotating" the principal components? What is this "rotation", what does it actually do, and why is it helpful for regression? (This is what happens in factor analysis, no?)

  3. If I can use these components, rotated or not, can I simply set up the model and combine the components with the outcome data? Do I need to transform the outcome data in any way, or can I use the original outcome data with the principal components?

Best Answer

  1. Yes.

    That's exactly what principal component regression is: https://en.wikipedia.org/wiki/Principal_component_regression.

  2. No need to rotate.

    In fact, rotating makes no difference as far as prediction is concerned, because a rotation of the retained components leaves the space they span unchanged; see "Using varimax-rotated PCA components as predictors in linear regression". Of course, if you want to interpret individual regression coefficients, then you need interpretable predictors; rotated PCs might be more interpretable (or not).

  3. No need to transform.

    The dependent ("outcome") variable should be left alone. Including it in the PCA would be cheating. There is no transformation that you need to apply just because you use PCA.
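
The three points above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the data are simulated, the number of retained components `k` is picked by hand, the helper `ols` is hypothetical, and a random orthogonal matrix `Q` stands in for a real rotation such as varimax. It runs PCA on the predictors only, regresses the untouched outcome on the component scores, and then checks that rotating the scores changes the coefficients but not the fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption): 10 correlated predictors
# driven by 3 latent factors, and a continuous outcome y.
n, p, k = 200, 10, 3
latent = rng.normal(size=(n, k))
X = latent @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Point 3: PCA uses the centered predictors only; y is not involved.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T  # (n, k) scores on the first k components


def ols(Z, target):
    """Least-squares fit with an intercept; returns coefficients and fitted values."""
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, design @ coef


# Point 1: the scores are used like any other predictors.
coef_raw, fit_raw = ols(scores, y)
r2 = 1 - np.sum((y - fit_raw) ** 2) / np.sum((y - y.mean()) ** 2)

# Point 2: an arbitrary orthogonal rotation of the retained scores
# (a stand-in for varimax, not varimax itself).
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
coef_rot, fit_rot = ols(scores @ Q, y)

same_fit = np.allclose(fit_raw, fit_rot)              # fitted values agree
same_coef = np.allclose(coef_raw[1:], coef_rot[1:])   # coefficients do not

print("R^2 on unrotated scores:", round(r2, 3))
print("identical fitted values:", same_fit)
print("identical coefficients:", same_coef)
```

In this sketch the rotated coefficients are just `Q.T @ coef_raw[1:]`, which is why the predictions coincide while the individual coefficients (and hence their interpretation) differ.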


In general, you might want to read "How can top principal components retain the predictive power on a dependent variable (or even lead to better predictions)?".
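
For the logistic case raised in the question, the same recipe applies: the original 0/1 outcome is used as-is. The sketch below is only an illustration under stated assumptions: the data are simulated, the train/test split and `k` are arbitrary, and the helpers `to_scores` and `fit_logistic` are hypothetical (a plain Newton/IRLS fit with a tiny ridge term added purely for numerical stability). The PCA axes are estimated from the training predictors alone and then reused for the held-out rows.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (illustrative assumption): 8 predictors driven by
# 2 latent factors, and a binary outcome kept in its original 0/1 form.
n, p, k = 400, 8, 2
latent = rng.normal(size=(n, k))
X = latent @ rng.normal(size=(k, p)) + 0.3 * rng.normal(size=(n, p))
prob = 1 / (1 + np.exp(-(1.5 * latent[:, 0] - 1.0 * latent[:, 1])))
y = rng.binomial(1, prob)

train, test = np.arange(300), np.arange(300, n)

# PCA is estimated from the training predictors only; y never enters it.
mean_train = X[train].mean(axis=0)
_, _, Vt = np.linalg.svd(X[train] - mean_train, full_matrices=False)
axes = Vt[:k].T  # (p, k) principal axes


def to_scores(X_new):
    """Project rows onto the training principal axes."""
    return (X_new - mean_train) @ axes


def fit_logistic(Z, labels, n_iter=25, ridge=1e-6):
    """Newton (IRLS) logistic fit with intercept; ridge is a tiny stabilizer."""
    design = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-design @ beta))
        weights = mu * (1 - mu)
        grad = design.T @ (labels - mu) - ridge * beta
        hess = design.T @ (design * weights[:, None]) + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta


beta = fit_logistic(to_scores(X[train]), y[train])

# Predict on held-out rows using the same centering and axes.
test_design = np.column_stack([np.ones(len(test)), to_scores(X[test])])
pred = (1 / (1 + np.exp(-test_design @ beta)) >= 0.5).astype(int)
accuracy = (pred == y[test]).mean()
print("held-out accuracy:", accuracy)
```

Estimating the centering and axes on the training rows only keeps the held-out rows out of the PCA, in the same spirit as keeping the outcome out of it.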