Solved – When doing principal components regression, do I need to standardize the independent variables and/or the dependent variable?

pca regression

I want to run PCA on a set of variables and then regress my dependent variable on the PCA scores.

I have the following questions:

  1. Should I scale and center my variables?
  2. If yes, should I also standardize my dependent variable before running linear regression?
  3. What if I don't standardize my dependent variable?

Best Answer

  1. Yes. If you don't center and scale, the components are driven by the variables' means and measurement units rather than by their correlations: the leading directions simply follow whichever variables have the largest raw numbers. Put differently: your PCA should not depend on whether lengths are measured in meters or inches, or temperatures in Celsius or Fahrenheit.
  2. PCA works on the independent variables only; it doesn't care about the dependent variable. You will usually not transform the dependent variable. You may still want to transform it independently of the PCA, e.g., to stabilize variances or take logs, depending on the underlying science.
  3. See 2. You will usually not transform the dependent variable just because you do a PCA on the independent variables. Standardizing the DV only rescales the regression coefficients; it does not change the fit. So the question of whether or not to transform the DV, and whether or not Bad Things will happen if you transform (or don't transform), is orthogonal to the PCA.
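
The three points above can be checked numerically. What follows is only a minimal sketch using NumPy: the helper `pcr_fit`, the synthetic data, and the variable names (`length_m`, `temp_c`, `other`) are made up for illustration and are not from the question. It assumes principal components regression means "center (and optionally scale) the predictors, take the leading principal directions via SVD, then run ordinary least squares of the dependent variable on the scores".

```python
import numpy as np

def pcr_fit(X, y, n_components, scale=True):
    """Hypothetical helper: PCA on X, then OLS of y on the leading scores."""
    Xc = X - X.mean(axis=0)                    # center each predictor
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)        # scale each predictor to unit variance
    # Rows of Vt are the principal directions of the prepared predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    design = np.column_stack([np.ones(len(y)), scores])  # intercept + scores
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef                 # coefficients and fitted values

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
length_m = rng.normal(2.0, 0.5, n)                       # lengths in meters
temp_c = 8.0 * (length_m - 2.0) + rng.normal(20.0, 5.0, n)  # temperatures in Celsius
other = rng.normal(0.0, 1.0, n)
X = np.column_stack([length_m, temp_c, other])
y = 3.0 * length_m + 0.2 * temp_c + rng.normal(0.0, 1.0, n)

# The same predictors expressed in inches and Fahrenheit.
X_units = np.column_stack([length_m * 39.3701, temp_c * 9 / 5 + 32, other])

# Point 1: with centering and scaling, the fit does not depend on the units.
_, fit_std = pcr_fit(X, y, 2, scale=True)
_, fit_std_units = pcr_fit(X_units, y, 2, scale=True)
same_with_scaling = np.allclose(fit_std, fit_std_units)

# Without scaling, changing units changes which directions PCA keeps.
_, fit_raw = pcr_fit(X, y, 2, scale=False)
_, fit_raw_units = pcr_fit(X_units, y, 2, scale=False)
same_without_scaling = np.allclose(fit_raw, fit_raw_units)

# Points 2 and 3: standardizing y only rescales the coefficients.
sd_y = y.std(ddof=1)
y_std = (y - y.mean()) / sd_y
coef_y, fit_y = pcr_fit(X, y, 2)
coef_ystd, fit_ystd = pcr_fit(X, y_std, 2)
slopes_rescaled = np.allclose(coef_ystd[1:], coef_y[1:] / sd_y)
fits_match = np.allclose(fit_ystd * sd_y + y.mean(), fit_y)

print("unit-invariant with scaling:   ", same_with_scaling)
print("unit-invariant without scaling:", same_without_scaling)
print("slopes rescaled by 1/sd(y):    ", slopes_rescaled)
print("back-transformed fits match:   ", fits_match)
```

On this synthetic data the scaled fit is the same in both unit systems, while the unscaled fit changes because a different pair of directions is retained. Standardizing the dependent variable divides the slopes by its standard deviation and leaves the back-transformed fitted values unchanged, which is why that choice can be made separately from the PCA.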