Solved – Highly correlated predictors in backward stepwise regression

Tags: multicollinearity, regression, stepwise regression

I know that it's problematic to enter highly correlated (multicollinear) variables into a regression analysis. But if I use backward stepwise regression, can I add all the highly correlated predictors and expect the best ones to remain in the final model, or could the analysis go awry? In my own data the redundant (collinear) variables were removed, so it seems there is no problem, but I want to report my method, and I want to know whether this was sound or whether it just worked out by chance.
To put it briefly:

Is it acceptable to enter multicollinear variables into a backward stepwise regression analysis?
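
For concreteness, a minimal sketch of the setup in Python (the question names no software, so this is an assumption; scikit-learn's `RFE` stands in for classical backward stepwise selection):

```python
# Two nearly collinear predictors plus an independent one, then backward
# elimination via recursive feature elimination (drop the weakest predictor
# one at a time until 2 remain).
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1 (r close to 1)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=n)

selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
print(selector.support_)   # e.g. [ True False  True]: one of the x1/x2 pair is dropped
```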

Best Answer

Either of a pair of highly correlated predictors may be regarded as "redundant", so in addition to the other problems with stepwise variable selection methods, it tends to be a toss-up which of the pair ends up in the final model. Use bootstrap validation and see how (in)consistently the same predictors are selected across different bootstrap samples. Avoid the temptation, therefore, to think that the variables eliminated by stepwise selection are irrelevant in any broader sense than that you got away without using them to fit a model to this particular sample.

Better ways of dealing with multicollinearity are data reduction prior to modelling, or ridge regression: see here.
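
To illustrate the bootstrap check suggested above, here is a hedged sketch continuing the simulated data from the question (again using scikit-learn's `RFE` as a stand-in for backward stepwise selection):

```python
# Bootstrap check of selection stability: refit the backward elimination on
# resampled data and count how often each predictor survives. With x1 and x2
# nearly collinear, the surviving member of the pair typically flips between
# bootstrap samples.
from sklearn.utils import resample

counts = np.zeros(X.shape[1])
for b in range(200):
    Xb, yb = resample(X, y, random_state=b)
    counts += RFE(LinearRegression(), n_features_to_select=2).fit(Xb, yb).support_
print(counts / 200)   # selection frequency per predictor
```

If the selection frequencies for x1 and x2 hover well below 1, that is the instability the answer warns about: which of the pair "wins" depends on the sample, not on any real difference in relevance.

And a minimal sketch of the ridge alternative (parameter `alpha=1.0` is an arbitrary illustrative choice, not a recommendation; in practice it would be tuned, e.g. by cross-validation):

```python
# Ridge regression keeps all predictors but shrinks the coefficients of the
# collinear pair toward each other instead of arbitrarily dropping one.
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # x1 and x2 share the signal rather than one taking it all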