Your question has an implicit assumption that $R^2$ is a good measure of the quality of the fit and is appropriate for comparing models. I think your background information provides evidence that $R^2$ is not a good tool for what you are trying to do: after all, you can increase $R^2$ simply by adding nonsense variables to your model.
Did you take the variables that were found using the elastic net and refit a new regression model with those variables, rather than using the estimates from the elastic-net fit? That is rather like entering your data into a nice statistical software program, using it only to round the data and print it out, and then calculating the mean on an abacus.
If you want the fewest predictors possible (while still getting a reasonable fit), then lasso methods will tend to result in fewer predictors than elastic-net methods. The advantage of the elastic net is not in finding the fewest variables, but in finding a good model that takes advantage of the information in the variables and avoids the bias that you get with stepwise models.
A better comparison would be how well the models predict a new set of observations, or perhaps a PRESS statistic or cross-validation.
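For example, here is a rough sketch of that kind of comparison using cross-validation with the glmnet package; the simulated x and y and the alpha = 0.5 mixing value are just placeholders for your own data and settings:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)            # simulated predictors (placeholder)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)            # simulated response (placeholder)
cv.lasso <- cv.glmnet(x, y, alpha = 1)           # lasso
cv.enet  <- cv.glmnet(x, y, alpha = 0.5)         # elastic net (example mixing value)
min(cv.lasso$cvm)                                # cross-validated error, lasso
min(cv.enet$cvm)                                 # cross-validated error, elastic net
sum(coef(cv.lasso, s = "lambda.min")[-1] != 0)   # predictors kept by the lasso
sum(coef(cv.enet,  s = "lambda.min")[-1] != 0)   # predictors kept by the elastic net

This compares the two penalties on cross-validated prediction error and on the number of predictors they retain, rather than on $R^2$.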
You can also perform a likelihood-ratio test (LRT) with mixed models; the command is again anova(model1, model2). But there are a few things you have to consider. To use the anova command after an lme model that has been fitted by REML (restricted maximum likelihood, the default), the models have to include the same fixed effects and both have to be fitted via REML. If you want to calculate a likelihood-ratio test for models with different fixed effects, you have to fit the models via maximum likelihood (ML). You can do that by specifying the option method="ML" within the lme command. After that, you can just type anova(lme.mod1, lme.mod2) to calculate the LR test. If the LR test is significant, you have evidence that the model including probCategorySame is an improvement over the model without that variable.
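As a minimal sketch of that workflow, here is what the ML refit and comparison could look like, using the Orthodont data that ships with nlme as a stand-in for your own data and with Sex playing the role of probCategorySame:

library(nlme)
lme.mod1 <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")        # simpler model, fitted by ML
lme.mod2 <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")        # adds the fixed effect of interest
anova(lme.mod1, lme.mod2)                               # likelihood-ratio test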
As an alternative, you can also simulate the likelihood-ratio test with the following syntax (lm1 is the simpler model and lm2 is the alternative model):
sim.lme <- simulate(lm1, nsim=1000, m2=lm2, method="ML")  # simulate data under lm1 and refit both models
plot(sim.lme)                                             # compare empirical and nominal p-values
This produces a plot of the simulated empirical $p$-values of the LR test against the nominal $p$-values.
The key is that the simulated empirical $p$-values should be roughly the same as the nominal $p$-values (the blue line should be diagonal). If that is the case, you can use the nominal $p$-value from the likelihood-ratio test.
In their book Mixed-Effects Models in S and S-PLUS (pages 87-92), Pinheiro and Bates discourage the use of likelihood-ratio tests for assessing the significance of fixed effects. They recommend the conditional $F$-test instead, which you can obtain with the anova command applied to a single model. In your case, that would be anova(lm2).
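Continuing the hypothetical Orthodont example from above, the conditional $F$-tests would be obtained with:

lme.reml <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont)                       # REML fit (the default)
anova(lme.reml)                                         # conditional F-tests, including the one for Sex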
Best Answer
Either of a pair of highly correlated predictors may be regarded as "redundant". So in addition to the other problems with stepwise variable selection methods, it tends to be a toss-up which one gets into the final model. Use bootstrap validation and see how (in)consistently the same predictors are selected in different bootstrap samples. Avoid the temptation, therefore, to think that the variables eliminated by stepwise selection are irrelevant in any broader sense than that you got away without using them to fit a model to a particular sample. Better ways of dealing with multicollinearity are data reduction prior to modelling or ridge regression: see here.
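Here is a minimal sketch of that kind of bootstrap check with simulated data; dat, the number of resamples B, and AIC-based step() selection are placeholders for your own data and selection procedure:

set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$x5 <- dat$x1 + rnorm(n, sd = 0.1)               # x5 is highly correlated with x1
dat$y  <- dat$x1 + 0.5 * dat$x2 + rnorm(n)
B <- 200                                            # number of bootstrap resamples
selected <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]  # bootstrap resample of the rows
  fit  <- step(lm(y ~ ., data = boot), trace = 0)   # stepwise selection on the resample
  names(coef(fit))[-1]                              # predictors kept (drop the intercept)
}, simplify = FALSE)
sort(table(unlist(selected)), decreasing = TRUE) / B   # selection proportion for each predictor

The idea is that a highly correlated pair such as x1 and x5 will tend to trade places across resamples rather than being selected consistently. The validate() function in the rms package offers a more thorough, built-in version of this kind of check.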