Stepwise regression vs. elastic net

elastic net, lasso, r-squared, stepwise regression

I understand that Stepwise regression analysis has lots of limitations, including the assumption that the predictors are not highly correlated with each other. In fact, this limitation was the most important reason that I switched to Elastic Net, as I had 75 predictors in my model, some of which are highly correlated.

Using Elastic Net, I could reduce my predictors to 21. I then used these 21 selected variables in a multiple linear regression model and calculated the coefficient of determination ($R^2=0.58$).

However, when I ran Stepwise analysis on the same data, only 11 variables were selected, while $R^2$ stayed the same! Does this mean that my Stepwise results explain my outcome better, since they reach the same $R^2$ with fewer variables? If so, how can I justify preferring Elastic Net because of the limitations of Stepwise analysis when Stepwise seems to give me better results?

Best Answer

Your question has an implicit assumption that $R^2$ is a good measure of the quality of the fit and is appropriate for comparing between models. I think that your background information provides evidence that $R^2$ is not a good tool for what you are trying to do. After all, you can increase $R^2$ by adding nonsense variables to your model.
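To make that last point concrete, here is a minimal sketch on simulated data (the sample size, coefficients, and the 20 noise columns are all made up for illustration, not taken from the question). It fits ordinary least squares twice, once on three real predictors and once with pure-noise columns appended, and compares the in-sample $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical data: three real predictors plus Gaussian noise in the outcome.
x = rng.normal(size=(n, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)


def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Twenty columns that have nothing to do with y.
nonsense = rng.normal(size=(n, 20))

r2_real = r_squared(x, y)
r2_padded = r_squared(np.column_stack([x, nonsense]), y)

print(f"R^2 with real predictors only:  {r2_real:.4f}")
print(f"R^2 after adding noise columns: {r2_padded:.4f}")
```

Because the smaller model is nested in the larger one, the in-sample $R^2$ of the padded fit cannot be lower, even though the extra columns carry no information about the outcome.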

Did you take the variables that were selected by the elastic net and refit a new regression model using those variables, rather than using the coefficient estimates from the elastic net fit itself? Refitting by ordinary least squares throws away the shrinkage that the elastic net applied to the coefficients. That is kind of like entering your data into a nice statistical software program, using it only to round the data and print it out, and then calculating the mean on an abacus.

If you want the fewest predictors possible (and still get a reasonable fit), then the lasso will tend to select fewer predictors than the elastic net. The advantage of the elastic net is not in finding the fewest variables, but in finding a good model that takes advantage of the information in the variables and avoids the bias that you get with stepwise models.
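As a rough illustration of how the two penalties treat correlated predictors, here is a small coordinate-descent sketch written directly in NumPy. The function `elastic_net_cd`, the simulated data, and the penalty values are my own assumptions for the example, not the software or settings used in the question. The objective is $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \alpha\left(\rho\lVert\beta\rVert_1 + \frac{1-\rho}{2}\lVert\beta\rVert_2^2\right)$, where $\rho = 1$ gives the lasso.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 if it crosses zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net_cd(X, y, alpha, l1_ratio, n_sweeps=500):
    """Coordinate descent for the elastic net objective (l1_ratio=1 is the lasso).

    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * beta[j]  # take predictor j out of the fit
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_scale[j] + alpha * (1.0 - l1_ratio)
            )
            residual -= X[:, j] * beta[j]  # put the updated predictor back
    return beta


# Hypothetical data: two latent factors, each observed through three
# noisy copies, so predictors within a group are highly correlated.
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 2))
X = np.repeat(latent, 3, axis=1) + 0.1 * rng.normal(size=(n, 6))
y = latent[:, 0] + latent[:, 1] + 0.5 * rng.normal(size=n)

X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

lasso_beta = elastic_net_cd(X, y, alpha=0.1, l1_ratio=1.0)
enet_beta = elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5)

print("lasso coefficients:      ", np.round(lasso_beta, 3))
print("elastic net coefficients:", np.round(enet_beta, 3))
print("nonzero (lasso, elastic net):",
      np.count_nonzero(lasso_beta), np.count_nonzero(enet_beta))
```

With groups of correlated predictors like these, the lasso often keeps only some members of each group, while the ridge part of the elastic net penalty tends to spread weight across the group. The exact counts depend on the data and on the penalty strength, so treat any single run as an illustration rather than a rule.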

A better comparison would be how well the models predict a new set of observations or, if you have no new data, the PRESS statistic or cross-validation.
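For a least squares fit, the PRESS statistic (the sum of squared leave-one-out prediction errors) can be computed without refitting, using the leverages $h_{ii}$: $\mathrm{PRESS} = \sum_i \left(e_i / (1 - h_{ii})\right)^2$. Below is a minimal sketch on simulated data; the data, the 20 noise columns, and the function names are assumptions for illustration only. It also includes a brute-force version that refits the model $n$ times, as a check on the shortcut.

```python
import numpy as np


def press_statistic(X, y):
    """PRESS for OLS with an intercept, via the leverage shortcut."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    # Diagonal of the hat matrix H = X1 @ pinv(X1).
    leverage = np.einsum("ij,ji->i", X1, np.linalg.pinv(X1))
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))


def press_by_refitting(X, y):
    """Same quantity, computed by leaving out one observation at a time."""
    X1 = np.column_stack([np.ones(len(y)), X])
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        total += (y[i] - X1[i] @ beta) ** 2
    return float(total)


# Hypothetical data: three real predictors and twenty pure-noise columns.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
nonsense = rng.normal(size=(n, 20))

press_small = press_statistic(x, y)
press_padded = press_statistic(np.column_stack([x, nonsense]), y)

print(f"PRESS, real predictors only: {press_small:.2f}")
print(f"PRESS, with noise columns:   {press_padded:.2f}")
print(f"Brute-force check (small):   {press_by_refitting(x, y):.2f}")
```

Unlike in-sample $R^2$, PRESS is not guaranteed to improve when variables are added, so a model padded with noise columns will typically look worse on it. Keep in mind that if the variables were chosen using the same data, the selection step itself (stepwise or elastic net) should be repeated inside each cross-validation fold for the comparison to be fair.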