Solved – Variable selection for multiple linear regression

model selection, multiple regression

Using all possible subsets we consider the adjusted $R^2$, Akaike's Information Criterion (AIC), the corrected AIC ($AIC_c$), and the Bayesian Information Criterion (BIC). The model with the highest adjusted $R^2$ and the lowest AIC, $AIC_c$ and BIC is usually the best model.
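To make the setup concrete, here is a minimal all-subsets sketch, assuming statsmodels and a NumPy design matrix; the simulated data, the column names, and the particular small-sample form of the $AIC_c$ correction are illustrative assumptions on my part.

```python
from itertools import combinations

import numpy as np
import statsmodels.api as sm


def all_subsets(X, y, names):
    """Fit every non-empty subset of predictors and collect the four criteria."""
    n = len(y)
    rows = []
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            design = sm.add_constant(X[:, cols])            # intercept + chosen columns
            fit = sm.OLS(y, design).fit()
            k = design.shape[1]                              # parameters incl. intercept
            aicc = fit.aic + 2 * k * (k + 1) / (n - k - 1)   # one common AICc correction
            rows.append({
                "predictors": tuple(names[c] for c in cols),
                "adj_R2": fit.rsquared_adj,
                "AIC": fit.aic,
                "AICc": aicc,
                "BIC": fit.bic,
            })
    return rows


# Simulated example: only x1 and x2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)
table = all_subsets(X, y, names=["x1", "x2", "x3", "x4"])
print(min(table, key=lambda row: row["BIC"])["predictors"])
```

With $p$ candidate predictors this fits $2^p - 1$ models, which is why all-subsets search is only practical for modest $p$.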

When doing stepwise selection we use backward elimination or forward selection. Depending on the criterion we choose, we either add predictor variables one at a time to improve the criterion (forward selection) or remove them one at a time to improve the criterion (backward elimination).
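This is roughly what I mean by forward selection, as a sketch under the same assumptions (NumPy inputs, AIC as the criterion); backward elimination is the mirror image, starting from the full model and dropping one predictor at a time.

```python
import numpy as np
import statsmodels.api as sm


def aic_of(X, y, cols):
    """AIC of the OLS fit on the given predictor columns (intercept always included)."""
    design = sm.add_constant(X[:, cols]) if cols else np.ones((len(y), 1))
    return sm.OLS(y, design).fit().aic


def forward_selection(X, y):
    selected, remaining = [], list(range(X.shape[1]))
    current_aic = aic_of(X, y, selected)                 # start from the intercept-only model
    while remaining:
        # Try adding each remaining predictor; keep the one that lowers AIC the most.
        best_aic, best_j = min((aic_of(X, y, selected + [j]), j) for j in remaining)
        if best_aic >= current_aic:                      # no single addition helps: stop here
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_aic = best_aic
    return selected, current_aic
```

Whichever of AIC, $AIC_c$ or BIC you minimise, the structure of the search is the same; only the stopping rule's criterion changes.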

My question is: why do we sometimes end up with different models from the two approaches? Is it because stepwise selection only focuses on minimizing one criterion? And does this mean that using all possible subsets provides a better model?

Best Answer

(1) It's not about the criterion: backward elimination and forward selection are greedy algorithms that don't search the whole set of models. For example, forward selection stops when no predictor can be added that improves the criterion, but it never checks whether removing a predictor that entered earlier, before adding another, would improve it.
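To make that concrete, here is a sketch of the usual partial fix, a "both directions" stepwise search that also reconsiders dropping predictors that entered earlier; the helper and the AIC criterion are the same assumptions as in the forward-selection sketch above, and plain forward selection is simply this loop with the drop moves removed.

```python
import numpy as np
import statsmodels.api as sm


def aic_of(X, y, cols):
    """Same helper as in the forward-selection sketch: AIC of an OLS fit on `cols`."""
    design = sm.add_constant(X[:, list(cols)]) if cols else np.ones((len(y), 1))
    return sm.OLS(y, design).fit().aic


def stepwise_both(X, y):
    """Greedy search that, at every step, considers both adding and dropping a predictor."""
    selected = []
    current_aic = aic_of(X, y, selected)
    improved = True
    while improved:
        improved = False
        adds = [(aic_of(X, y, selected + [j]), "add", j)
                for j in range(X.shape[1]) if j not in selected]
        drops = [(aic_of(X, y, [s for s in selected if s != j]), "drop", j)
                 for j in selected]
        if not adds and not drops:
            break
        best_aic, move, j = min(adds + drops)
        if best_aic < current_aic:                       # accept whichever move helps most
            selected = selected + [j] if move == "add" else [s for s in selected if s != j]
            current_aic = best_aic
            improved = True
    return selected, current_aic
```

This is roughly the kind of search R's `step(..., direction = "both")` performs; it still isn't exhaustive, so it can still land on a different model than all-subsets search.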

(2) All possible subsets will find the "best" model according to whatever criterion you give it, but using that model to make predictions on new data often reveals a big drop in performance. The wider your search for a best-fitting model, the more you capitalize on chance fluctuations in whichever criterion you use, and the more optimistic your assessment of that model's performance. (So stepwise methods can sometimes work better simply because they restrict the search space.) See here for an excellent exposition of the problem.
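One way to see that optimism in practice is to refit whatever subset the search chose on a training half and score it on held-out data. Everything below is an illustrative assumption: the simulated data, and the placeholder `selected` columns standing in for the output of your subset search.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                 # 10 candidate predictors, most irrelevant
y = 1.5 * X[:, 0] + rng.normal(size=200)
train, test = slice(0, 100), slice(100, 200)

selected = [0, 3, 7]                           # placeholder: columns chosen by your search
design_tr = sm.add_constant(X[train][:, selected])
fit = sm.OLS(y[train], design_tr).fit()

design_te = sm.add_constant(X[test][:, selected])
mse_in = np.mean(fit.resid ** 2)
mse_out = np.mean((y[test] - fit.predict(design_te)) ** 2)
print(f"training MSE {mse_in:.2f}  vs  held-out MSE {mse_out:.2f}")
```

If the gap between those two numbers is large, the criterion-chased model is telling you more about this particular sample than about the process that generated it.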