Solved – Variable selection for multiple linear regression

model selection, multiple regression

Using all possible subsets we consider the adjusted $R^2$, Akaike's Information Criterion (AIC), the corrected AIC ($AIC_c$), and the Bayesian Information Criterion (BIC). The model with the highest adjusted $R^2$ and the lowest AIC, $AIC_c$ and BIC is usually the best model.
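To make the setup concrete, here is a minimal all-subsets sketch, assuming statsmodels and a NumPy design matrix; the simulated data, the column names, and the particular small-sample form of the $AIC_c$ correction are illustrative assumptions on my part.

```python
from itertools import combinations

import numpy as np
import statsmodels.api as sm


def all_subsets(X, y, names):
    """Fit every non-empty subset of predictors and collect the four criteria."""
    n = len(y)
    rows = []
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            design = sm.add_constant(X[:, cols])            # intercept + chosen columns
            fit = sm.OLS(y, design).fit()
            k = design.shape[1]                              # parameters incl. intercept
            aicc = fit.aic + 2 * k * (k + 1) / (n - k - 1)   # one common AICc correction
            rows.append({
                "predictors": tuple(names[c] for c in cols),
                "adj_R2": fit.rsquared_adj,
                "AIC": fit.aic,
                "AICc": aicc,
                "BIC": fit.bic,
            })
    return rows


# Simulated example: only x1 and x2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)
table = all_subsets(X, y, names=["x1", "x2", "x3", "x4"])
print(min(table, key=lambda row: row["BIC"])["predictors"])
```

With $p$ candidate predictors this fits $2^p - 1$ models, which is why all-subsets search is only practical for modest $p$.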

When doing stepwise selection we use backward elimination or forward selection. Depending on the criterion we choose, we either add predictor variables one at a time to improve the criterion (forward selection) or remove them one at a time to improve the criterion (backward elimination).
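This is roughly what I mean by forward selection, as a sketch under the same assumptions (NumPy inputs, AIC as the criterion); backward elimination is the mirror image, starting from the full model and dropping one predictor at a time.

```python
import numpy as np
import statsmodels.api as sm


def aic_of(X, y, cols):
    """AIC of the OLS fit on the given predictor columns (intercept always included)."""
    design = sm.add_constant(X[:, cols]) if cols else np.ones((len(y), 1))
    return sm.OLS(y, design).fit().aic


def forward_selection(X, y):
    selected, remaining = [], list(range(X.shape[1]))
    current_aic = aic_of(X, y, selected)                 # start from the intercept-only model
    while remaining:
        # Try adding each remaining predictor; keep the one that lowers AIC the most.
        best_aic, best_j = min((aic_of(X, y, selected + [j]), j) for j in remaining)
        if best_aic >= current_aic:                      # no single addition helps: stop here
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_aic = best_aic
    return selected, current_aic
```

Whichever of AIC, $AIC_c$ or BIC you minimise, the structure of the search is the same; only the stopping rule's criterion changes.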

My question is: why do we sometimes end up with different models from the two approaches? Is it because stepwise selection only focuses on minimizing one criterion? And does this mean that using all possible subsets provides a better model?

Best Answer

(1) It's not about the criterion: backward elimination and forward selection are greedy algorithms that don't search the whole set of models. For example, forward selection stops when no predictor can be added that improves the criterion, but it never checks whether removing a predictor that entered earlier, before adding another, would improve it.
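To make that concrete, here is a sketch of the usual partial fix, a "both directions" stepwise search that also reconsiders dropping predictors that entered earlier; the helper and the AIC criterion are the same assumptions as in the forward-selection sketch above, and plain forward selection is simply this loop with the drop moves removed.

```python
import numpy as np
import statsmodels.api as sm


def aic_of(X, y, cols):
    """Same helper as in the forward-selection sketch: AIC of an OLS fit on `cols`."""
    design = sm.add_constant(X[:, list(cols)]) if cols else np.ones((len(y), 1))
    return sm.OLS(y, design).fit().aic


def stepwise_both(X, y):
    """Greedy search that, at every step, considers both adding and dropping a predictor."""
    selected = []
    current_aic = aic_of(X, y, selected)
    improved = True
    while improved:
        improved = False
        adds = [(aic_of(X, y, selected + [j]), "add", j)
                for j in range(X.shape[1]) if j not in selected]
        drops = [(aic_of(X, y, [s for s in selected if s != j]), "drop", j)
                 for j in selected]
        if not adds and not drops:
            break
        best_aic, move, j = min(adds + drops)
        if best_aic < current_aic:                       # accept whichever move helps most
            selected = selected + [j] if move == "add" else [s for s in selected if s != j]
            current_aic = best_aic
            improved = True
    return selected, current_aic
```

This is roughly the kind of search R's `step(..., direction = "both")` performs; it still isn't exhaustive, so it can still land on a different model than all-subsets search.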

(2) All possible subsets will find the "best" model according to whatever criterion you give it, but using that model to make predictions on new data often reveals a big drop in performance. The wider your search for a best-fitting model, the more you capitalize on chance fluctuations in whichever criterion you use, and the more optimistic your assessment of that model's performance. (So stepwise methods can sometimes work better simply because they restrict the search space.) See here for an excellent exposition of the problem.
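One way to see that optimism in practice is to refit whatever subset the search chose on a training half and score it on held-out data. Everything below is an illustrative assumption: the simulated data, and the placeholder `selected` columns standing in for the output of your subset search.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                 # 10 candidate predictors, most irrelevant
y = 1.5 * X[:, 0] + rng.normal(size=200)
train, test = slice(0, 100), slice(100, 200)

selected = [0, 3, 7]                           # placeholder: columns chosen by your search
design_tr = sm.add_constant(X[train][:, selected])
fit = sm.OLS(y[train], design_tr).fit()

design_te = sm.add_constant(X[test][:, selected])
mse_in = np.mean(fit.resid ** 2)
mse_out = np.mean((y[test] - fit.predict(design_te)) ** 2)
print(f"training MSE {mse_in:.2f}  vs  held-out MSE {mse_out:.2f}")
```

If the gap between those two numbers is large, the criterion-chased model is telling you more about this particular sample than about the process that generated it.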