Solved – Comparison of multiple regression models

model selection, multiple regression, regression

I have one dependent variable and many predictors, and I need to use a multiple linear regression model.
I performed a stepwise regression to determine which independent variables to include in the final model.
However, I see that when results are published, people usually show more than one model. For instance, they include just the first variable, then the first and the second, then the first, the second and the third, and so on, showing that the adjusted coefficient of determination (adjusted $R^2$) improves.

Do I always have to do that, in order to show that each variable has a contribution (of course if coefficients are significant)?
Is there any reference to a formal procedure that better explains this?

Best Answer

If you've used a stepwise method (and see "Algorithms for automatic model selection" for the drawbacks), you can show the current model at each step, though I'd have thought that is more usual for exposition of the method than because of any perceived intrinsic interest of each intermediate model. Otherwise there's no point in showing a sequence of models: as @charles says, it's common to compare models suggested by competing theories, or that differ in the expense of using them for prediction, or (in general) for reasons that depend on what the models say about the things they model.

It may be tempting to view the change in the coefficient of determination as you add each predictor as a measure of its importance for, or contribution to, the model's predictive power; but if the predictors are correlated, as they typically will be for observational data, this can be quite misleading: you get different answers by changing the order in which you add predictors. Jeromy Anglim's blog discusses the issues and suggests better measures.
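
To make the order dependence concrete, here is a minimal sketch on simulated data (not from the question). The variable names, the correlation of roughly 0.8 between the two predictors, and the helper `r_squared` are all assumptions chosen for illustration. It computes how much $R^2$ is attributed to `x1` when it enters the model first and when it enters after `x2`.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Simulated data (an assumption for illustration): two correlated predictors
# that have the same true coefficient in the data-generating model.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
y = x1 + x2 + rng.normal(size=n)

r2_x1 = r_squared(x1[:, None], y)                  # model with x1 only
r2_x2 = r_squared(x2[:, None], y)                  # model with x2 only
r2_both = r_squared(np.column_stack([x1, x2]), y)  # model with both

gain_x1_first = r2_x1           # R^2 credited to x1 when it is entered first
gain_x1_last = r2_both - r2_x2  # R^2 credited to x1 when it is entered after x2

print(f"x1 entered first: gain in R^2 = {gain_x1_first:.3f}")
print(f"x1 entered last:  gain in R^2 = {gain_x1_last:.3f}")
```

Under these assumptions the same predictor looks far more important when entered first than when entered last, even though both predictors play symmetric roles in generating `y`. The full-model $R^2$ is the same either way; only the way it is split between the predictors changes.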