Solved – Stepwise regression in R in both directions

r, regression, stepwise regression

How does stepwise regression work in both directions in R with the step() function?

I would think that one variable is placed into the model, then another that improves the selection criterion is added, and the significance of the older variable is reassessed. If the older variable's coefficient is not significant, that variable is removed, the next variable is placed into the model, and so forth.

I am not 100% sure that this is how step() works with direction = "both". Can someone please tell me whether this is correct and, if not, how step() implements stepwise regression in both directions?

Best Answer

The underlying procedure is beautifully documented in Chambers & Hastie (eds, 1992; Ch. 6) (contrary to what the help page says) on page 237.

stats::step() with the option direction = 'both' works from the current model. At each step it computes the AIC that would result from dropping each candidate variable and from adding each candidate variable, where the candidates are constrained by the lower and upper bound regressor sets supplied. It then makes the single drop or addition that gives the best AIC improvement (the smallest AIC). The significance of individual coefficients is not tested; only AIC values are compared.
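A minimal sketch of such a call may make the roles of the starting model and the two bounds clearer. It uses the built-in mtcars data; the choice of response and regressors is purely an illustrative assumption and is not part of the original answer.

```r
# Illustrative sketch only: the variables chosen here are assumptions.
# Starting model (the "current" model at step 0).
start_fit <- lm(mpg ~ wt + hp, data = mtcars)

step_fit <- step(
  start_fit,
  scope = list(
    lower = ~ wt,                                   # terms that must stay in
    upper = ~ wt + hp + cyl + disp + drat + qsec + gear  # largest model allowed
  ),
  direction = "both",  # consider both additions and deletions at every step
  trace = 1            # print the AIC table of candidate moves at each step
)

formula(step_fit)  # regressors retained when the search stops
step_fit$anova     # one row per accepted add/drop move
```

With trace = 1, each printed table lists the candidate moves (prefixed with + or -) together with the AIC each would produce, which is the comparison described above.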

For example, assume that you are fitting a linear regression model with the upper set of variables $\mathcal{U} = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\}$, the lower set $\mathcal{L} = \{X_1\}$, and the starting object $\mathcal{S}_0 = \{X_1, X_3\}$. The successive sets of retained regressors might then look something like
$$
\begin{align}
\mathcal{S}_1 &= \{X_1, X_3, X_6\} &\text{ (add $X_6$) }\\
\mathcal{S}_2 &= \{X_1, X_3, X_6, X_4\} &\text{ (add $X_4$) }\\
\mathcal{S}_3 &= \{X_1, X_6, X_4\} &\text{ (drop $X_3$) }\\
\mathcal{S}_4 &= \{X_1, X_6, X_4, X_7\} &\text{ (add $X_7$) }\\
\mathcal{S}_5 &= \{X_1, X_4, X_7\} &\text{ (drop $X_6$) }
\end{align}
$$
and so on, until no further AIC improvement can be made. Note that $X_1$ is never dropped, because every variable in the lower set is always retained.
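A single iteration of this search can be approximated by hand with drop1() and add1(), which tabulate the AIC of each one-term deletion and addition. The sketch below is an illustration under assumed variable choices on the built-in mtcars data, not a copy of the internals of step().

```r
# Illustrative sketch of ONE step of the search; variables are assumptions.
current <- lm(mpg ~ wt + hp, data = mtcars)
upper   <- ~ wt + hp + cyl + disp + drat + qsec + gear

# Droppable terms are those in the current model but not in the lower set
# (here the lower set is assumed to be wt, so only hp may be dropped).
drops <- drop1(current, scope = ~ hp)
# Addable terms are those in the upper set but not yet in the model.
adds  <- add1(current, scope = upper)

# Row 1 of each table is "<none>", i.e. the current model left unchanged.
aic_none <- drops["<none>", "AIC"]
aic_drop <- setNames(drops$AIC[-1], paste("-", rownames(drops)[-1]))
aic_add  <- setNames(adds$AIC[-1],  paste("+", rownames(adds)[-1]))

candidates <- c("<none>" = aic_none, aic_drop, aic_add)
sort(candidates)                      # smallest AIC first

# The move with the smallest AIC is taken; "<none>" means the search stops.
best_move <- names(which.min(candidates))
best_move
```

Repeating this from the updated model until "<none>" has the smallest AIC reproduces the kind of add/drop sequence shown in the example.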