Stepwise Model Selection in Logistic Regression in R

aic, logistic, r, stepwise regression

I'm fitting a logistic regression model in R and have 80 candidate variables to choose from. I need to automate variable selection, so I'm using the step function.

I have no problem running the function or obtaining a model, but in the final model some of the variables chosen by step are not significant. (I check this with the summary function, looking at the fourth column of $coef, which holds the Wald test p-values.) This is a problem because I need every variable included in the model to be significant.
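For reference, the fourth column of summary(fit)$coef for a logistic glm is the two-sided Wald p-value, Pr(>|z|), computed from each estimate and its standard error with z = estimate / SE. A minimal Python sketch of that calculation, where the helper name wald_p_value and the input numbers are hypothetical:

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided Wald p-value: P(|Z| > |estimate / std_error|) for Z ~ N(0, 1)."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical coefficient: estimate 0.40 with standard error 0.25, so z = 1.6.
print(round(wald_p_value(0.40, 0.25), 4))
```

A coefficient is usually called "significant" when this value falls below a chosen cutoff such as .05, which is the check described in the question.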

Is there a function, or any other way, to find the best model by AIC or BIC while also requiring that all coefficients be significant?
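Both criteria penalize the maximized log-likelihood log L by model size: AIC = 2k − 2 log L and BIC = k log(n) − 2 log L, where k is the number of estimated parameters and n the number of observations; lower is better. A small Python sketch, using made-up log-likelihoods for two hypothetical nested models, shows that the two criteria can disagree about the same extra variable:

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik


n = 500  # hypothetical sample size
# Hypothetical nested models: adding one variable raises log L from -300.0 to -298.5.
small = {"loglik": -300.0, "k": 5}
large = {"loglik": -298.5, "k": 6}

aic_pick = "larger" if aic(large["loglik"], large["k"]) < aic(small["loglik"], small["k"]) else "smaller"
bic_pick = "larger" if bic(large["loglik"], large["k"], n) < bic(small["loglik"], small["k"], n) else "smaller"
print("AIC prefers the", aic_pick, "model")
print("BIC prefers the", bic_pick, "model")
```

Neither criterion looks at individual coefficient p-values; each only compares penalized fit across candidate models.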
Thanks

Best Answer

Using stepwise selection to find a model is a very bad thing to do. Your hypothesis tests will be invalid, and your out-of-sample predictive accuracy will be very poor due to overfitting. To understand these points more fully, it may help to read my answer here: Algorithms for automatic model selection.
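One reason the tests become invalid is that selection examines many p-values and keeps the smallest. As a rough sketch, assume all 80 candidates are pure noise and their tests are independent (real predictors are usually correlated, so this is only an approximation). Under the null a valid p-value is Uniform(0, 1), so the chance that the very first forward step finds at least one "significant" variable at alpha = .05 is 1 − 0.95^80. The Python below computes that exactly and checks it with a seeded simulation; the function names are illustrative:

```python
import random


def prob_any_significant(n_candidates: int, alpha: float) -> float:
    """P(min of n independent Uniform(0, 1) null p-values < alpha)."""
    return 1 - (1 - alpha) ** n_candidates


def simulate(n_candidates: int, alpha: float, n_sims: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Each candidate's null p-value is drawn as Uniform(0, 1).
        smallest = min(rng.random() for _ in range(n_candidates))
        if smallest < alpha:
            hits += 1
    return hits / n_sims


print(round(prob_any_significant(80, 0.05), 3))
print(simulate(80, 0.05))
```

Under these assumptions the probability is about 0.98, so a "significant" first pick is nearly guaranteed even when no variable is related to the outcome, and its naive p-value cannot be taken at face value.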

The stepAIC function (like step) selects a model based on the AIC, not on whether individual coefficients' p-values fall above or below some threshold, as SPSS does. However, when two nested models differ by a single parameter, choosing by AIC is equivalent to testing with a specific alpha, just not .05: it is approximately .157. For more on that, see @Glen_b's answers here: Stepwise regression in R – Critical p-value.
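The .157 figure can be checked directly. When nested models differ by one parameter, AIC favors the larger model when its deviance is lower by more than 2 (AIC's per-parameter penalty). Under the null, that deviance difference is approximately chi-squared with 1 degree of freedom, so the implied alpha is P(χ²₁ > 2). The same reasoning with BIC's per-parameter penalty of log(n) gives an alpha that shrinks as n grows. A Python sketch using only the standard library (the helper names are illustrative):

```python
import math


def chi2_1df_upper_tail(c: float) -> float:
    """P(X > c) for X ~ chi-squared with 1 df, using X = Z**2 with Z ~ N(0, 1)."""
    # P(Z**2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c) / sqrt(2)) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(c / 2))


def implied_alpha_aic() -> float:
    # AIC penalty is 2 per parameter: the larger model wins if the deviance drops by more than 2.
    return chi2_1df_upper_tail(2.0)


def implied_alpha_bic(n: int) -> float:
    # BIC penalty is log(n) per parameter.
    return chi2_1df_upper_tail(math.log(n))


print(f"AIC: alpha ~ {implied_alpha_aic():.3f}")
for n in (100, 1000, 10000):
    print(f"BIC, n={n}: alpha ~ {implied_alpha_bic(n):.4f}")
```

This applies to a one-parameter change; a term that adds several parameters at once (for example, a multi-level factor) has a different implied alpha.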