Solved – Logistic Regression Confidence Interval interpretation

Tags: confidence-interval, logistic, r

(Note: This question helps to inform the current one)

I would like to identify variables that are significant at the 95% level in a logistic regression but have little to no impact on the response. I've read the CV questions on interpreting regression output, as well as the Stanford and UCLA links on interpretation. Using that combined knowledge I created a table to determine which predictors are either not significant or have little to no effect on the response. But I am not sure I am coming to the correct conclusions, especially about the role confidence intervals play for odds ratios:

library(broom)  # for tidy model output
mdl1 <- glm(am ~ mpg + disp, data = mtcars, family = binomial)
out <- tidy(mdl1)
out[-1] <- round(out[-1], 4)
out$significant <- out$p.value < 0.05
# exponentiate the coefficients and profile CIs to get odds ratios
cbind(out[-(1:2)], round(exp(cbind(OR = coef(mdl1), confint(mdl1))), 4))
# Waiting for profiling to be done...
#             std.error statistic p.value significant     OR  2.5 %    97.5 %
# (Intercept)    4.7601   -0.4741  0.6354       FALSE 0.1047 0.0000 1192.7499
# mpg            0.1684    1.0095  0.3127       FALSE 1.1853 0.8727    1.7252
# disp           0.0078   -0.9749  0.3296       FALSE 0.9924 0.9749    1.0064

This appears to be a good start. I know the odds ratios, p-values, and confidence intervals for each variable. This case would be easy since none of the predictors are significant. But let's ignore that for the moment. If they were significant and I wanted to use the confidence intervals to judge the effect of the predictors, should I look at which confidence intervals include 1.000?

I ask because this Brandon Foltz tutorial says to remove such variables (around the five-minute mark). These variables would be removed because, with 95% confidence, the interval for the true odds ratio includes 1.00, which would indicate no effect on the response.
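The tutorial's rule can be checked programmatically. A minimal sketch (not from the original post), reusing the model fitted above, flags the predictors whose 95% confidence interval for the odds ratio contains 1:

```r
# Fit the same model as in the question
mdl1 <- glm(am ~ mpg + disp, data = mtcars, family = binomial)

# Odds ratios with profile-likelihood CIs (confint prints
# "Waiting for profiling to be done...")
or_ci <- exp(cbind(OR = coef(mdl1), confint(mdl1)))
or_ci <- or_ci[-1, , drop = FALSE]  # drop the intercept row

# TRUE where the OR's CI spans 1, i.e. the tutorial's rule would drop it
spans_one <- or_ci[, "2.5 %"] < 1 & or_ci[, "97.5 %"] > 1
spans_one
#  mpg disp
# TRUE TRUE
```

Here both mpg and disp have intervals containing 1, consistent with their non-significant p-values above.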

Is this two-step process a good way of using logistic regression output to understand the effect of the predictors?

Best Answer

If your goal is to find the set of variables most associated with the outcome, you are in the world of feature selection. For that purpose you can do stepwise regression (the "step" function in R), which is essentially an automated version of the process you describe.
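For concreteness, a minimal sketch of backward stepwise selection with the base-R "step" function, using the same model as in the question (step compares models by AIC, not by p-values or CIs):

```r
# Full model with both candidate predictors
full <- glm(am ~ mpg + disp, data = mtcars, family = binomial)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)  # the formula step() settles on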

More popular with statisticians is to use shrinkage methods - in particular the L1 norm penalty (LASSO regression).

Both of these methods will remove predictors that have no effect on your outcome. More precisely, they remove predictors that have no extra effect - beyond what the remaining predictors already explain - on your outcome.

Both stepwise and LASSO regression methods are available in R.
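For the LASSO, a common choice is the glmnet package. A minimal sketch, assuming glmnet is installed and again using the question's predictors:

```r
library(glmnet)

# glmnet wants a numeric matrix of predictors and a response vector
x <- as.matrix(mtcars[, c("mpg", "disp")])
y <- mtcars$am

# alpha = 1 selects the pure L1 penalty (LASSO); cv.glmnet picks the
# penalty strength lambda by cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the 1-SE lambda; predictors shrunk exactly to zero
# are effectively removed from the model
coef(cvfit, s = "lambda.1se")
```

Unlike stepwise selection, the LASSO drops variables by shrinking their coefficients to exactly zero rather than by sequential hypothesis tests.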
