Solved – Logistic Regression: Control Variable Not Significant in Model 2

Tags: controlling-for-a-variable, logistic-regression

I am new to this site and it's my first post, so my apologies in advance if I have made any mistakes or did not follow proper etiquette.

I am running a logistic regression analysis with six predictor variables and two control variables (eight variables total in my final model). In Model 1, which includes only the two control variables, both are significant below .05. However, in Model 2 (the full model), which includes all six predictors and both controls, one of the control variables that was significant in Model 1 is no longer significant.

I am so confused as to how this could happen. Any ideas? Also, how should I go about reporting my results? Can I still report this variable as significant in Model 1, even though it was not significant in the final Model 2?

Thank you in advance to anyone who can answer my question.

Best Answer

Welcome!

With multiple logistic regression (i.e., logistic regression with more than one predictor), or multiple linear regression for that matter, including multiple predictors changes the meaning of your coefficients. Suppose we had a model that looked like this:

$$ dv = \beta_0 + \beta_1X_1 +\epsilon$$

The slope $\beta_1$, associated with the predictor $X_1$, tells us about the total effect of $X_1$ on the outcome, dv. Adding a second predictor changes things a bit:

$$ dv = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon $$

When we include multiple predictors in logistic regression, each slope (and each test of a slope) tells us about the unique effect that predictor has on the outcome. Any shared explanatory power between $X_1$ and $X_2$ is removed from both slopes, so each slope represents only the explanatory power unique to that predictor. To the extent that $X_1$ and $X_2$ are related and share explanatory power with regard to dv, the slope for $X_1$ will become smaller in the second model. When we run a regression with multiple predictors, we need to mention that we controlled for the other predictors, and that the reported effect of a predictor is the effect unique to it and not to the others in the model. People use many different terms to describe this: you could refer to the "unique effect of $X_1$ over and above $X_2$", the effect of $X_1$ controlling for $X_2$, the effect of $X_1$ partialling out $X_2$, etc.
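Here is a minimal sketch of this attenuation using simulated data (all names and numbers below are made up for illustration). As noted above, the same logic applies to linear regression, so ordinary least squares is used to keep the example short: when $X_2$ is correlated with $X_1$, the slope for $X_1$ shrinks from its total effect toward its unique effect once $X_2$ enters the model.

```python
# Illustrative simulation (not from the original post): two correlated
# predictors, both of which truly affect the outcome.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # X2 overlaps with X1
dv = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # both have unique effects of 1.0

def ols_coefs(X, y):
    """Least-squares coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

b_simple = ols_coefs(x1.reshape(-1, 1), dv)             # dv ~ X1
b_multiple = ols_coefs(np.column_stack([x1, x2]), dv)   # dv ~ X1 + X2

# The slope of X1 alone absorbs X2's contribution (about 1.8 here);
# once X2 is included, X1's slope drops toward its unique effect (about 1.0).
print(f"slope of X1 alone:    {b_simple[1]:.2f}")
print(f"slope of X1 given X2: {b_multiple[1]:.2f}")
```

The drop in the slope is not a mistake; it is the model shifting from reporting the total effect of $X_1$ to reporting only the part of $X_1$'s effect that $X_2$ cannot account for.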

If $\beta_1$ does get smaller in the second model, you can check which other predictors are related to it by running a model that predicts $X_1$ from the other predictors in your model of interest ($X_2$ in our case). This issue of related predictors is often referred to as redundancy or multi-collinearity. Some people see this as a bad thing because it means one or more of your predictors is non-significant in the multiple logistic regression model, but I tend to disagree. Finding that predictors are redundant tells you something about your model. If I found that weight was a significant predictor of whether someone got diabetes, but that the relationship was attenuated after including calories consumed per day, that would tell me that weight doesn't tell me anything about diabetes over and above how much someone tends to eat.
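The auxiliary-regression check described above can be sketched as follows (again with simulated, illustrative data). The $R^2$ from regressing $X_1$ on the other predictors measures how redundant $X_1$ is; the variance inflation factor, $\mathrm{VIF} = 1/(1 - R^2)$, is a common way to summarize the same information.

```python
# Illustrative check for redundancy: regress X1 on the other predictor(s)
# and look at the R^2 of that auxiliary model. Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(scale=0.6, size=n)   # X1 overlaps heavily with X2

# Auxiliary regression: X1 ~ X2
X = np.column_stack([np.ones(n), x2])
coefs, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ coefs

r2 = 1 - resid.var() / x1.var()   # share of X1's variance explained by X2
vif = 1 / (1 - r2)                # variance inflation factor for X1

print(f"R^2 of X1 ~ X2: {r2:.2f}")
print(f"VIF for X1:     {vif:.2f}")
```

A high auxiliary $R^2$ (and hence a large VIF) for a predictor tells you that much of its explanatory power is shared with the other predictors, which is exactly the situation in which its slope and significance can drop in the full model.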

In terms of writing up your results, it is totally fine to report both analyses, because finding that some of your predictors are no longer significant in the multiple regression framework tells you something about your overall theoretical model. However, you will want to make clear what each model tells you. Hope this helps!