Solved – Significant predictors become non-significant in multiple logistic regression

logistic, multiple regression, statistical significance

When I analyze my variables in two separate (univariate) logistic regression models, I get the following:

Predictor 1:    B= 1.049,    SE=.352,    Exp(B)=2.85,    95% CI=(1.43, 5.69),    p=.003
   Constant:    B=-0.434,    SE=.217,    Exp(B)=0.65,                            p=.046

Predictor 2:    B= 1.379,    SE=.386,    Exp(B)=3.97,    95% CI=(1.86, 8.47),    p<.001
   Constant:    B=-0.447,    SE=.205,    Exp(B)=0.64,                            p=.029

but when I enter them into a single multiple logistic regression model, I get:

Predictor 1:    B= 0.556,    SE=.406,    Exp(B)=1.74,    95% CI=(0.79, 3.86),    p=.171
Predictor 2:    B= 1.094,    SE=.436,    Exp(B)=2.99,    95% CI=(1.27, 7.02),    p=.012
   Constant:    B=-0.574,    SE=.227,    Exp(B)=0.56,                            p=.012

Both predictors are dichotomous (categorical). I have checked for multicollinearity.

I am not sure if I have given enough information, but I cannot understand why predictor 1 has gone from significant to non-significant, or why the odds ratios are so different in the multiple regression model. Can anyone provide a basic explanation of what is going on?

Best Answer

There are several possible reasons; none of them is specific to logistic regression, and all can occur in any regression.

  1. Loss of degrees of freedom: when trying to estimate more parameters from a given dataset, you're effectively asking more of it, which costs precision. Standard errors grow, test statistics (t, or Wald z in the logistic case) shrink, and p-values rise.
  2. Correlation of Regressors: your regressors may be related to each other, effectively measuring something similar. Say your logit model explains labor market status (working/not working) as a function of experience and age. Individually, both variables are positively related to the status, as more experienced/older employees (ruling out very old employees for the sake of the argument) find it easier to find jobs than recent graduates. Now, the two variables are obviously strongly related, since you need to be older to have more experience. Hence the two variables "compete" to explain the status, which may, especially in small samples, result in both variables "losing": neither effect may be strong enough, or precisely enough estimated, to be significant once the other is controlled for. Essentially, you are asking: what is the effect of another year of experience, holding age constant? There may be very few or no employees in your dataset who can answer that question, so the effect will be imprecisely estimated, leading to large p-values. (The simulation sketch after this list illustrates the effect.)

  3. Misspecified models: the underlying theory for test statistics and p-values requires that you estimate a correctly specified model. Now, if you regress on only one predictor, the chances are quite high that this univariate model suffers from omitted variable bias, and then all bets are off as to how the p-values behave. In short, be careful about trusting p-values when the model may not be correct.
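To see point 2 (and the omitted-variable issue from point 3) in action, here is a minimal simulation sketch in Python, assuming numpy and statsmodels are available. The data, effect sizes, and variable names are invented for illustration and are not the poster's data: two correlated dichotomous predictors each look strong when fitted alone, but their coefficients shrink and their p-values grow when fitted jointly.

```python
# Minimal sketch: correlated binary predictors that are "significant" alone
# but weaker together. All numbers here are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 120  # small sample, in the same spirit as the question

# Two correlated dichotomous predictors: x2 copies x1 about 75% of the time.
x1 = rng.binomial(1, 0.5, n)
x2 = np.where(rng.random(n) < 0.75, x1, 1 - x1)

# Binary outcome generated from both predictors with moderate true effects.
logit_p = -0.5 + 0.8 * x1 + 1.0 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

def fit_logit(cols):
    """Fit a logistic regression of y on the given predictor columns."""
    X = sm.add_constant(np.column_stack(cols))
    return sm.Logit(y, X).fit(disp=0)

for label, cols in [("x1 alone", [x1]),
                    ("x2 alone", [x2]),
                    ("x1 and x2", [x1, x2])]:
    res = fit_logit(cols)
    print(f"{label:10s} coef={np.round(res.params[1:], 3)} "
          f"p={np.round(res.pvalues[1:], 3)}")
```

In typical runs, the univariate coefficients come out larger than the true effects used to generate the data, because each predictor partly absorbs the effect of the omitted, correlated one; the joint model gives estimates closer to the truth, but with larger standard errors and hence larger p-values, which is the same pattern as in the question.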
