I'm doing logistic regression where $X$ is a factor with four levels (1-4) and $Y$ is a binary outcome (0/1). I did:
```r
data$x <- relevel(factor(data$x), ref = "1")  # x == "1" is the reference level
model <- glm(y ~ x, family = binomial(link = "logit"), data = data)
summary(model)
```
I get:
```
Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.6740     0.3632  -4.608 4.06e-06 ***
2            -1.0916     0.5000  -2.183    0.029 *  
3            -1.9369     1.0766  -1.799    0.072 .  
4           -16.8921  1058.1118  -0.016    0.987    
```
Then for my odds ratios, I get:
```
             ORs 2.5 %       97.5 %
(Intercept) 0.19  0.09 3.600000e-01
2           0.34  0.12 9.100000e-01
3           0.14  0.01 8.200000e-01
4           0.00    NA 1.057042e+21
```
I'm most interested in level 2 of $X$ (var2), and I'm confused about how to interpret its odds ratio. I would think that an OR < 1 means a lower likelihood of the outcome $Y=1$, but the log-odds coefficient for the intercept (level 1, var1) is smaller than the one for var2 (-1.67 vs. -1.09). So I'm not sure whether the OR for var2 should still be considered 'lower' than var1.
Any help is really appreciated. Thank you!
Best Answer
I think you have a desired result in mind (that var2 lowers the probability of $Y=1$ relative to var1) and are perplexed by the numbers. The intercept $\beta_0$ in logistic regression should be interpreted carefully, since $e^{\beta_0}$ is not an odds ratio. With $x=1$ as the reference level, all of the dummy variables (var2, var3, var4) are 0, so

$$ \begin{align} \beta_0 &= \mathrm{logit}(P(Y=1\mid x=1))\\ e^{\beta_0} &= \frac{P(Y=1\mid x=1)}{1-P(Y=1\mid x=1)}, \end{align} $$

which is the odds of $Y=1$ at level 1, not a ratio of two odds. The slopes are different. Writing $\beta_1$ for the coefficient in the row labelled `2`:

$$ \begin{align} \beta_1 &= \mathrm{logit}(P(Y=1\mid x=2)) - \mathrm{logit}(P(Y=1\mid x=1))\\ &= \mathrm{logit}(P(Y=1\mid x=2)) - \beta_0\\ e^{\beta_1} &= \frac{P(Y=1\mid x=2)/P(Y=0\mid x=2)}{P(Y=1\mid x=1)/P(Y=0\mid x=1)}. \end{align} $$

The slopes are always interpreted relative to the baseline, i.e. the reference level $x=1$, whose log-odds is the intercept. So the OR of 0.34 for level 2 means that the odds of $Y=1$ at level 2 are about 0.34 times the odds at level 1: level 2 is associated with a lower likelihood of $Y=1$ than level 1. Comparing 0.34 with the intercept's 0.19 is not meaningful, because 0.19 is an odds, not an odds ratio.
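As a worked check, here are the point estimates from the question plugged into these formulas (rounded, ignoring estimation uncertainty; only the intercept and the level-2 coefficient are involved):

$$
\begin{align}
\text{odds}(Y=1\mid x=1) &= e^{\hat\beta_0} = e^{-1.6740} \approx 0.1875\\
\widehat{\mathrm{OR}}_{2\text{ vs }1} &= e^{\hat\beta_1} = e^{-1.0916} \approx 0.336\\
\text{odds}(Y=1\mid x=2) &= e^{\hat\beta_0+\hat\beta_1} = e^{-2.7656} \approx 0.1875 \times 0.336 \approx 0.063\\
\hat P(Y=1\mid x=1) &= \frac{0.1875}{1+0.1875} \approx 0.158\\
\hat P(Y=1\mid x=2) &= \frac{0.063}{1+0.063} \approx 0.059
\end{align}
$$

The estimated probability of $Y=1$ therefore falls from about 16% at level 1 to about 6% at level 2.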