How to interpret a negative coefficient in logistic regression

Tags: logistic, regression, regression-coefficients

This is the summary of a model fitted to the Titanic dataset in R:

             Estimate Std. Error z value Pr(>|z|)     
(Intercept)  7.668775   0.641018  11.963  < 2e-16 
Pclass      -1.098189   0.137969  -7.960 1.72e-15 
Sex:male    -2.726408   0.194561 -14.013  < 2e-16 
Age         -0.039385   0.007773  -5.067 4.05e-07 
Sibling     -0.378646   0.106212  -3.565 0.000364 
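The coefficients in this table are on the log-odds scale, so exponentiating each one gives the corresponding odds ratio. The Python sketch below does just that. The dictionary names are illustrative, and it assumes `Sex:male` is a 0/1 indicator with female as the reference level (R's default treatment coding for a factor). For `Sibling` this gives roughly 0.68. For `Sex:male` it gives roughly 0.065: under that assumption, the odds of survival for a male are about 0.065 times those of an otherwise comparable female.

```python
import math

# Coefficients copied from the summary table above (log-odds scale).
coefficients = {
    "(Intercept)": 7.668775,
    "Pclass": -1.098189,
    "Sex:male": -2.726408,
    "Age": -0.039385,
    "Sibling": -0.378646,
}

# exp(beta) is the factor by which the odds of survival are multiplied
# for a one-unit increase in that predictor, holding the others fixed.
# The intercept is skipped: exp(intercept) is a baseline odds, not a ratio.
odds_ratios = {
    name: math.exp(beta)
    for name, beta in coefficients.items()
    if name != "(Intercept)"
}

for name, ratio in odds_ratios.items():
    print(f"{name:10s} {ratio:.4f}")
```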

I want to interpret the coefficients for sibling and sex, but I'm confused by the statements below. Which are correct and which are incorrect?

For sibling:

  • Keeping all other predictors constant, the odds ratio of survival for having an additional sibling is $e^{-0.38} \approx 0.68$

  • Keeping all other predictors constant, the log odds ratio of survival for having an additional sibling decreases by 0.38 units (what does it mean?)

  • Keeping all other predictors constant, the odds ratio of survival for having an additional sibling decreases/increases by 0.68 units

  • Keeping all other predictors constant, the odds ratio of survival for having an additional sibling is 0.68 times lower (less likely)

  • Keeping all other predictors constant, the probability of survival for having an additional sibling is $\operatorname{sigmoid}(-0.38)$ lower

  • When the other predictors are held constant, the odds ratio of survival between the given level (Male) and the reference level (Female) is -2.73 lower.

  • ….

The coefficient is negative and the odds ratio is positive but below one; however, I can't relate either of them to the response variable or see how they affect it.

Best Answer

The following bullet points are correct, and equivalent:

Keeping all other predictors constant, the log odds ratio of survival for having an additional sibling decreases by 0.38 units (what does it mean?)

Keeping all other predictors constant, the odds ratio of survival for having an additional sibling is 0.68 times lower (less likely)

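To see that they're equivalent, let $r_0$ be the odds of survival for some passenger, and $r_1$ the odds for an otherwise identical passenger with one additional sibling. (Strictly, it is the log odds that decreases by 0.38; the ratio $r_1/r_0 = e^{-0.38} \approx 0.68$ is the odds ratio.) Then,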

$$ \begin{split} \log r_1 &= \log r_0 - 0.38 \\ r_1 &= \exp\left\{ \log r_0 - 0.38 \right\} \\ &= r_0 \exp(-0.38) \\ &\approx 0.68 r_0. \end{split} $$
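The derivation can also be checked numerically. The Python sketch below plugs the table's coefficients into the linear predictor for two made-up passengers (purely hypothetical values), adds one sibling to each, and compares the odds before and after. It assumes the predictors enter the model linearly exactly as listed in the table, with no interactions, and that `Sex:male` is a 0/1 indicator. Under those assumptions the ratio comes out as $e^{-0.378646} \approx 0.68$ regardless of the other predictor values.

```python
import math

# Coefficients from the summary table (log-odds scale).
INTERCEPT = 7.668775
BETA = {"Pclass": -1.098189, "Sex:male": -2.726408, "Age": -0.039385, "Sibling": -0.378646}


def log_odds(passenger):
    """Linear predictor: intercept + sum of beta * x over the predictors."""
    return INTERCEPT + sum(BETA[k] * passenger[k] for k in BETA)


def odds(passenger):
    """Odds of survival implied by the linear predictor."""
    return math.exp(log_odds(passenger))


# Hypothetical passengers, chosen only for illustration.
passengers = [
    {"Pclass": 1, "Sex:male": 0, "Age": 30, "Sibling": 0},
    {"Pclass": 3, "Sex:male": 1, "Age": 45, "Sibling": 2},
]

ratios = []
for p in passengers:
    one_more = dict(p, Sibling=p["Sibling"] + 1)  # same passenger, one more sibling
    ratios.append(odds(one_more) / odds(p))

print(ratios)  # both entries ≈ exp(-0.378646) ≈ 0.68
```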

Note, however, that the words "less likely" in the second statement are not correct. Multiplying the odds by a factor $x$ is not the same as multiplying a probability by $x$, because odds are not probabilities. The odds of survival are, by definition, $p_\text{survival} / p_\text{death} = p/(1-p)$, where $p$ is the probability of survival.
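To see how a fixed odds ratio translates into probabilities, the Python sketch below converts a starting probability to odds, multiplies by the sibling odds ratio $e^{-0.378646}$, and converts back. The helper names `prob_to_odds` and `odds_to_prob` and the three baseline probabilities are illustrative choices, not part of the original model output. The probability is multiplied by roughly 0.71 when it starts at 0.1 but by roughly 0.96 when it starts at 0.9; it is always multiplied by something between the odds ratio and 1.

```python
import math

# Odds ratio for one extra sibling, from the table's coefficient -0.378646.
SIBLING_ODDS_RATIO = math.exp(-0.378646)


def prob_to_odds(p):
    """Odds = p / (1 - p) for a probability p in (0, 1)."""
    return p / (1 - p)


def odds_to_prob(o):
    """Inverse of prob_to_odds: p = o / (1 + o)."""
    return o / (1 + o)


# Illustrative baseline survival probabilities (not taken from the data).
scale_factors = {}
for p0 in (0.1, 0.5, 0.9):
    p1 = odds_to_prob(prob_to_odds(p0) * SIBLING_ODDS_RATIO)
    scale_factors[p0] = p1 / p0  # factor the probability was multiplied by
    print(f"p0={p0:.1f}  p1={p1:.3f}  p1/p0={p1 / p0:.3f}")
```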

Scaling the odds by a fixed factor is confusing precisely because it does not scale the probability by a fixed factor: how much the probability changes depends on the starting odds.

For example, suppose we halve the odds. If the odds are $1$ and we halve them to $1/2$, the probability drops from $1/2$ to $1/3$, i.e. it is multiplied by $2/3.$ If we halve the odds again, from $1/2$ to $1/4$, the probability drops from $1/3$ to $1/5,$ i.e. it is multiplied by $3/5$. The same change in the odds therefore produces a different change in the probability: a larger relative reduction than before, but a smaller absolute drop ($2/15$ versus $1/6$).
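The halving example can be reproduced with exact rational arithmetic. This minimal sketch uses Python's `fractions.Fraction` so that the probabilities $1/2$, $1/3$, $1/5$, the multiplicative factors $2/3$, $3/5$, and the absolute drops $1/6$, $2/15$ come out exactly; `odds_to_prob` is an illustrative helper name.

```python
from fractions import Fraction


def odds_to_prob(odds):
    """Convert odds o = p / (1 - p) back to a probability p = o / (1 + o)."""
    return odds / (1 + odds)


odds_sequence = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]  # halve the odds twice
probs = [odds_to_prob(o) for o in odds_sequence]
factors = [probs[i + 1] / probs[i] for i in range(len(probs) - 1)]  # relative change
drops = [probs[i] - probs[i + 1] for i in range(len(probs) - 1)]    # absolute change

print(probs)    # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
print(factors)  # [Fraction(2, 3), Fraction(3, 5)]
print(drops)    # [Fraction(1, 6), Fraction(2, 15)]
```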

Working with odds instead of probabilities can be tricky, because we think more intuitively in terms of probabilities. However, linear models work well on the log-odds scale, because log odds can take any value on the real line, while probabilities are confined to $[0,1].$
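The point about ranges is captured by the logit (log-odds) function and its inverse, the sigmoid. In this small illustrative Python sketch, `logit` sends probabilities in $(0,1)$ to arbitrary real numbers, and `sigmoid` sends any real value of the linear predictor back into $(0,1)$. This is why a linear model for the log odds can never predict an impossible probability.

```python
import math


def logit(p):
    """Log odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-z))


for p in (0.01, 0.5, 0.99):
    print(p, logit(p))  # negative, zero, positive

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))  # always strictly between 0 and 1
```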
