Solved – Exponentiated logistic regression coefficient different than odds ratio

Tags: interpretation, logistic, odds-ratio, regression

As I understand it, the exponentiated beta value from a logistic regression is the odds ratio of that variable for the dependent variable of interest. However, the value does not match the manually calculated odds ratio. My model is predicting stunting (a measure of malnutrition) using, amongst other indicators, insurance.

// Odds ratio from the logistic regression, run in Stata
logit stunting insurance age ... etc. 
or_insurance = exp(beta_value_insurance)    // i.e. exp(_b[insurance]) in Stata

// Odds ratio calculated by hand from the 2x2 counts
odds_stunted_ins = num_stunted_ins/num_not_stunted_ins
odds_stunted_unins = num_stunted_unins/num_not_stunted_unins
odds_ratio = odds_stunted_ins/odds_stunted_unins
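
(For what it's worth, Stata will also report the exponentiated coefficients for you; the "or" option on logit and the logistic command both print odds ratios directly, with the same variable list as above.)

// Stata reports odds ratios directly with either of these
logit stunting insurance age ... etc., or
logistic stunting insurance age ... etc.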

What is the conceptual reason for these values being different? Is it because the regression controls for other factors? I just want to be able to explain the discrepancy.

Best Answer

If you put only that lone predictor into the model, then the odds ratio between the predictor and the response will be exactly equal to the exponentiated regression coefficient. I don't think a derivation of this result is present on the site, so I will take this opportunity to provide it.


Consider a binary outcome $Y$ and single binary predictor $X$:

$$ \begin{array}{c|cc} \phantom{} & Y = 1 & Y = 0 \\ \hline X=1 & p_{11} & p_{10} \\ X=0 & p_{01} & p_{00} \\ \end{array} $$

Then, one way to calculate the odds ratio between $X$ and $Y$ is

$$ {\rm OR} = \frac{ p_{11} p_{00} }{p_{01} p_{10}} $$

By the definition of conditional probability, $p_{ij} = P(Y = j | X = i) \cdot P(X = i)$, so

$$ {\rm OR} = \frac{ P(Y = 1| X = 1)P(X=1) \cdot P(Y = 0 | X = 0)P(X=0) }{ P(Y = 1 | X = 0)P(X=0) \cdot P(Y = 0 | X = 1)P(X=1) } $$

The marginal probabilities involving $X$ cancel out, and you can rewrite the odds ratio in terms of the conditional probabilities of $Y|X$:

$${\rm OR} = \frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)} $$

In logistic regression, you model these probabilities directly:

$$ \log \left( \frac{ P(Y_i = 1|X_i) }{ P(Y_i = 0|X_i) } \right) = \beta_0 + \beta_1 X_i $$

So we can calculate these conditional probabilities directly from the model. The first ratio in the expression for ${\rm OR}$ above is:

$$ \frac{ P(Y_i = 1| X_i = 1) }{P(Y_i = 0 | X_i = 1)} = \frac{ \left( \frac{1}{1 + e^{-(\beta_0+\beta_1)}} \right) } {\left( \frac{e^{-(\beta_0+\beta_1)}}{1 + e^{-(\beta_0+\beta_1)}}\right)} = \frac{1}{e^{-(\beta_0+\beta_1)}} = e^{(\beta_0+\beta_1)} $$

and the second is:

$$ \frac{ P(Y_i = 0| X_i = 0) }{P(Y_i = 1 | X_i = 0)} = \frac{ \left( \frac{e^{-\beta_0}}{1 + e^{-\beta_0}} \right) } { \left( \frac{1}{1 + e^{-\beta_0}} \right) } = e^{-\beta_0}$$

Plugging these back into the formula, we have ${\rm OR} = e^{(\beta_0+\beta_1)} \cdot e^{-\beta_0} = e^{\beta_1}$, which is the result.
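
If it helps to see this numerically, here is a quick simulated check in Stata (made-up data and variable names, not the asker's; the coefficients in the data-generating step are arbitrary) that the exponentiated coefficient from the single-predictor logit matches the hand-computed odds ratio:

// simulate a binary predictor and a binary outcome
clear
set seed 12345
set obs 10000
gen insurance = runiform() < 0.5
gen stunting = runiform() < invlogit(-1 + 0.5*insurance)

// odds ratio from the single-predictor logit
logit stunting insurance
display exp(_b[insurance])

// odds ratio computed by hand from the 2x2 counts
quietly count if stunting == 1 & insurance == 1
local a = r(N)
quietly count if stunting == 0 & insurance == 1
local b = r(N)
quietly count if stunting == 1 & insurance == 0
local c = r(N)
quietly count if stunting == 0 & insurance == 0
local d = r(N)
display (`a'/`b') / (`c'/`d')

// the two displayed numbers agree (up to numerical precision)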

Note: When you have other predictors, call them $Z_1, ..., Z_p$, in the model, the exponentiated regression coefficient (using a similar derivation) is actually

$$ \frac{ P(Y = 1| X = 1, Z_1, ..., Z_p) }{P(Y = 0 | X = 1, Z_1, ..., Z_p)} \cdot \frac{ P(Y = 0 | X = 0, Z_1, ..., Z_p) }{ P(Y = 1 | X = 0, Z_1, ..., Z_p)} $$

so it is the odds ratio conditional on the values of the other predictors in the model and, in general, is not equal to

$$ \frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)}$$

So, it is no surprise that you're observing a discrepancy between the exponentiated coefficient and the observed odds ratio.
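
To see this concretely, here is a small simulation sketch (hypothetical variables x, y, z; the true coefficients are arbitrary) in which z is associated with both x and y, so the conditional and marginal odds ratios for x differ:

// simulate a predictor x whose distribution depends on z, and an outcome y
clear
set seed 2468
set obs 20000
gen z = rnormal()
gen x = runiform() < invlogit(z)
gen y = runiform() < invlogit(-1 + 0.5*x + z)

// marginal association: exp(_b[x]) here equals the 2x2-table odds ratio for x and y
logit y x
display exp(_b[x])

// conditional odds ratio, adjusting for z: close to exp(0.5), and different
// from the marginal value above
logit y x z
display exp(_b[x])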

Note 2: I derived a relationship between the true $\beta$ and the true odds ratio but note that the same relationship holds for the sample quantities since the fitted logistic regression with a single binary predictor will exactly reproduce the entries of a two-by-two table. That is, the fitted means exactly match the sample means, as with any GLM. So, all of the logic used above applies with the true values replaced by sample quantities.
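
For example, in the same hedged spirit (simulated data, made-up names), you can check that the fitted probabilities from a logit with a single binary predictor are exactly the observed proportions within each cell of the table:

// simulate data with one binary predictor
clear
set seed 99
set obs 5000
gen insurance = runiform() < 0.4
gen stunting = runiform() < invlogit(-0.8 + 0.6*insurance)

// fitted P(stunting = 1 | insurance) from the logit
logit stunting insurance
predict phat

// within each level of insurance, the mean of phat equals the observed
// proportion of stunting
tabstat phat stunting, by(insurance) stat(mean)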
