Logistic Regression – Interpretation of Predictions to Odds Ratios


I'm somewhat new to using logistic regression, and a bit confused by a discrepancy between my interpretations of the following values which I thought would be the same:

exponentiated beta values
predicted probability of the outcome using beta values.

Here is a simplified version of the model I am using, where undernutrition and insurance are both binary, and wealth is continuous:

Under.Nutrition ~ insurance + wealth

My (actual) model returns an exponentiated beta value of .8 for insurance, which I would interpret as:

"The probability of being undernourished for an insured individual is .8 times the probability of being undernourished for an uninsured individual."

However, when I calculate the difference in probabilities for individuals by putting in values of 0 and 1 into the insurance variable and the mean value for wealth, the difference in undernutrition is only .04. That is calculated as follows:

Probability Undernourished = exp(β0 + β1*Insurance + β2*Wealth) /
                             (1+exp(β0 + β1*Insurance + β2*wealth))

I would really appreciate it if someone could explain why these values are different, and what a better interpretation (particularly for the second value) might be.

Further Clarification Edits
As I understand it, the probability of being under-nourished for an uninsured person (where B1 corresponds to insurance) is:

Prob(Unins) = exp(β0 + β1*0 + β2*Wealth) /
              (1+exp(β0 + β1*0+ β2*wealth))

While the Probability of being under-nourished for an insured person is:

Prob(Ins)= exp(β0 + β1*1 + β2*Wealth) /
           (1+exp(β0 + β1*1+ β2*wealth))

The odds ratio of being undernourished for an insured person compared to an uninsured person is:

exp(B1)

Is there a way to translate between these values (mathematically)? I'm still a bit confused by this equation (where there should probably be a different value on the RHS):

Prob(Ins) - Prob(Unins) != exp(B)

In layman's terms, the question is why doesn't insuring an individual change their probability of being under-nourished as much as the odds ratio indicates it does? In my data, Prob(Ins) – Prob(Unins) = .04, where the exponentiated beta value is .8 (so why is the difference not .2?)

Best Answer

It seems self-evident to me that

$$ \exp(\beta_0 + \beta_1x) \neq\frac{\exp(\beta_0 + \beta_1x)}{1+\exp(\beta_0 + \beta_1x)} $$

unless $\exp(\beta_0 + \beta_1x)=0$, which cannot happen because the exponential is always positive. So, I'm less clear about what the confusion might be. What I can say is that the left hand side (LHS) of the (not) equals sign is the odds of being undernourished, whereas the RHS is the probability of being undernourished. When examined on its own, $\exp(\beta_1)$ is the odds ratio, that is, the multiplicative factor that allows you to move from the odds($x$) to the odds($x+1$).

Let me know if you need additional / different information.

Update:
I think this is mostly an issue of being unfamiliar with probabilities and odds, and how they relate to one another. None of that is very intuitive; you need to sit down and work with it for a while and learn to think in those terms. It doesn't come naturally to anyone.

The issue is that absolute numbers are very difficult to interpret on their own. Let's say I was telling you about a time when I had a coin and I wondered whether it was fair. So I flipped it some and got 6 heads. What does that mean? Is 6 a lot, a little, about right? It's awfully hard to say. To deal with this issue we want to give numbers some context. In a case like this there are two obvious choices for how to provide the needed context: I could give the total number of flips, or I could give the number of tails. In either case, you have adequate information to make sense of 6 heads, and you could compute the other value if the one I told you wasn't the one you preferred.

Probability is the number of heads divided by the total number of events. The odds is the ratio of the number of heads to the number of non-heads (intuitively we want to say the number of tails, which works in this case, but not if there are more than 2 possibilities). With the odds, it is possible to give both numbers, e.g. 4 to 5. This means that in the long run something will happen 4 times for every 5 times it doesn't happen. When the odds are presented this way, they're called "Las Vegas odds". However, in statistics we typically divide through and say the odds are .8 instead (i.e., 4/5 = .8) for purposes of standardization. We can also convert between the odds and probabilities:

$$ \text{probability}=\frac{\text{odds}}{1+\text{odds}} ~~~~~~~~~~~~~~~~ \text{odds}=\frac{\text{probability}}{1-\text{probability}} $$

(With these formulas in hand, it may be easier to recognize that in the expression at the top the LHS is the odds and the RHS is the probability, with the not equals sign in the middle.) An odds ratio is just the odds of something divided by the odds of something else; in the context of logistic regression, each $\exp(\beta)$ is the ratio of the odds for successive values of the associated covariate when all else is held equal.
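To make the conversions concrete, here is a minimal R sketch using the "4 to 5" example. The helper names `odds_to_prob` and `prob_to_odds` are made up for illustration; they do not come from any package.

```r
# Hypothetical helpers (names are illustrative, not from a package)
odds_to_prob <- function(odds) odds / (1 + odds)
prob_to_odds <- function(prob) prob / (1 - prob)

# "4 to 5" odds: 4 occurrences for every 5 non-occurrences
odds <- 4 / 5
prob <- odds_to_prob(odds)   # 4 / 9: 4 of every 9 events are occurrences

# The two conversions are inverses of each other
back <- prob_to_odds(prob)

print(c(odds = odds, prob = prob, back = back))
```

Note that odds of .8 correspond to a probability of 4/9 (about .44), not .8.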

What's important to recognize from all of these equations is that probabilities, odds, and odds ratios do not equate in any straightforward way; just because the probability goes up by .04 very much does not imply that the odds or odds ratio should be anything like .04! Moreover, probabilities range over $[0, 1]$, whereas ln odds (the output from the raw logistic regression equation) can range over $(-\infty, +\infty)$, and odds and odds ratios can range over $(0, +\infty)$. This last part is vital: due to the bounded range of probabilities, probabilities are non-linear, but ln odds can be linear. That is, as (for example) wealth goes up by constant increments, the probability of undernourishment will increase by varying amounts, but the ln odds will increase by a constant amount and the odds will increase by a constant multiplicative factor. For any given set of values in your logistic regression model, there may be some point where

$$ \exp(\beta_0 + \beta_1x)-\exp(\beta_0 + \beta_1x') =\frac{\exp(\beta_0 + \beta_1x)}{1+\exp(\beta_0 + \beta_1x)}-\frac{\exp(\beta_0 + \beta_1x')}{1+\exp(\beta_0 + \beta_1x')} $$

for some given $x$ and $x'$, but it will be unequal everywhere else.
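As a rough illustration of how an odds ratio of .8 can go together with a probability difference of about .04, the R sketch below applies the same odds ratio at several baseline probabilities. The baseline values are invented for illustration; they are not the asker's data.

```r
or <- 0.8                                   # odds ratio for insurance, from the question
p_unins <- c(0.05, 0.25, 0.50, 0.75, 0.95)  # hypothetical baseline probabilities

# Move to the log-odds scale, add log(OR), and convert back to a probability
p_ins <- plogis(qlogis(p_unins) + log(or))

result <- data.frame(
  p_unins    = p_unins,
  p_ins      = round(p_ins, 4),
  difference = round(p_ins - p_unins, 4),
  odds_ratio = (p_ins / (1 - p_ins)) / (p_unins / (1 - p_unins))
)
print(result)
```

The odds ratio is .8 in every row, but the difference in probabilities depends on the baseline: it is close to -.04 when the baseline is .25 and shrinks toward 0 as the baseline approaches 0 or 1.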

(Although it was written in the context of a different question, my answer here contains a lot of information about logistic regression that may be helpful for you in understanding LR and related issues more fully.)

Related Solutions

Solved – Exponentiated logistic regression coefficient different than odds ratio

If you're only putting that lone predictor into the model, then the odds ratio between the predictor and the response will be exactly equal to the exponentiated regression coefficient. I don't think a derivation of this result is present on the site, so I will take this opportunity to provide it.

Consider a binary outcome $Y$ and single binary predictor $X$:

$$ \begin{array}{c|cc} \phantom{} & Y = 1 & Y = 0 \\ \hline X=1 & p_{11} & p_{10} \\ X=0 & p_{01} & p_{00} \\ \end{array} $$

Then, one way to calculate the odds ratio between $X_i$ and $Y_i$ is

$$ {\rm OR} = \frac{ p_{11} p_{00} }{p_{01} p_{10}} $$

By definition of conditional probability, $p_{ij} = P(Y = j | X = i) \cdot P(X = i)$, where the first index refers to $X$ and the second to $Y$, as in the table. In the ratio, the marginal probabilities involving $X$ cancel out and you can rewrite the odds ratio in terms of the conditional probabilities of $Y|X$:

$${\rm OR} = \frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)} $$

In logistic regression, you model these probabilities directly:

$$ \log \left( \frac{ P(Y_i = 1|X_i) }{ P(Y_i = 0|X_i) } \right) = \beta_0 + \beta_1 X_i $$

So we can calculate these conditional probabilities directly from the model. The first ratio in the expression for ${\rm OR}$ above is:

$$ \frac{ P(Y_i = 1| X_i = 1) }{P(Y_i = 0 | X_i = 1)} = \frac{ \left( \frac{1}{1 + e^{-(\beta_0+\beta_1)}} \right) } {\left( \frac{e^{-(\beta_0+\beta_1)}}{1 + e^{-(\beta_0+\beta_1)}}\right)} = \frac{1}{e^{-(\beta_0+\beta_1)}} = e^{(\beta_0+\beta_1)} $$

and the second is:

$$ \frac{ P(Y_i = 0| X_i = 0) }{P(Y_i = 1 | X_i = 0)} = \frac{ \left( \frac{e^{-\beta_0}}{1 + e^{-\beta_0}} \right) } { \left( \frac{1}{1 + e^{-\beta_0}} \right) } = e^{-\beta_0}$$

Plugging these back into the formula, we have ${\rm OR} = e^{(\beta_0+\beta_1)} \cdot e^{-\beta_0} = e^{\beta_1}$, which is the result.

Note: When you have other predictors, call them $Z_1, ..., Z_p$, in the model, the exponentiated regression coefficient (using a similar derivation) is actually

$$ \frac{ P(Y = 1| X = 1, Z_1, ..., Z_p) }{P(Y = 0 | X = 1, Z_1, ..., Z_p)} \cdot \frac{ P(Y = 0 | X = 0, Z_1, ..., Z_p) }{ P(Y = 1 | X = 0, Z_1, ..., Z_p)} $$

so it is the odds ratio conditional on the values of the other predictors in the model and, in general, is not equal to

$$ \frac{ P(Y = 1| X = 1) }{P(Y = 0 | X = 1)} \cdot \frac{ P(Y = 0 | X = 0) }{ P(Y = 1 | X = 0)}$$

So, it is no surprise that you're observing a discrepancy between the exponentiated coefficient and the observed odds ratio.
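A small numeric sketch of this point, in R. The coefficients and the distribution of the extra predictor are invented for illustration: a single binary $Z$ with $P(Z = 1) = 0.5$, independent of $X$.

```r
# Hypothetical true model: logit P(Y = 1) = b0 + b1 * X + b2 * Z
b0 <- -1; b1 <- 1; b2 <- 2
p_z1 <- 0.5                      # assumed P(Z = 1), independent of X

# Conditional odds ratio for X (the same at every value of Z)
or_conditional <- exp(b1)

# Marginal P(Y = 1 | X = x), averaging over the distribution of Z
p_marginal <- function(x) {
  (1 - p_z1) * plogis(b0 + b1 * x) + p_z1 * plogis(b0 + b1 * x + b2)
}
odds <- function(p) p / (1 - p)
or_marginal <- odds(p_marginal(1)) / odds(p_marginal(0))

print(c(conditional = or_conditional, marginal = or_marginal))
```

Under these assumed values the conditional odds ratio is about 2.72 while the marginal one is about 2.23, even though $Z$ is independent of $X$ here.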

Note 2: I derived a relationship between the true $\beta$ and the true odds ratio but note that the same relationship holds for the sample quantities since the fitted logistic regression with a single binary predictor will exactly reproduce the entries of a two-by-two table. That is, the fitted means exactly match the sample means, as with any GLM. So, all of the logic used above applies with the true values replaced by sample quantities.
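The single-predictor result can be checked on simulated data. This is only a sketch: the sample size, seed, and coefficients below are arbitrary choices, not values from the question.

```r
set.seed(1)                                # arbitrary seed, for reproducibility
n <- 500
x <- rbinom(n, 1, 0.5)                     # simulated binary predictor
y <- rbinom(n, 1, plogis(-1 + 0.7 * x))    # simulated binary outcome

# Sample odds ratio from the two-by-two table (rows: X, columns: Y)
tab <- table(X = x, Y = y)
or_table <- (tab["1", "1"] * tab["0", "0"]) / (tab["0", "1"] * tab["1", "0"])

# Exponentiated slope from the single-predictor logistic regression
fit <- glm(y ~ x, family = binomial)
or_model <- unname(exp(coef(fit)["x"]))

print(c(table = or_table, model = or_model))
```

The two numbers should agree up to the convergence tolerance of the fitting algorithm.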

Solved – Calculating risk ratio using odds ratio from logistic regression coefficient

Zhang 1998 originally presented a method for calculating CIs for risk ratios suggesting you could use the lower and upper bounds of the CI for the odds ratio.

This method does not work: it is biased and generally produces anticonservative (too tight) estimates of the risk ratio 95% CI. This is because of the correlation between the intercept term and the slope term, as you correctly allude to. If the odds ratio tends towards its lower value in the CI, the intercept term increases to account for a higher overall prevalence in those with a 0 exposure level, and conversely for a higher value in the CI. These shifts lead, respectively, to lower and higher bounds for the CI.

To answer your question outright, you need a knowledge of the baseline prevalence of the outcome to obtain correct confidence intervals. Data from case-control studies would rely on other data to inform this.

Alternatively, you can use the delta method if you have the full covariance structure for the parameter estimates. An equivalent parametrization for the OR to RR transformation (having binary exposure and a single predictor) is:

$$RR = \frac{1 + \exp(-\beta_0)}{1+\exp(-\beta_0-\beta_1)}$$

Using the multivariate delta method, together with the central limit theorem, which states that $\sqrt{n} \left( [\hat{\beta}_0, \hat{\beta}_1] - [\beta_0, \beta_1]\right) \rightarrow_D \mathcal{N} \left(0, \mathcal{I}^{-1}(\beta)\right)$, you can obtain the variance of the approximate normal distribution of the $RR$.
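A minimal R sketch of this delta-method calculation, on simulated data. Two choices here are assumptions rather than part of the description above: the data-generating values are arbitrary, and the delta method is applied to $\log(RR)$ (with the interval back-transformed), which is a common variant.

```r
set.seed(42)                               # arbitrary seed, simulated data only
n <- 200
x <- rbinom(n, 1, 0.5)                     # binary exposure
y <- rbinom(n, 1, 0.2 + 0.2 * x)           # true risks 0.2 and 0.4

fit <- glm(y ~ x, family = binomial)       # ordinary logistic regression
b <- unname(coef(fit))                     # c(beta0_hat, beta1_hat)
V <- vcov(fit)                             # estimated covariance of (beta0, beta1)

p0 <- plogis(b[1])                         # fitted risk when x = 0
p1 <- plogis(b[1] + b[2])                  # fitted risk when x = 1
rr <- p1 / p0                              # equals (1 + exp(-b0)) / (1 + exp(-b0 - b1))

# Gradient of log(RR) with respect to (beta0, beta1)
grad <- c(p0 - p1, 1 - p1)
se_log_rr <- sqrt(drop(t(grad) %*% V %*% grad))

# Wald interval on the log scale, then back-transformed
ci <- rr * exp(c(-1, 1) * qnorm(0.975) * se_log_rr)
print(c(RR = rr, lower = ci[1], upper = ci[2]))
```

Because the covariance between the intercept and slope estimates enters through `V`, this interval accounts for the correlation that the bounds-substitution approach ignores.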

Note, notationally this only works for binary exposure and univariate logistic regression. There are some simple R tricks that make use of the delta method and marginal standardization for continuous covariates and other adjustment variables. But for brevity I'll not discuss that here.

However, there are several ways to compute relative risks and their standard errors directly from models in R. Two examples are below:

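# Simulated data: binary exposure x, outcome risk 0.2 (x = 0) or 0.4 (x = 1)
x <- sample(0:1, 100, replace = TRUE)
y <- rbinom(100, 1, x * 0.2 + 0.2)
# Example 1: log-binomial model; exp(coefficient of x) estimates the relative risk
glm(y ~ x, family = binomial(link = "log"))
# Example 2: Cox model with the same follow-up time for everyone
library(survival)
coxph(Surv(time = rep(1, 100), event = y) ~ x)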

http://research.labiomed.org/Biostat/Education/Case%20Studies%202005/Session4/ZhangYu.pdf
