Logistic Regression – Interpretation of Predictions to Odds Ratios


I'm somewhat new to using logistic regression, and a bit confused by a discrepancy between my interpretations of the following values which I thought would be the same:

  • exponentiated beta values
  • predicted probability of the outcome using beta values.

Here is a simplified version of the model I am using, where undernutrition and insurance are both binary, and wealth is continuous:

Under.Nutrition ~ insurance + wealth

My (actual) model returns an exponentiated beta value of .8 for insurance, which I would interpret as:

"The probability of being undernourished for an insured individual is .8 times the probability of being undernourished for an uninsured individual."

However, when I calculate the predicted probabilities by plugging values of 0 and 1 into the insurance variable, with wealth held at its mean, the difference in the probability of undernutrition is only .04. Each probability is calculated as follows:

Probability Undernourished = exp(β0 + β1*Insurance + β2*Wealth) /
                             (1 + exp(β0 + β1*Insurance + β2*Wealth))

I would really appreciate it if someone could explain why these values are different, and what a better interpretation (particularly for the second value) might be.


Further Clarification Edits
As I understand it, the probability of being undernourished for an uninsured person (where β1 is the coefficient for insurance) is:

Prob(Unins) = exp(β0 + β1*0 + β2*Wealth) /
              (1 + exp(β0 + β1*0 + β2*Wealth))

While the Probability of being under-nourished for an insured person is:

Prob(Ins) = exp(β0 + β1*1 + β2*Wealth) /
            (1 + exp(β0 + β1*1 + β2*Wealth))

The odds ratio of being undernourished for an insured person compared to an uninsured person is:

exp(β1)

Is there a way to translate between these values (mathematically)? I'm still a bit confused by this relationship (where there should probably be a different value on the RHS):

Prob(Ins) - Prob(Unins) != exp(β1)

In layman's terms, the question is why doesn't insuring an individual change their probability of being under-nourished as much as the odds ratio indicates it does? In my data, Prob(Ins) – Prob(Unins) = .04, where the exponentiated beta value is .8 (so why is the difference not .2?)

Best Answer

It seems self-evident to me that $$ \exp(\beta_0 + \beta_1x) \neq\frac{\exp(\beta_0 + \beta_1x)}{1+\exp(\beta_0 + \beta_1x)} $$ for every value of $x$: the two sides could be equal only if $\exp(\beta_0 + \beta_1x)=0$, which never happens because the exponential is strictly positive. So I'm less clear about what the confusion might be. What I can say is that the left-hand side (LHS) of the (not) equals sign is the odds of being undernourished, whereas the RHS is the probability of being undernourished. On its own, $\exp(\beta_1)$ is the odds ratio, that is, the multiplicative factor that takes you from the odds at $x$ to the odds at $x+1$.
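To make the distinction concrete, here is a small worked example. The baseline probability of 0.25 for an uninsured person is an assumption chosen for illustration, not a value taken from the question; only the odds ratio of 0.8 comes from the question.

$$ \text{odds}_{\text{unins}}=\frac{0.25}{1-0.25}=\frac{1}{3} ~~~~~~~~ \text{odds}_{\text{ins}}=0.8\times\frac{1}{3}=\frac{4}{15} $$

$$ p_{\text{ins}}=\frac{4/15}{1+4/15}=\frac{4}{19}\approx 0.211 $$

Under that assumed baseline, the probability drops from 0.25 to about 0.211: a difference of about 0.04 and a ratio of probabilities of about 0.84. Neither number is 0.8, because 0.8 multiplies the odds, not the probability.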

Let me know if you need additional / different information.

Update:
I think this is mostly an issue of being unfamiliar with probabilities and odds, and how they relate to one another. None of that is very intuitive; you need to sit down and work with it for a while and learn to think in those terms. It doesn't come naturally to anyone.

The issue is that absolute numbers are very difficult to interpret on their own. Let's say I was telling you about a time when I had a coin and I wondered whether it was fair. So I flipped it some and got 6 heads. What does that mean? Is 6 a lot, a little, about right? It's awfully hard to say. To deal with this issue we want to give numbers some context. In a case like this there are two obvious choices for how to provide the needed context: I could give the total number of flips, or I could give the number of tails. In either case, you have adequate information to make sense of 6 heads, and you could compute the other value if the one I told you wasn't the one you preferred.

Probability is the number of heads divided by the total number of flips. The odds is the ratio of the number of heads to the number of non-heads (intuitively we want to say the number of tails, which works in this case, but not if there are more than 2 possibilities). With the odds, it is possible to give both numbers, e.g. 4 to 5. This means that in the long run something will happen 4 times for every 5 times it doesn't happen. When the odds are presented this way, they're called "Las Vegas odds". However, in statistics we typically divide through and say the odds are .8 instead (i.e., 4/5 = .8) for purposes of standardization. We can also convert between the odds and probabilities: $$ \text{probability}=\frac{\text{odds}}{1+\text{odds}} ~~~~~~~~~~~~~~~~ \text{odds}=\frac{\text{probability}}{1-\text{probability}} $$ (Compare these with the inequality at the top: its LHS is the odds and its RHS is the probability, which is why a not-equals sign sits between them.)

An odds ratio is just the odds of something divided by the odds of something else; in the context of logistic regression, each $\exp(\beta)$ is the ratio of the odds at two values of the associated covariate that are one unit apart, when all else is held equal.
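The two conversion formulas can be sketched in a few lines of Python. The function names are my own, chosen for illustration, and the example reuses the "4 to 5" odds from above.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability in [0, 1) to odds: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Convert non-negative odds to a probability: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# "4 to 5" odds: 4 occurrences for every 5 non-occurrences.
odds = 4 / 5                              # 0.8 after dividing through
probability = probability_from_odds(odds)  # 4 / (4 + 5) = 4/9
round_trip = odds_from_probability(probability)

print(f"odds = {odds}, probability = {probability:.4f}, back to odds = {round_trip:.4f}")
```

Odds of .8 correspond to a probability of 4/9, or about .44, which shows at a glance that the two scales are not interchangeable.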

What's important to recognize from all of these equations is that probabilities, odds, and odds ratios do not equate in any straightforward way; a change of .04 in the probability very much does not imply that the odds or odds ratio should be anything like .04! Moreover, probabilities lie in $[0, 1]$, whereas ln odds (the output from the raw logistic regression equation) can range over $(-\infty, +\infty)$, and odds and odds ratios can range over $(0, +\infty)$. This last part is vital: because probabilities are bounded, they cannot be linear in the covariates, but the ln odds can be. That is, as (for example) wealth goes up by constant increments, the probability of undernourishment will change by varying amounts, but the ln odds will change by a constant amount and the odds will change by a constant multiplicative factor. Likewise, a difference in odds is not a difference in probabilities: for any $x \neq x'$ (with $\beta_1 \neq 0$), $$ \exp(\beta_0 + \beta_1x)-\exp(\beta_0 + \beta_1x') \neq\frac{\exp(\beta_0 + \beta_1x)}{1+\exp(\beta_0 + \beta_1x)}-\frac{\exp(\beta_0 + \beta_1x')}{1+\exp(\beta_0 + \beta_1x')} $$ because the RHS equals the LHS divided by $(1+\exp(\beta_0 + \beta_1x))(1+\exp(\beta_0 + \beta_1x'))$, which is always greater than 1. The two differences coincide only in the trivial case where both are zero.
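A short Python sketch shows how the same odds ratio produces different changes in probability depending on where you start. The odds ratio of 0.8 is the value from the question; the baseline probabilities and the helper name `apply_odds_ratio` are assumptions made up for illustration.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    baseline_odds = p_baseline / (1.0 - p_baseline)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 0.8  # exp(beta_1) for insurance, as reported in the question

# Hypothetical baseline probabilities for an uninsured person.
baselines = (0.01, 0.10, 0.25, 0.50, 0.90)

results = []
for p_unins in baselines:
    p_ins = apply_odds_ratio(p_unins, ODDS_RATIO)
    difference = p_ins - p_unins   # change on the probability scale
    risk_ratio = p_ins / p_unins   # ratio of probabilities, not of odds
    results.append((p_unins, p_ins, difference, risk_ratio))
    print(f"p_unins={p_unins:.2f}  p_ins={p_ins:.4f}  "
          f"difference={difference:+.4f}  risk_ratio={risk_ratio:.4f}")
```

With an assumed baseline of 0.25, the difference comes out near -0.04, the same size as the change reported in the question, even though the odds ratio is 0.8. The ratio of probabilities only approaches 0.8 when the outcome is rare, which is why reading $\exp(\beta_1)$ as "0.8 times the probability" is generally not safe.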

(Although it was written in the context of a different question, my answer here contains a lot of information about logistic regression that may be helpful for you in understanding LR and related issues more fully.)