Solved – Help me understand adjusted odds ratio in logistic regression

Tags: logistic, odds-ratio

I've been having a hard time trying to understand the use of logistic regression in a paper. The paper, available here, uses logistic regression to predict the probability of complications during cataract surgery.

What is confusing me is that the paper presents a model that assigns odds ratio of 1 to baseline described as follows:

A patient whose risk profile was in the reference group for all risk indicators (ie adjusted OR=1.00 for all in Table 1) may be regarded as having a ‘baseline risk profile’, and the logistic regression model indicates a ‘baseline predicted probability’ for PCR or VL or both=0.736%.

So the probability of 0.00736 is presented with an odds ratio of 1. Based on the transformation from probabilities to odds, $o=\frac{p}{1-p}$, this can't be equal to 1: $0.00741=\frac{0.00736}{1-0.00736}$.

It gets even more confusing. A composite odds ratio, representing multiple covariates with values different from the baseline, is used to calculate the predicted risk.

…the composite OR from Table 1 would be 1.28 X 1.58 X 2.99 X 2.46 X 1.45 X 1.60 = 34.5, and from the graph in Figure 1, we see that this OR corresponds with a predicted probability of PCR or VL or both of around 20%

The only way to arrive at the values the paper gives as examples is to combine the baseline probability with the composite odds ratio like this:
$0.2025=\frac{(34.50\ \times\ 0.00736)}{1\ +\ (34.50\ \times\ 0.00736)}$.

So what is going on here? What is the logic for assigning an odds ratio of 1 to a baseline probability that is not 0.5? The update formula I came up with above produces the right probabilities for the examples in the paper, but it is not the direct multiplication by an odds ratio I'd expect. What is it then?

Best Answer

Odds are a way to express chances. Odds ratios are just that: one odds divided by another. That means an odds ratio is what you multiply one odds by to produce another. Let's see how they work in this common situation.

Converting between odds and probability

The odds of a binary response $Y$ are the ratio of the chance it happens (coded with $1$), written $\Pr(Y=1)$, to the chance it does not (coded with $0$), written $\Pr(Y=0)$:

$$\text{Odds}(Y) = \frac{\Pr(Y=1)}{\Pr(Y=0)} = \frac{\Pr(Y=1)}{1 - \Pr(Y=1)}.$$

The equivalent expression on the right shows it suffices to model $\Pr(Y=1)$ to find the odds. Conversely, note that we can solve

$$\Pr(Y=1) = \frac{\text{Odds}(Y)}{1 + \text{Odds}(Y)} = 1 - \frac{1}{1 + \text{Odds}(Y)}.$$
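These two conversions are easy to check numerically. Here is a minimal Python sketch (the function names are mine):

```python
def odds_from_prob(p):
    """Odds(Y) = Pr(Y=1) / (1 - Pr(Y=1))."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Pr(Y=1) = Odds / (1 + Odds)."""
    return odds / (1 + odds)

# The paper's baseline: a probability of 0.736% corresponds
# to odds of about 0.00741.
print(odds_from_prob(0.00736))  # ≈ 0.00741
```

Note how close the odds are to the probability when the probability is small; they diverge as the probability approaches 1.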

Logistic regression

Logistic regression models the logarithm of the odds of $Y$ as a linear function of explanatory variables. Most generally, writing these variables as $x_1, \ldots, x_p$, and including a possible constant term in the linear function, we may name the coefficients (which are to be estimated from the data) as $\beta_1,\ldots, \beta_p$ and $\beta_0$. Formally this produces the model

$$\log\left(\text{Odds}(Y)\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$

The odds themselves can be recovered by undoing the logarithm:

$$\text{Odds}(Y) = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p).$$
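In code, recovering the odds from the fitted coefficients is just an exponentiated dot product. A small sketch with made-up coefficient values:

```python
import math

def odds(beta0, betas, xs):
    """Exponentiate the linear predictor to recover the odds."""
    log_odds = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(log_odds)

# Hypothetical coefficients and covariate values, purely for illustration:
print(odds(-4.9, [0.5, 0.2], [1.0, 2.0]))  # exp(-4.0) ≈ 0.0183
```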

Using categorical variables

Categorical variables, such as age group, gender, presence of Glaucoma, etc., are incorporated by means of "dummy coding." To show that the particular choice of coding does not matter, I will work through a simple example with one small group of categories; its generalization to multiple groups should be obvious. In this study one variable is "pupil size," with three categories, "Large", "Medium", and "Small". (The study treats these as purely categorical, apparently paying no attention to their inherent order.) Intuitively, each category has its own odds, say $\alpha_L$ for "Large", $\alpha_M$ for "Medium", and $\alpha_S$ for "Small". This means that, all other things equal,

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_L + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for anybody in the "Large" category,

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_M + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for anybody in the "Medium" category, and

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_S + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for those in the "Small" category.

Creating identifiable coefficients

I have colored the first two coefficients to highlight them, because I want you to notice that they allow a simple change to occur: we could pick any number $\gamma$ and, by adding it to $\beta_0$ and subtracting it from each of $\alpha_L$, $\alpha_M$, and $\alpha_S$, we would not change any predicted odds. This is because of the obvious equivalences of the form

$$\alpha_L + \beta_0 = (\alpha_L - \gamma) + (\gamma + \beta_0 ),$$

etc. Although this presents no problems for the model--it still predicts exactly the same things--it shows that the parameters are not in themselves interpretable. What stays the same when we do this addition-subtraction maneuver are the differences between the coefficients. Conventionally, to address this lack of identifiability, people (and by default, software) choose one of the categories in each variable as the "base" or "reference" and simply stipulate that its coefficient will be zero. This removes the ambiguity.
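The addition-subtraction maneuver is easy to verify numerically. A sketch with hypothetical coefficient values (the numbers are mine, not the paper's):

```python
import math

# Hypothetical category coefficients for pupil size, plus an intercept:
alpha = {"Large": 0.3, "Medium": 0.8, "Small": 1.4}
beta0 = -4.0

def category_odds(cat, alpha, beta0):
    """Odds for a member of the given category, other terms held fixed."""
    return math.exp(alpha[cat] + beta0)

# Add gamma to the intercept and subtract it from every alpha:
gamma = 0.3
alpha_shifted = {k: v - gamma for k, v in alpha.items()}
beta0_shifted = beta0 + gamma

# The predicted odds are unchanged for every category.
for cat in alpha:
    assert math.isclose(category_odds(cat, alpha, beta0),
                        category_odds(cat, alpha_shifted, beta0_shifted))
# Choosing gamma = alpha["Large"] zeroes out the reference coefficient,
# which is exactly the conventional reference-category coding.
```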

The paper lists reference categories first; "Large" in this case. Thus, $\alpha_L$ is subtracted from each of $\alpha_L, \alpha_M,$ and $\alpha_S$, and added to $\beta_0$ to compensate.

The log odds for a hypothetical individual falling into all the base categories therefore equal $\beta_0$ plus a bunch of terms associated with all other "covariates"--the non-categorical variables. Exponentiating gives the odds:

$$\text{Odds(Base category)} = \exp(\beta_0 + \beta_1X_1 + \cdots + \beta_p X_p).$$

No terms associated with any categorical variables appear here. (I have slightly changed the notation at this point: the betas $\beta_i$ now are the coefficients only of the covariates, while the full model includes the alphas $\alpha_j$ for the various categories.)

Comparing odds

Let us compare odds. Suppose a hypothetical individual is a

male patient aged 80–89 with a white cataract, no fundal view, and a small pupil being operated on by a specialist registrar, ...

Associated with this patient (let's call him Charlie) are estimated coefficients for each category: $\alpha_\text{80-89}$ for his age group, $\alpha_\text{male}$ for being male, and so on. Wherever his attribute is the base for its category, the coefficient is zero by convention, as we have seen. Because this is a linear model, the coefficients add. Thus, the log odds for this patient are obtained by adding the following to the base log odds given above:

$$\alpha_\text{80-89}+\alpha_\text{male}+\alpha_\text{no Glaucoma}+ \cdots + \alpha_\text{specialist registrar}.$$

This is precisely the amount by which the log odds of this patient vary from the base. To convert from log odds, undo the logarithm and recall that this turns addition into multiplication. Therefore, the base odds must be multiplied by

$$\exp(\alpha_\text{80-89})\exp(\alpha_\text{male})\exp(\alpha_\text{no Glaucoma}) \cdots \exp(\alpha_\text{specialist registrar}).$$

These are the numbers given in the table under "Adjusted OR" (adjusted odds ratio). (It is called "adjusted" because covariates $x_1, \ldots, x_p$ were included in the model. They play no role in any of our calculations, as you will see. It is called a "ratio" because it is precisely the amount by which the base odds must be multiplied to produce the patient's predicted odds: see the first paragraph of this post.) In order in the table, they are $\exp(\alpha_\text{80-89})=1.58$, $\exp(\alpha_\text{male})=1.28$, $\exp(\alpha_\text{no Glaucoma})=1.00$, and so on. According to the article, their product works out to $34.5$. Therefore

$$\text{Odds(Charlie)} = 34.5\times \text{Odds(Base)}.$$

(Notice that the base categories all have odds ratios of $1.00=\exp(0)$, because including $1$ in the product leaves it unchanged. That's how you can spot the base categories in the table.)
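We can confirm the article's product with the six adjusted odds ratios it quotes from Table 1:

```python
import math

# Adjusted ORs for Charlie's non-reference categories, as quoted in the question:
adjusted_or = [1.28, 1.58, 2.99, 2.46, 1.45, 1.60]

composite = math.prod(adjusted_or)
print(round(composite, 1))  # 34.5, matching the paper
```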

Restating the results as probabilities

Finally, let us convert this result to probabilities. We were told the baseline predicted probability is $0.736\%=0.00736$. Therefore, using the formulas relating odds and probabilities derived at the outset, we may compute

$$\text{Odds(Base)} = \frac{0.00736}{1 - 0.00736} = 0.00741.$$

Consequently Charlie's odds are

$$\text{Odds(Charlie)} = 34.5\times 0.00741 = 0.256.$$

Finally, converting this back to probabilities gives

$$\Pr(Y(\text{Charlie})=1) = 1 - \frac{1}{1 + 0.256} = 0.204.$$
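The whole chain of arithmetic, from the baseline probability to Charlie's predicted probability, fits in a few lines of Python:

```python
base_prob = 0.00736                      # baseline predicted probability (0.736%)
base_odds = base_prob / (1 - base_prob)  # ≈ 0.00741

composite_or = 34.5                      # product of Charlie's adjusted ORs
charlie_odds = composite_or * base_odds  # ≈ 0.256

charlie_prob = charlie_odds / (1 + charlie_odds)
print(round(charlie_prob, 3))            # 0.204, the "around 20%" in the paper
```

This is exactly the update formula in the question: multiply the baseline odds (not the probability) by the composite odds ratio, then convert back to a probability.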