*Odds* are a way to express chances. *Odds ratios* are just that: one odds divided by another. That means an odds ratio is what you multiply one odds by to produce another. Let's see how they work in this common situation.

### Converting between odds and probability

The odds of a binary response $Y$ are the ratio of the chance it happens (coded with $1$), written $\Pr(Y=1)$, to the chance it does not (coded with $0$), written $\Pr(Y=0)$:

$$\text{Odds}(Y) = \frac{\Pr(Y=1)}{\Pr(Y=0)} = \frac{\Pr(Y=1)}{1 - \Pr(Y=1)}.$$

The equivalent expression on the right shows it suffices to model $\Pr(Y=1)$ to find the odds. Conversely, we can solve for the probability in terms of the odds:

$$\Pr(Y=1) = \frac{\text{Odds}(Y)}{1 + \text{Odds}(Y)} = 1 - \frac{1}{1 + \text{Odds}(Y)}.$$
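These two conversions are simple enough to sketch in a few lines of Python (the probability $0.75$ is just an illustrative value):

```python
# The two conversion formulas above, written out as functions.
def prob_to_odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """p = odds / (1 + odds) = 1 - 1 / (1 + odds)."""
    return odds / (1 + odds)

p = 0.75
print(prob_to_odds(p))                # 3.0  (3-to-1 odds)
print(odds_to_prob(prob_to_odds(p)))  # 0.75 (round trip recovers p)
```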

### Logistic regression

Logistic regression models the *logarithm* of the odds of $Y$ as a linear function of explanatory variables. Most generally, writing these variables as $x_1, \ldots, x_p$, and including a possible constant term in the linear function, we may name the coefficients (which are to be estimated from the data) as $\beta_1,\ldots, \beta_p$ and $\beta_0$. Formally this produces the model

$$\log\left(\text{Odds}(Y)\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$

The odds themselves can be recovered by undoing the logarithm:

$$\text{Odds}(Y) = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p).$$
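A minimal sketch of these two formulas in Python; the coefficients and inputs are made up purely for illustration:

```python
import math

def log_odds(beta0, betas, xs):
    """The linear predictor beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def odds(beta0, betas, xs):
    """Undo the logarithm to recover the odds."""
    return math.exp(log_odds(beta0, betas, xs))

beta0, betas, xs = -2.0, [0.5, 1.0], [1.0, 0.5]
print(odds(beta0, betas, xs))  # exp(-1) ≈ 0.3679
```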

### Using categorical variables

Categorical variables, such as age group, gender, presence of Glaucoma, *etc.*, are incorporated by means of "dummy coding." To show that how a variable is coded does not matter, I will work through a simple example of one small categorical variable; its generalization to multiple variables should be obvious. In this study one variable is "pupil size," with three categories, "Large", "Medium", and "Small". (The study treats these as purely categorical, apparently paying no attention to their inherent order.) Intuitively, each category has its own odds, say $\alpha_L$ for "Large", $\alpha_M$ for "Medium", and $\alpha_S$ for "Small". This means that, all other things equal,

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_L + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for anybody in the "Large" category,

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_M + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for anybody in the "Medium" category, and

$$\text{Odds}(Y) = \exp(\color{Blue}{\alpha_S + \beta_0} + \beta_1 x_1 + \cdots + \beta_p x_p)$$

for those in the "Small" category.
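The dummy coding itself is easy to sketch by hand. A small Python illustration using the pupil-size categories from the example (one indicator column per category, the standard scheme):

```python
# Dummy coding a three-level categorical variable: one indicator per
# category, exactly one of which equals 1 for any observation.
def dummy_code(pupil_size, levels=("Large", "Medium", "Small")):
    return [1 if pupil_size == level else 0 for level in levels]

print(dummy_code("Large"))   # [1, 0, 0]
print(dummy_code("Medium"))  # [0, 1, 0]
print(dummy_code("Small"))   # [0, 0, 1]
```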

### Creating identifiable coefficients

I have colored the first two coefficients to highlight them, because I want you to notice that they allow a simple change to occur: we could pick any number $\gamma$ and, by adding it to $\beta_0$ and subtracting it from each of $\alpha_L$, $\alpha_M$, and $\alpha_S$, *we would not change any predicted odds.* This is because of the obvious equivalences of the form

$$\alpha_L + \beta_0 = (\alpha_L - \gamma) + (\gamma + \beta_0 ),$$

*etc.* Although this presents no problems for the model--it still predicts exactly the same things--it shows that the parameters are not in themselves interpretable. What stays the same when we do this addition-subtraction maneuver are the *differences* between the coefficients. Conventionally, to address this *lack of identifiability,* people (and by default, software) choose one of the categories in each variable as the "base" or "reference" and simply stipulate that its coefficient will be zero. This removes the ambiguity.
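The addition-subtraction maneuver is easy to verify numerically. A short Python check, with all coefficients made up for illustration:

```python
import math

# Shift every alpha down by gamma, shift beta0 up by gamma, and every
# predicted odds is unchanged.  Coefficients here are invented.
beta0 = -1.0
alphas = {"Large": 0.2, "Medium": 0.5, "Small": 0.9}
gamma = 0.7  # any number works

shifted_beta0 = beta0 + gamma
shifted_alphas = {cat: a - gamma for cat, a in alphas.items()}

for cat in alphas:
    original = math.exp(alphas[cat] + beta0)
    shifted = math.exp(shifted_alphas[cat] + shifted_beta0)
    assert abs(original - shifted) < 1e-12
print("predicted odds identical for every category")
```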

The paper lists reference categories first; "Large" in this case. Thus, $\alpha_L$ is subtracted from each of $\alpha_L, \alpha_M,$ and $\alpha_S$, and added to $\beta_0$ to compensate.

The log odds for a hypothetical individual falling into all the base categories therefore equal $\beta_0$ plus a bunch of terms associated with all the other "covariates"--the non-categorical variables--so the odds are

$$\text{Odds(Base category)} = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p).$$

*No* terms associated with any categorical variables appear here. (I have slightly changed the notation at this point: the betas $\beta_i$ now are the coefficients only of the *covariates*, while the full model includes the alphas $\alpha_j$ for the various categories.)

### Comparing odds

Let us compare odds. Suppose a hypothetical individual is a

> male patient aged 80–89 with a white cataract, no fundal view, and a small pupil being operated on by a specialist registrar, ...

Associated with this patient (let's call him Charlie) are estimated coefficients for each category: $\alpha_\text{80-89}$ for his age group, $\alpha_\text{male}$ for being male, and so on. Wherever his attribute is the base for its category, the coefficient is zero *by convention*, as we have seen. Because this is a linear model, *the coefficients add.* Thus, to the base log odds given above, the log odds for this patient are obtained by adding in

$$\alpha_\text{80-89}+\alpha_\text{male}+\alpha_\text{no Glaucoma}+ \cdots + \alpha_\text{specialist registrar}.$$

This is precisely the amount by which the log odds of this patient vary from the base. To convert from log odds, undo the logarithm and recall that this turns addition into multiplication. Therefore, the base odds must be multiplied by

$$\exp(\alpha_\text{80-89})\exp(\alpha_\text{male})\exp(\alpha_\text{no Glaucoma}) \cdots \exp(\alpha_\text{specialist registrar}).$$

These are the numbers given in the table under "Adjusted OR" (adjusted odds ratio). (It is called "adjusted" because covariates $x_1, \ldots, x_p$ were included in the model. They play no role in any of our calculations, as you will see. It is called a "ratio" because it is precisely the amount by which the base odds must be multiplied to produce the patient's predicted odds: see the first paragraph of this post.) In order in the table, they are $\exp(\alpha_\text{80-89})=1.58$, $\exp(\alpha_\text{male})=1.28$, $\exp(\alpha_\text{no Glaucoma})=1.00$, and so on. According to the article, their product works out to $34.5$. Therefore

$$\text{Odds(Charlie)} = 34.5\times \text{Odds(Base)}.$$

(Notice that the base categories all have odds ratios of $1.00=\exp(0)$, because including $1$ in the product leaves it unchanged. That's how you can spot the base categories in the table.)

### Restating the results as probabilities

Finally, let us convert this result to probabilities. We were told the baseline predicted probability is $0.736\%=0.00736$. Therefore, using the formulas relating odds and probabilities derived at the outset, we may compute

$$\text{Odds(Base)} = \frac{0.00736}{1 - 0.00736} = 0.00741.$$

Consequently Charlie's odds are

$$\text{Odds(Charlie)} = 34.5\times 0.00741 = 0.256.$$

Finally, converting this back to probabilities gives

$$\Pr(Y(\text{Charlie})=1) = 1 - \frac{1}{1 + 0.256} = 0.204.$$
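The whole chain of arithmetic, starting from the two numbers quoted from the article, can be checked in a few lines of Python:

```python
# Reproducing the article's arithmetic step by step, from the baseline
# probability and the product of the adjusted odds ratios (both given).
base_prob = 0.00736   # baseline predicted probability
total_or = 34.5       # product of Charlie's adjusted odds ratios

base_odds = base_prob / (1 - base_prob)
charlie_odds = total_or * base_odds
charlie_prob = charlie_odds / (1 + charlie_odds)

print(round(base_odds, 5))     # 0.00741
print(round(charlie_odds, 3))  # 0.256
print(round(charlie_prob, 3))  # 0.204
```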

## Best Answer

I think I figured out the answer myself after doing a bit of reading, so I thought of posting it here. It looks like I got a little confused.

So as per my post

$O = \frac{P(X)}{1-P(X)}$

So I forgot to take into account the fact that $P(X)$ itself is the probability given by the logistic function:

$P_\beta(X) = \frac{e^{\beta^TX}}{1 + e^{\beta^TX} }$

Substituting this into the equation for $O$, we get

$O = \frac{\frac{e^{\beta^TX}}{1 + e^{\beta^TX} }}{1-\frac{e^{\beta^TX}}{1 + e^{\beta^TX} }} = e^{\beta^TX}$
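This simplification is easy to verify numerically. A short Python check, with arbitrary made-up coefficients and features:

```python
import math

# Numeric check that P/(1-P) collapses to exp(beta . x) when P is the
# logistic function.  Coefficients and features are invented values.
def dot(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))

def logistic(z):
    return math.exp(z) / (1 + math.exp(z))

beta = [0.3, -1.2, 0.7]
x = [1.0, 2.0, -0.5]  # the leading 1.0 plays the role of the intercept term

p = logistic(dot(beta, x))
odds = p / (1 - p)
assert abs(odds - math.exp(dot(beta, x))) < 1e-9
print("P/(1-P) equals exp(beta . x)")
```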

So $e^{\beta^TX}$ is nothing but the odds of the input feature vector $X$ belonging to the positive class. With further algebraic manipulation we can obtain a linear form, and the reason for doing so is to be able to interpret the coefficient vector $\beta$ in a precise manner. That manipulation is simply taking the natural log of the latest form of $O$, namely $e^{\beta^TX}$.

i.e.

$\ln(O) = \ln(e^{\beta^TX}) = \beta^TX$

So the expanded form of $\beta^TX$ is:

$\ln(O) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$

So the real use of this, as I have understood it, is to be able to interpret the coefficients easily while keeping the linear form, just like in multiple linear regression. Looking at the latest expanded form of $\ln(O)$, we can say that a unit increase in $x_i$, with all other variables held fixed, increases the log odds by $\beta_i$ (equivalently, it multiplies the odds by $e^{\beta_i}$).
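That interpretation can be checked numerically: a unit increase in a feature multiplies the odds by the exponential of its coefficient. A short Python sketch with made-up coefficients:

```python
import math

# A unit increase in x1, all else held fixed, multiplies the odds by
# exp(beta1).  The coefficients are invented for illustration.
beta0, beta1 = -1.0, 0.4

def odds(x1):
    return math.exp(beta0 + beta1 * x1)

ratio = odds(3.0) / odds(2.0)
assert abs(ratio - math.exp(beta1)) < 1e-12
print("unit increase multiplies odds by exp(beta1)")
```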