Solved – Odds ratio vs probability ratio


An odds is the ratio of the probability of an event to its complement:

$$\text{odds}(X) = \frac{P(X)}{1-P(X)}$$
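For example (the number is purely illustrative), an event with probability $P(X) = 0.75$ has odds

$$\text{odds}(X) = \frac{0.75}{1-0.75} = 3,$$

often read as "3 to 1" in favour of the event.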

An odds ratio (OR) is the ratio of the odds of an event in one group (say, $A$) versus the odds of an event in another group (say, $B$):

$$\text{OR}(X)_{A\text{ vs }B} = \frac{\frac{P(X|A)}{1-P(X|A)}}{\frac{P(X|B)}{1-P(X|B)}}$$

A probability ratio¹ (PR, aka prevalence ratio) is the ratio of the probability of an event in one group ($A$) versus the probability of an event in another group ($B$):

$$\text{PR}(X)_{A\text{ vs }B} = \frac{P(X|A)}{P(X|B)}$$

An incidence proportion can be thought of as pretty similar to a probability (although technically it describes the occurrence of new cases over a defined period of time), and we contrast incidence proportions (and incidence densities, for that matter) using relative risks (aka risk ratios, RR), along with other measures like risk differences:

$$\text{RR}_{A\text{ vs }B} = \frac{\text{incidence proportion}(X|A)}{\text{incidence proportion}(X|B)}$$
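As a concrete sketch of how these quantities relate, here is a small Python example based on an invented 2×2 table (the counts are for illustration only, not from any real study):

```python
# Hypothetical 2x2 table: groups A and B, event vs no event.
# Counts are invented purely for illustration.
events_A, no_events_A = 30, 70    # so P(X|A) = 0.30
events_B, no_events_B = 10, 90    # so P(X|B) = 0.10

p_A = events_A / (events_A + no_events_A)
p_B = events_B / (events_B + no_events_B)

odds_A = p_A / (1 - p_A)
odds_B = p_B / (1 - p_B)

odds_ratio = odds_A / odds_B       # OR = (0.30/0.70) / (0.10/0.90) ≈ 3.86
probability_ratio = p_A / p_B      # PR = 0.30 / 0.10 = 3.0
# If these counts were new cases over a fixed follow-up period, the same
# ratio of incidence proportions would be reported as a relative risk (RR).

print(f"OR = {odds_ratio:.2f}, PR/RR = {probability_ratio:.2f}")
```

Note that the OR (≈ 3.86) is further from 1 than the PR (3.0); the two only approximate each other when the event is rare in both groups.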

Why are relative probability contrasts so often represented using relative odds instead of probability ratios, when risk contrasts are represented using relative risks instead of odds ratios (calculated using incidence proportions instead of probabilities)?

My question is foremost about why ORs are preferred to PRs, rather than about why incidence proportions are not used to calculate a quantity like an OR. Edit: I am aware that risks are sometimes contrasted using a risk odds ratio.

¹ As near as I can tell… I do not actually encounter this term in my discipline other than very rarely.

Best Answer

I think the reason that the OR is far more common than the PR comes down to the standard ways in which different types of quantity are typically transformed.

When working with quantities like temperature, height, or weight, the standard assumption is that they are approximately Normal. When you take contrasts between these sorts of quantities, the natural thing to do is take a difference. Equally, if you fit a regression model to them, you don't expect a systematic change in the variance.

When you are working with quantities that are "rate like", that is, bounded below at zero and typically arising from calculating things like "number per day", taking raw differences is awkward. Since the variance of a sample is proportional to the rate, the residuals of any fit to count or rate data won't generally have constant variance. However, if we work with the log of the mean, the variance is "stabilized" and effects combine additively rather than multiplicatively. Thus for rates we typically work on the log scale. When you then form contrasts you are taking differences of logs, and a difference of logs is the same as a ratio.
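A small simulation, a sketch only with arbitrary rates chosen for illustration, shows both points: the spread of Poisson-style counts grows with the rate, and a contrast on the log scale is exactly the log of the ratio of rates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Counts drawn at two arbitrary rates: the variance tracks the mean.
low = rng.poisson(lam=5, size=100_000)
high = rng.poisson(lam=50, size=100_000)
print(low.mean(), low.var())    # both close to 5
print(high.mean(), high.var())  # both close to 50

# A contrast of log-rates is the log of the rate ratio.
rate_a, rate_b = 50.0, 5.0
print(np.log(rate_a) - np.log(rate_b))  # 2.302...
print(np.log(rate_a / rate_b))          # same value
```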

When you are working with probability-like quantities, or fractions of a cake, you are now bounded both above and below. You also have an arbitrary choice of what you code as 1 and what as 0 (or more categories, in multi-class models). Differences between probabilities are invariant to switching 1s and 0s, but they share the problem of rates: the variance changes with the mean. Logging them would not give you invariance under swapping 1s and 0s, so instead we tend to take the logit (log-odds). Working with log-odds you are back on the full real line, the variance is much the same all along the line, and differences of log-odds behave a bit like Normal quantities.
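Here is a sketch of the coding-invariance point, using arbitrarily chosen probabilities: recoding the outcome swaps $p$ for $1-p$, which merely flips the sign of a log-odds contrast, whereas a contrast of log-probabilities changes entirely.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p."""
    return np.log(p / (1 - p))

p_A, p_B = 0.30, 0.10  # arbitrary illustrative probabilities

# Contrast on the log-odds scale.
lor = logit(p_A) - logit(p_B)
# Recode the outcome (success <-> failure): same magnitude, opposite sign.
lor_recoded = logit(1 - p_A) - logit(1 - p_B)
print(lor, lor_recoded)        # ~1.350 and ~-1.350

# Contrast on the log-probability scale has no such symmetry.
lpr = np.log(p_A) - np.log(p_B)
lpr_recoded = np.log(1 - p_A) - np.log(1 - p_B)
print(lpr, lpr_recoded)        # ~1.099 and ~-0.251
```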

Gaussian

  • Variance does not depend on $\mu$
  • Canonical link for GLM is $x$
  • Transformation not helpful

Poisson

  • Variance is proportional to the rate $\lambda$
  • Canonical link for GLM is $\ln(x)$
  • Logging should result in residuals of constant variance

Binomial

  • Variance is proportional to $p(1-p)$
  • Canonical link for GLM is logit $\ln\left(\frac{p}{1-p}\right)$
  • Taking the logit (log-odds) of the data should result in residuals of constant variance (see the sketch after this list)
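A minimal sketch of how these canonical links surface in practice, using simulated data and the statsmodels package (my choice of library, not something implied above): exponentiating a logistic-regression coefficient gives an odds ratio, and exponentiating a Poisson-regression coefficient gives a rate ratio.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
group = rng.integers(0, 2, size=n)          # 0 = group B, 1 = group A
X = sm.add_constant(group.astype(float))

# Binary outcome: true probabilities 0.10 (B) and 0.30 (A).
p = np.where(group == 1, 0.30, 0.10)
y_bin = rng.binomial(1, p)
logit_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()
print(np.exp(logit_fit.params[1]))   # ~3.9: the odds ratio A vs B (true OR ≈ 3.86)

# Count outcome: true rates 5 (B) and 50 (A) events per unit time.
lam = np.where(group == 1, 50.0, 5.0)
y_cnt = rng.poisson(lam)
pois_fit = sm.GLM(y_cnt, X, family=sm.families.Poisson()).fit()
print(np.exp(pois_fit.params[1]))    # ~10: the rate ratio A vs B
```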

So I think the reason you see lots of RRs but very few PRs is that the PR is constructed from probability/Binomial-type quantities, while the RR is constructed from rate-type quantities. In particular, note that incidence (as an incidence density) can exceed 100% if people can catch the disease multiple times per year, but a probability can never exceed 100%.

Is odds the only way?

No. The general messages above are just useful rules of thumb, and these "canonical" forms are simply convenient mathematically, which is why you tend to see them most often. The probit function is used instead in probit regression, so in principle differences of probits would be just as valid as ORs. Similarly, despite best efforts to word it carefully, the text above still rather suggests that logging or logit-transforming your raw data and then fitting a model to it is a good idea; it's not a terrible idea, but there are better things you can do (a GLM, etc.).
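For instance, here is a sketch of the probit alternative, using scipy's standard-normal quantile function as the probit and the same illustrative probabilities as before; a contrast of probits is a perfectly usable effect measure, it just is not the canonical one for the Binomial family.

```python
import numpy as np
from scipy.stats import norm

p_A, p_B = 0.30, 0.10  # same illustrative probabilities as before

# Probit contrast: difference on the standard-normal quantile scale.
probit_contrast = norm.ppf(p_A) - norm.ppf(p_B)

# Logit contrast (the log odds ratio), for comparison.
logit_contrast = np.log(p_A / (1 - p_A)) - np.log(p_B / (1 - p_B))

print(probit_contrast)  # ~0.757
print(logit_contrast)   # ~1.350
```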