Solved – Calculating risk ratio using odds ratio from logistic regression coefficient

generalized-linear-model, logistic, odds-ratio, r, relative-risk

I have a binary logistic regression with just one binary fixed-factor predictor. The reason I don't do it as a chi-squared or Fisher's exact test is that I also have a number of random factors (there are multiple data points per individual, and individuals are nested in groups, although I don't care about coefficients or significance for those random effects). I fit this with glmer in R (lme4 package).

I would like to be able to express the coefficient and associated confidence interval for the predictor as a risk ratio rather than an odds ratio. This is because (maybe not for you but for my audience) risk ratio is much easier to understand. Risk ratio here is the relative increase in chance of the outcome being 1 rather than 0 if the predictor is 1 rather than 0.

The odds ratio is trivial to get from the coefficient and associated CI using exp(). To convert an odds ratio to a risk ratio, you can use "RR = OR / (1 − p + (p × OR)), where p is the risk in the control group" (source: http://www.r-bloggers.com/how-to-convert-odds-ratios-to-relative-risks/). But you need the risk in the control group, which in my case means the chance that the outcome is 1 when the predictor is 0. I believe the intercept coefficient from the model gives the odds for this chance, so I can use prob = odds / (odds + 1) to get it. So far so good as far as the central estimate of the risk ratio goes.

What worries me is the associated confidence interval, because the intercept coefficient also has its own CI. Should I use the central estimate of the intercept, or, to be conservative, whichever limits of the intercept CI make my relative-risk CI widest? Or am I barking up the wrong tree entirely?
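A minimal sketch of the conversion described above, using hypothetical coefficient values (b0 and b1 are made-up numbers standing in for the model's intercept and slope on the logit scale):

```r
# Hypothetical fitted coefficients on the logit scale
b0 <- -1.2   # intercept: log-odds of outcome when predictor = 0
b1 <-  0.8   # slope: log odds ratio for predictor 1 vs 0

or <- exp(b1)                    # odds ratio
p0 <- exp(b0) / (1 + exp(b0))    # baseline risk, via prob = odds / (odds + 1)
rr <- or / (1 - p0 + p0 * or)    # Zhang & Yu conversion: RR = OR / (1 - p + p*OR)
```

Note that rr always lies between 1 and or here (for OR > 1), since the conversion shrinks the odds ratio toward 1 by an amount that grows with the baseline risk.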

Best Answer

Zhang and Yu (1998) originally presented a method for converting odds ratios to risk ratios, and suggested that a CI for the risk ratio could be obtained by applying the same conversion to the lower and upper bounds of the odds ratio's CI.

This method does not work: it is biased and generally produces anticonservative (too narrow) 95% CIs for the risk ratio. The reason is the correlation between the intercept and slope estimates, as you correctly allude to. When the odds ratio tends toward the lower end of its CI, the intercept increases to account for a higher overall prevalence among those with exposure level 0, and conversely at the upper end. These shifts pull the lower and upper bounds of the converted CI inward.

To answer your question outright: you need knowledge of the baseline prevalence of the outcome to obtain correct confidence intervals. Data from case-control studies would have to rely on external information to supply this.

Alternatively, you can use the delta method if you have the full covariance matrix of the parameter estimates. An equivalent parametrization of the OR-to-RR transformation (for a binary exposure and a single predictor) is:

$$RR = \frac{1 + \exp(-\beta_0)}{1+\exp(-\beta_0-\beta_1)}$$

And using multivariate delta method, and the central limit theorem which states that $\sqrt{n} \left( [\hat{\beta}_0, \hat{\beta}_1] - [\beta_0, \beta_1]\right) \rightarrow_D \mathcal{N} \left(0, \mathcal{I}^{-1}(\beta)\right)$, you can obtain the variance of the approximate normal distribution of the $RR$.
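A minimal sketch of this delta-method calculation in R, using simulated data and a plain glm for illustration (the same idea applies to a glmer fit, pulling the estimates from fixef() and vcov()); the gradient of log(RR) is taken numerically rather than analytically for brevity:

```r
# Simulated binary exposure and outcome (illustration only)
set.seed(1)
x <- rbinom(200, 1, 0.5)
y <- rbinom(200, 1, 0.2 + 0.2 * x)
fit <- glm(y ~ x, family = binomial)

b <- coef(fit)   # (beta0, beta1)
V <- vcov(fit)   # full covariance matrix of the estimates

# RR as a function of the coefficients (the parametrization above)
rr_fun <- function(b) (1 + exp(-b[1])) / (1 + exp(-b[1] - b[2]))
rr <- rr_fun(b)

# Numerical gradient of log(RR) with respect to (beta0, beta1)
eps <- 1e-6
grad <- sapply(1:2, function(i) {
  bp <- b
  bp[i] <- bp[i] + eps
  (log(rr_fun(bp)) - log(rr_fun(b))) / eps
})

# Delta-method standard error on the log scale, then a Wald 95% CI
se_log_rr <- sqrt(drop(t(grad) %*% V %*% grad))
ci <- exp(log(rr) + c(-1.96, 1.96) * se_log_rr)
```

Working on the log(RR) scale before exponentiating keeps the interval's lower bound positive and is the usual choice for ratio estimates.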

Note that this parametrization only works for a binary exposure in a univariate logistic regression. There are some simple R tricks that use the delta method and marginal standardization to handle continuous covariates and other adjustment variables, but for brevity I won't discuss them here.

However, there are several ways to compute relative risks and their standard errors directly from models in R. Two examples:

set.seed(101)
x <- sample(0:1, 100, replace = TRUE)
y <- rbinom(100, 1, x * 0.2 + 0.2)

# Log-binomial model: coefficients are log relative risks
# (can fail to converge when fitted probabilities approach 1)
glm(y ~ x, family = binomial(link = "log"))

# Cox model with constant follow-up time: exp(coef) estimates the RR
library(survival)
coxph(Surv(time = rep(1, 100), event = y) ~ x)

Reference: Zhang and Yu (1998), http://research.labiomed.org/Biostat/Education/Case%20Studies%202005/Session4/ZhangYu.pdf
