R – Calculating Risk Ratio with CI from Odds Ratios

logistic, r, regression, relative-risk

I have performed a multiple logistic regression to assess the association between death and cardiovascular disease, adjusting for age, sex, and risk factors.
The results are reported as odds ratios with confidence intervals.

How do I obtain risk ratios with confidence intervals instead?

If I just take the ratio of risks between the exposed and non-exposed groups, then I no longer adjust for age, sex, and risk factors, and I also don't get a confidence interval.

Please help with code and method.
I use R.

Best Answer

This is fairly easily done with a marginal effect in R.

However, we should warn you that the design of the study is incredibly important. If the design is a case-control study (where cases are purposefully oversampled), then the relative risk calculation is biased precisely because the frequency of cases was biased (by design).

Let's assume you ran a cohort study and hence relative risks are indeed allowable. The first thing to understand is that logistic regression can predict the risk conditional on age and sex. Let age be represented by $x$ and sex by $w$. Your model is

$$ p(x, w) = \dfrac{1}{1 + \exp(-(\beta_0 + \beta_1x + \beta_2w))} $$

Here, the $\beta$ are the log odds ratios. So given someone who is $\delta$ years older than some reference patient, the relative risk is

$$ \dfrac{p(x+\delta, w)}{p(x, w)} $$
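To make this ratio concrete, here is a minimal base-R sketch. The coefficient values are illustrative assumptions (they are not estimates from any real study), and `risk()` and `relative_risk()` are hypothetical helper names.

```r
# Risk predicted by a logistic model with age (x) and sex (w).
# The default coefficients are assumed values for illustration only.
risk <- function(x, w, b0 = -2, b1 = 0.2, b2 = 0.1) {
  plogis(b0 + b1 * x + b2 * w)
}

# Relative risk for a patient delta units older than a reference patient.
relative_risk <- function(x, w, delta = 1) {
  risk(x + delta, w) / risk(x, w)
}

ages  <- c(-2, 0, 2)                 # reference ages on a rescaled axis
rr_w0 <- relative_risk(ages, w = 0)  # sex coded 0
rr_w1 <- relative_risk(ages, w = 1)  # sex coded 1
odds_ratio <- exp(0.2)               # the same for every patient

print(data.frame(age = ages, rr_w0 = rr_w0, rr_w1 = rr_w1))
```

The odds ratio is a single number, while the ratio of risks changes with the reference patient's age and sex.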

Note that the relative risk depends on the denominator, so there is not a single relative risk but a distribution of them: the value depends on the combination of age and sex. However, we can calculate an average marginal effect and report a confidence interval for that. Here is how we would do that in R (you will need to install the {marginaleffects} package).

library(tidyverse)
library(marginaleffects)
set.seed(0)
N <- 250
# Imagine a rescaled age variable
age <- rnorm(N, 0, 1)
sex <- rbinom(N, 1, 0.49)

p <- plogis(-2 + 0.2*age + 0.1*sex)
y <- rbinom(N, 1, p)

# Your model
fit <- glm(y ~ age + sex, family = binomial())


avg_comparisons(
  model=fit,
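  # These two arguments compute the relative risk (ratio of average risks);
  # without them, the risk difference is returned.
  # (Newer marginaleffects releases name them `comparison` and `transform`.)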
  transform_pre = 'lnratioavg',
  transform_post = exp
)
#> 
#>  Term              Contrast Estimate Pr(>|z|) 2.5 % 97.5 %
#>   age mean(+1)                  1.22    0.166 0.920   1.62
#>   sex ln(mean(1) / mean(0))     1.24    0.461 0.701   2.19
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast, estimate, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

Created on 2023-02-24 by the reprex package (v2.0.1)

It is worth breaking this down. This output tells us that when age increases by 1 unit (here 1 standard deviation, since I've used a rescaled age with mean 0 and standard deviation 1), the relative risk is 1.22 (a 22% increase in risk). The 95% CI is also provided. However, that is the AVERAGE. Remember, the relative risk depends on your age and your sex.
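To see what "average" means here, the point estimate can be sketched by hand in base R. This assumes (my reading of the 'lnratioavg' option, not something verified against the package source) that the average relative risk is the ratio of the mean predicted risk with age raised by 1 unit to the mean predicted risk at the observed ages. It reproduces only the point estimate, not the confidence interval.

```r
# Re-create the simulated data and model used in this answer.
set.seed(0)
N <- 250
age <- rnorm(N, 0, 1)
sex <- rbinom(N, 1, 0.49)
y <- rbinom(N, 1, plogis(-2 + 0.2 * age + 0.1 * sex))
dat <- data.frame(y, age, sex)
fit <- glm(y ~ age + sex, family = binomial(), data = dat)

# Predicted risk for every patient as observed, and with age raised by 1 unit.
risk_lo <- predict(fit, newdata = dat, type = "response")
risk_hi <- predict(fit, newdata = transform(dat, age = age + 1), type = "response")

# Ratio of average risks (marginal standardization).
avg_rr <- mean(risk_hi) / mean(risk_lo)

# Patient-level relative risks, which vary with age and sex.
unit_rr <- risk_hi / risk_lo

avg_rr
range(unit_rr)
```

The average is a risk-weighted summary of the patient-level ratios, so it always lies between the smallest and largest of them.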

Here is a histogram of estimated relative risks for each patient in these data.

comparisons(
  fit,
  variables = 'age',
  transform_pre = 'lnratio',
  transform_post = exp
) %>% 
  as.data.frame() %>% 
  ggplot(aes(estimate)) + 
  geom_histogram()

We can also estimate the relative risk conditional on age and sex at the same time, like this:

comparisons(
  fit,
  newdata = datagrid(age = seq(-3, 3, 0.1), sex = 0:1),
  variables = 'age',
  transform_pre = 'lnratio',
  transform_post = exp
) %>% 
  as.data.frame() %>% 
  ggplot(aes(age, estimate, color=factor(sex))) + 
  geom_line()

So given the age of a patient and their sex, you can report the relative risk associated with (in this case) a 1-unit increase in their rescaled age, i.e. 1 standard deviation.

It's worth repeating that the validity of these approaches depends almost entirely on the design of the study. All of this is useless if the study is a case-control study, since the baseline risk of the outcome is biased.

Related Solutions

Solved – Converting Adjusted Odds Ratios to its RR counterpart

You can do this calculation for an adjusted OR (I presume from a logistic regression) to a RR, but the end result may not be useful for your goal of meta-analysis. The essential problem is that the adjusted OR $\exp(\beta_1)$ from a logistic regression is not an "average" over the population. And so there's no way to calculate a population average relative risk from a logistic regression OR. Simply using the population baseline risk to convert $\exp(\beta_1)$ to an RR will be incorrect.

Instead, you can only calculate relative risks for fixed sets of covariates. Say you have: $$g(Y) = \beta_0 + \beta_1 Treatment + \beta_2 Age + \beta_3 Gender$$ Then $\exp(\beta_1)$ represents the multiplicative change in odds given fixed values for $Age$ and $Gender$. You essentially have a different $p_0$ for each set of covariates, so you end up with different relative risks for, say, a (40, Female) vs a (30, Male).
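A short sketch of that dependence, assuming a hypothetical adjusted odds ratio of 2 and made-up baseline risks $p_0$ for different covariate sets. The helper name `or_to_rr()` is hypothetical; the formula follows from writing the exposed risk in terms of the odds ratio and $p_0$.

```r
# Convert an odds ratio to a relative risk for a given baseline risk p0
# (the risk in the reference group at fixed covariate values).
or_to_rr <- function(or, p0) {
  or / (1 - p0 + p0 * or)
}

or <- 2                          # hypothetical adjusted odds ratio, exp(beta_1)
p0 <- c(0.01, 0.10, 0.30, 0.60)  # hypothetical baseline risks, one per covariate set
rr <- or_to_rr(or, p0)

print(data.frame(p0 = p0, rr = rr))
```

The same odds ratio maps to a different relative risk for each baseline risk, which is why a single converted RR does not describe the whole population.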

Thus unless you're concerned with comparing a very specific set of fixed covariates, this likely isn't useful for meta-analysis. Separating the analysis into those that report RR and those that report OR is probably the best bet, as suggested here.

Solved – Calculating risk ratio using odds ratio from logistic regression coefficient

Zhang and Yu (1998) originally presented a method for calculating CIs for risk ratios, suggesting you could convert the lower and upper bounds of the CI for the odds ratio.

This method does not work: it is biased and generally produces anticonservative (too tight) estimates of the risk ratio 95% CI. This is because of the correlation between the intercept term and the slope term, as you correctly allude to. If the odds ratio tends towards its lower value in the CI, the intercept term increases to account for a higher overall prevalence in those with a 0 exposure level, and conversely for a higher value in the CI. These lead, respectively, to the lower and higher bounds for the CI.

To answer your question outright, you need a knowledge of the baseline prevalence of the outcome to obtain correct confidence intervals. Data from case-control studies would rely on other data to inform this.

Alternatively, you can use the delta method if you have the full covariance structure for the parameter estimates. An equivalent parametrization for the OR to RR transformation (having binary exposure and a single predictor) is:

$$RR = \frac{1 + \exp(-\beta_0)}{1+\exp(-\beta_0-\beta_1)}$$
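For readers who want the intermediate step, this expression follows from the ratio of the two fitted risks, writing $p_0$ for the risk in the unexposed and $p_1$ for the risk in the exposed (these two symbols are introduced here for illustration):

$$ p_0 = \frac{1}{1+\exp(-\beta_0)}, \qquad p_1 = \frac{1}{1+\exp(-\beta_0-\beta_1)}, \qquad RR = \frac{p_1}{p_0} = \frac{1+\exp(-\beta_0)}{1+\exp(-\beta_0-\beta_1)} $$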

Using the multivariate delta method together with the central limit theorem, which states that $\sqrt{n} \left( [\hat{\beta}_0, \hat{\beta}_1] - [\beta_0, \beta_1]\right) \rightarrow_D \mathcal{N} \left(0, \mathcal{I}^{-1}(\beta)\right)$, you can obtain the variance of the approximate normal distribution of the $RR$.

Note, notationally this only works for binary exposure and univariate logistic regression. There are some simple R tricks that make use of the delta method and marginal standardization for continuous covariates and other adjustment variables. But for brevity I'll not discuss that here.
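For the binary-exposure, single-predictor case, the delta method can be sketched in base R as follows. This is one possible implementation under stated assumptions: the data are simulated with made-up coefficients, and the delta method is applied to $\log(RR)$ rather than $RR$ itself (a common choice that keeps the interval positive).

```r
# Simulated cohort with a binary exposure; coefficients are made up.
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + 0.7 * x))
fit <- glm(y ~ x, family = binomial())

b <- coef(fit)   # (beta_0, beta_1)
V <- vcov(fit)   # their estimated covariance matrix

p0 <- unname(plogis(b[1]))         # fitted risk, unexposed
p1 <- unname(plogis(b[1] + b[2]))  # fitted risk, exposed
log_rr <- log(p1 / p0)

# Gradient of log(RR) with respect to (beta_0, beta_1).
grad <- c(p0 - p1, 1 - p1)
se_log_rr <- sqrt(drop(t(grad) %*% V %*% grad))

rr <- exp(log_rr)
ci <- exp(log_rr + c(-1, 1) * qnorm(0.975) * se_log_rr)
c(RR = rr, lower = ci[1], upper = ci[2])
```

In this simple setting the result should agree with the usual 2x2-table standard error for a log risk ratio, which is a useful sanity check.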

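However, there are several ways to compute relative risks and their standard errors directly from models in R. Two examples are below:

# Simulated binary exposure x and outcome y (risk 0.2 unexposed, 0.4 exposed)
x <- sample(0:1, 100, replace = TRUE)
y <- rbinom(100, 1, x * 0.2 + 0.2)
# 1. Log-binomial model: exp(coefficient of x) is the relative risk
glm(y ~ x, family = binomial(link = "log"))
# 2. Cox model with the same follow-up time for everyone,
#    whose exp(coef) is used here as a relative-risk estimate
library(survival)
coxph(Surv(time = rep(1, 100), event = y) ~ x)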

http://research.labiomed.org/Biostat/Education/Case%20Studies%202005/Session4/ZhangYu.pdf
