Solved – How to estimate Relative Risks in Multivariate Binary Logistic Regression Models, instead of Odds Ratios

logistic, multivariate-analysis, odds-ratio, relative-risk

All the software programs I have tried report only odds ratios (ORs), i.e. the exponentiated betas, for binary logistic regression predictors.

I would like to know how I can compute the relative risk (RR) from a binary logistic regression model.

My reason is that reporting RRs (besides ORs) would make my variables more understandable. I also see many journals (even many of the top ones) confuse the two, and I hope to produce a report with scientific merit that does not repeat previous mistakes.

I found some methods on the net (which I have not tried yet), for example here and here.

However, in a forum I saw some experts oppose reporting RRs for logistic regressions. For example:

"It is incorrect to use the relative risk as a measure of association in a logistic regression. The measure of association in a logistic regression is the odds ratio. The odds ratio is an approximation of the relative risk. The approximation becomes progressively better as the disease becomes progressively rarer. Regardless of whether the disease is rare or not, inferences drawn from a logistic regression are valid. Please do not report a logistic regression using relative risk. It is not correct to do so." –John Sorkin

Or

"I am curious why one would want risk ratios. Unlike odds ratios, they are not interpretable without reference to the base risk. For example a risk ratio of 2 cannot possibly apply to anyone with a starting risk exceeding 1/2." –Frank Harrell
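To see Harrell's constraint numerically (my own illustration, not from the quote): holding the odds ratio fixed at 2, the implied relative risk shrinks toward 1 as the baseline risk grows, and an RR of 2 becomes impossible once the baseline risk exceeds 1/2:

```r
# For a fixed OR of 2, the implied RR depends on the baseline risk p0:
p0    <- c(0.01, 0.10, 0.25, 0.50, 0.75)  # baseline risks
odds0 <- p0 / (1 - p0)
odds1 <- 2 * odds0                        # apply OR = 2
p1    <- odds1 / (1 + odds1)              # risk implied by the doubled odds
rr    <- p1 / p0                          # implied relative risk
round(cbind(p0, p1, rr), 3)
# rr falls from ~1.98 at p0 = 0.01 to ~1.14 at p0 = 0.75,
# and can never exceed 1/p0.
```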

So my questions are:

  1. Is it appropriate to estimate RRs from a binary logistic regression model? Or do you agree with the quotes above that RRs should not be reported for logistic regressions?
  2. Could you please explain why you agree or disagree?
  3. Do you know of any implemented algorithms (e.g., a macro or an R function) that can do this without having to compute it manually?

Best Answer

Here is, more or less, a replication of the SAS example mentioned in the comments on the question.

library("sas7bdat")  # to read the SAS data file
eyestudy <- read.sas7bdat("eyestudy.sas7bdat")
sapply(1:ncol(eyestudy), function(z) summary(eyestudy[, z]))  # quick look at each column
for (i in 2:3) eyestudy[, i] <- as.factor(eyestudy[, i])      # recode columns 2 and 3 as factors

tabfq <- with(eyestudy, table(carrot, lenses))
(tabfqm <- addmargins(tabfq))
prop.table(tabfq, 1)  # row proportions
# OR = (32/17)/(21/30) = 2.69
# RR = (32/49)/(21/51) = 1.59
(OR <- (tabfqm[1, 2]/tabfqm[1, 1])/(tabfqm[2, 2]/tabfqm[2, 1]))
(RR <- (tabfqm[1, 2]/tabfqm[1, 3])/(tabfqm[2, 2]/tabfqm[2, 3]))

# logit: exponentiated coefficients are odds ratios
(ml1 <- glm(lenses ~ carrot, data = eyestudy, family = binomial(link = "logit")))
summary(ml1)
exp(-coefficients(ml1))  # OR (negated to match the factor coding of this data set)
exp(-cbind(coef(ml1), confint(ml1)))
(ml10 <- glm(lenses ~ 1, data = eyestudy, family = binomial(link = "logit")))  # null model
anova(ml10, ml1, test = "Chisq")  # likelihood-ratio test for carrot
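As an aside (not part of the SAS example): for a model with a single binary predictor, the logit model's OR can be converted back to an RR given the risk in the reference group, using the formula popularized by Zhang and Yu (1998). This conversion is known to be biased when applied to adjusted ORs from multivariable models, so treat it as a sketch for the unadjusted case only:

```r
# RR = OR / ((1 - p0) + p0 * OR), where p0 is the risk in the reference group
or <- 2.69   # OR from the 2x2 table above
p0 <- 21/51  # risk of lenses in the reference group (see the RR computation above)
(rr_approx <- or / ((1 - p0) + p0 * or))  # ~1.59, matching the tabulated RR
```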

# log-binomial: same binomial likelihood with a log link, so exponentiated
# coefficients are RRs (log-binomial models can fail to converge; it works here)
(ml2 <- glm(lenses ~ carrot, data = eyestudy, family = binomial(link = "log")))
summary(ml2)
exp(-coefficients(ml2))  # RR
exp(-cbind(coef(ml2), confint(ml2)))

# Poisson with log link: coefficients again estimate log-RRs, but the naive
# standard errors are conservative for a binary outcome
(ml3 <- glm(lenses ~ carrot, data = eyestudy, family = poisson(link = "log")))
summary(ml3)
exp(-coefficients(ml3))  # RR
exp(-cbind(coef(ml3), confint(ml3)))
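The naive Poisson standard errors are conservative for a binary outcome; the usual remedy is robust (sandwich) standard errors, Zou's "modified Poisson" approach. A minimal stand-alone sketch, assuming the sandwich and lmtest packages are installed (the simulated data are my own, since the eyestudy file is not bundled here):

```r
library(sandwich)  # vcovHC(): heteroskedasticity-consistent covariance
library(lmtest)    # coeftest(): coefficient tests with a custom vcov

# Simulated binary exposure and outcome, in place of the eyestudy data
set.seed(1)
x <- rbinom(500, 1, 0.5)
y <- rbinom(500, 1, plogis(-1 + 0.8 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))

coeftest(fit, vcov = vcovHC(fit, type = "HC0"))  # robust z-tests
exp(coef(fit)["x"])                              # estimated RR for x
```

Applied to the answer's model, the same call is `coeftest(ml3, vcov = vcovHC(ml3, type = "HC0"))`.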

# Poisson, adjusted for covariates
(ml4 <- glm(lenses ~ carrot + gender + latitude, data = eyestudy, family = poisson(link = "log")))
summary(ml4)
exp(-coefficients(ml4))  # adjusted RRs
exp(-cbind(coef(ml4), confint(ml4)))