Solved – Poisson regression to estimate relative risk for binary outcomes

Tags: epidemiology, logistic, odds-ratio, poisson distribution, relative-risk

Brief Summary

Why is it more common for logistic regression (with odds ratios) to be used in cohort studies with binary outcomes, as opposed to Poisson regression (with relative risks)?

Background

Undergraduate and graduate statistics and epidemiology courses, in my experience, generally teach that logistic regression should be used for modelling data with binary outcomes, with risk estimates reported as odds ratios.

However, Poisson regression (and related models: quasi-Poisson, negative binomial, etc.) can also be used to model data with binary outcomes and, with appropriate methods (e.g. a robust sandwich variance estimator), it provides valid risk estimates and confidence intervals.

From Poisson regression, relative risks can be reported, which some have argued are easier to interpret compared with odds ratios, especially for frequent outcomes, and especially by individuals without a strong background in statistics. See Zhang J. and Yu K.F., What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes, JAMA. 1998 Nov 18;280(19):1690-1.
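To illustrate why the distinction matters for frequent outcomes, here is a minimal sketch comparing the odds ratio and the relative risk in two 2x2 tables. All counts are made up for illustration, and the function names are hypothetical. The last function applies the Zhang and Yu conversion, RR = OR / ((1 - P0) + P0 * OR), where P0 is the risk in the unexposed group. On a crude (unadjusted) table it recovers the relative risk exactly; the sketch makes no claim about how it behaves with adjusted odds ratios.

```python
def risk_ratio(a, n1, c, n0):
    """Relative risk: risk in the exposed (a/n1) over risk in the unexposed (c/n0)."""
    return (a / n1) / (c / n0)


def odds_ratio(a, n1, c, n0):
    """Odds ratio from the same counts: (a * d) / (b * c)."""
    b, d = n1 - a, n0 - c  # non-events in the exposed and unexposed groups
    return (a * d) / (b * c)


def zhang_yu_rr(or_value, p0):
    """Zhang & Yu (1998) conversion of an odds ratio to a relative risk."""
    return or_value / ((1 - p0) + p0 * or_value)


# Hypothetical cohorts: a events among n1 exposed, c events among n0 unexposed.
tables = {
    "rare outcome": dict(a=20, n1=1000, c=10, n0=1000),
    "common outcome": dict(a=600, n1=1000, c=300, n0=1000),
}

results = {}
for label, t in tables.items():
    rr = risk_ratio(**t)
    or_value = odds_ratio(**t)
    converted = zhang_yu_rr(or_value, t["c"] / t["n0"])
    results[label] = (rr, or_value, converted)
    print(f"{label}: RR={rr:.3f}  OR={or_value:.3f}  OR converted to RR={converted:.3f}")
```

In both tables the relative risk is 2, but the odds ratio is close to 2 only when the outcome is rare; with the common outcome it is 3.5.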

From reading the medical literature, among cohort studies with binary outcomes it seems that it is still far more common to report odds ratios from logistic regressions rather than relative risks from Poisson regressions.
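To make the role of the robust sandwich variance estimator concrete, here is a small sketch for the simplest possible case: one binary exposure and no other covariates. In that case the log-link Poisson fit is saturated, so the estimated relative risk is just the ratio of observed risks and both variances have closed forms. The counts are hypothetical, the function names are my own, and the robust formula is the plain sandwich estimator without any small-sample correction. A real analysis with covariates would fit the model with statistical software rather than use these shortcuts.

```python
import math


def log_rr_with_se(a, n1, c, n0):
    """Log relative risk with model-based and robust standard errors.

    a events among n1 exposed, c events among n0 unexposed.
    """
    log_rr = math.log((a / n1) / (c / n0))
    # Model-based Poisson variance assumes variance == mean for each subject.
    se_model = math.sqrt(1 / a + 1 / c)
    # Sandwich variance uses squared residuals; for 0/1 outcomes these sum to
    # n * p * (1 - p) in each group, giving the usual binomial-based formula.
    se_robust = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return log_rr, se_model, se_robust


def confidence_interval(log_rr, se, z=1.96):
    """Wald interval on the log scale, transformed back to the RR scale."""
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical cohort: 60/100 events in the exposed, 30/100 in the unexposed.
log_rr, se_model, se_robust = log_rr_with_se(a=60, n1=100, c=30, n0=100)
ci_model = confidence_interval(log_rr, se_model)
ci_robust = confidence_interval(log_rr, se_robust)

print(f"RR = {math.exp(log_rr):.2f}")
print(f"model-based 95% CI: {ci_model[0]:.2f} to {ci_model[1]:.2f}")
print(f"robust 95% CI:      {ci_robust[0]:.2f} to {ci_robust[1]:.2f}")
```

The point estimate is the same either way; only the interval changes. The model-based Poisson interval is too wide for binary data because a 0/1 outcome has variance p(1 - p), which is smaller than the Poisson variance p.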

Questions

For cohort studies with binary outcomes:

  1. Is there good reason to report odds ratios from logistic regressions rather than relative risks from Poisson regressions?
  2. If not, can the infrequency of Poisson regressions with relative risks in the medical literature be attributed mostly to a lag between methodological theory and practice among scientists, clinicians, statisticians, and epidemiologists?
  3. Should intermediate statistics and epidemiology courses include more discussion of Poisson regression for binary outcomes?
  4. Should I be encouraging students and colleagues to consider Poisson regression over logistic regression when appropriate?

Best Answer

An answer to all four of your questions, preceded by a note:

It's not actually all that common for modern epidemiology studies to report an odds ratio from a logistic regression for a cohort study. Logistic regression remains the technique of choice for case-control studies, but more sophisticated techniques are now the de facto standard for analysis in major epidemiology journals like Epidemiology, AJE or IJE. Odds ratios will tend to show up more often in clinical journals reporting the results of observational studies. There are also going to be some complications because Poisson regression can be used in two contexts: what you're referring to, wherein it's a substitute for a binomial regression model, and a time-to-event context, which is extremely common for cohort studies. More details in the answers to the particular questions:

  1. For a cohort study, not really, no. There are some extremely specific cases where, say, a piecewise logistic model may have been used, but these are outliers. The whole point of a cohort study is that you can directly measure the relative risk, or many related measures, and don't have to rely on an odds ratio. I will, however, make two notes. A Poisson regression is often estimating a rate, not a risk, and thus the effect estimate from it will often be reported as a rate ratio (mainly, in my mind, so you can still abbreviate it RR) or an incidence rate ratio or incidence density ratio (IRR or IDR). So make sure in your search you're actually looking for the right terms: there are many cohort studies using survival analysis methods. For these studies, Poisson regression makes some assumptions that are problematic, notably that the hazard is constant. As such, it is much more common to analyze a cohort study using Cox proportional hazards models, rather than Poisson models, and report the resulting hazard ratio (HR). If pressed to name a "default" method with which to analyze a cohort, I'd say epidemiology is actually dominated by the Cox model. This has its own problems, and some very good epidemiologists would like to change it, but there it is.

  2. There are two things I might attribute the infrequency to - an infrequency I don't necessarily think exists to the extent you suggest. One is that yes - "epidemiology" as a field isn't exactly closed, and you get huge numbers of papers from clinicians, social scientists, etc. as well as epidemiologists of varying statistical backgrounds. The logistic model is commonly taught, and in my experience many researchers will turn to the familiar tool over the better tool.

    The second is actually a question of what you mean by "cohort" study. Something like the Cox model, or a Poisson model, needs an actual estimate of person-time. It's possible to get a cohort study that follows a somewhat closed population for a particular period - especially in early "Intro to Epi" examples, where survival methods like Poisson or Cox models aren't so useful. The logistic model can be used to estimate an odds ratio that, with sufficiently low disease prevalence, approximates a relative risk. Other regression techniques that directly estimate it, like binomial regression, have convergence issues that can easily derail a new student. Keep in mind the Zou papers you cite are both using a Poisson regression technique to get around the convergence issues of binomial regression. But binomial-appropriate cohort studies are actually a small slice of the "cohort study pie".

  3. Yes. Frankly, survival analysis methods should come up earlier than they often do. My pet theory is that the reason this isn't so is that methods like logistic regression are easier to code. Techniques that are easier to code, but come with much larger caveats about the validity of their effect estimates, are taught as the "basic" standard, which is a problem.

  4. You should be encouraging students and colleagues to use the appropriate tool. Generally for the field, I think you'd probably be better off suggesting a consideration of the Cox model over a Poisson regression, as most reviewers would (and should) swiftly bring up concerns about the assumption of a constant hazard. But yes, the sooner you can get them away from "How do I shoehorn my question into a logistic regression model?" the better off we'll all be. And if you're looking at a study without a time component, students should be introduced to both binomial regression and alternative approaches, like Poisson regression, that can be used in case of convergence problems.
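The rate-versus-risk distinction from the first answer can be sketched numerically. The example below uses made-up event counts and person-time, with hypothetical function names, and assumes a constant hazard in each group (the same assumption the Poisson model makes). Under that assumption the cumulative risk by time t is 1 - exp(-rate * t), so the rate ratio stays fixed while the risk ratio depends on the length of follow-up.

```python
import math


def incidence_rate(events, person_time):
    """Events per unit of person-time (here, per person-year)."""
    return events / person_time


def risk_from_constant_hazard(rate, t):
    """Cumulative risk by time t when the hazard is constant at `rate`."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical cohort: 30 events in 1000 person-years among the exposed,
# 10 events in 1000 person-years among the unexposed.
rate_exposed = incidence_rate(30, 1000.0)
rate_unexposed = incidence_rate(10, 1000.0)
rate_ratio = rate_exposed / rate_unexposed

risk_ratios = {}
for years in (1, 10, 50):
    risk_exposed = risk_from_constant_hazard(rate_exposed, years)
    risk_unexposed = risk_from_constant_hazard(rate_unexposed, years)
    risk_ratios[years] = risk_exposed / risk_unexposed
    print(f"{years:>2} years: rate ratio={rate_ratio:.2f}  "
          f"risk ratio={risk_ratios[years]:.2f}")
```

With short follow-up the two ratios nearly coincide; with long follow-up the risk ratio drifts toward 1 even though the rate ratio does not change. That is one reason to check whether a reported "RR" is a rate ratio or a risk ratio.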