In epidemiology, exponentiated coefficients are often reported as odds ratios, relative risks/incidence rate ratios, or hazard ratios. When cross-sectional data are analyzed with Poisson/negative binomial models and an exposure time is available, the exponentiated coefficients may be called relative risks/incidence rate ratios. However, when cross-sectional data are analyzed with Poisson/negative binomial models without a defined exposure time, what should we call the exponentiated coefficients?
Solved – What to call exponentiated coefficients from a Poisson/negative binomial regression of cross-sectional data
cross-section, epidemiology, negative-binomial-distribution, poisson-regression
Related Solutions
Ah, the incidence rate ratio, my old friend.
You're correct. If we have a 0/1 variable, an IRR of 0.7 means that those with X = 1 will have 0.7 times as many incident events as those with X = 0. If you want the actual predicted counts, you'll have to go back to the unexponentiated model coefficients. Then your expected cases would be:
counts = exp(B0 + B1*X)
, where B0 is the intercept term, B1 is the coefficient for your variable (in this example ln(0.7), or about -0.3567) and X is the value of X for whatever group you're trying to calculate this for. I find that's occasionally a useful sanity check to make sure I haven't done something horribly wrong in the model itself.
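As a rough sketch of that sanity check, the snippet below plugs coefficients into exp(B0 + B1*X). The intercept is an assumed value chosen only for illustration (a baseline of 10 expected events); B1 is set to ln(0.7) so that the IRR matches the example, and `expected_count` is a hypothetical helper name.

```python
import math

# Hypothetical values for illustration only: the intercept b0 is an assumption,
# and b1 is chosen so that exp(b1) equals the IRR of 0.7 from the example.
b0 = math.log(10.0)  # assumed baseline: 10 expected events when X = 0
b1 = math.log(0.7)   # about -0.3567


def expected_count(x, b0=b0, b1=b1):
    """Expected count from a log-link count model: exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


count_x0 = expected_count(0)
count_x1 = expected_count(1)
irr = count_x1 / count_x0  # the intercept cancels, leaving exp(b1)

print(f"b1 = {b1:.4f}")
print(f"expected count, X = 0: {count_x0:.2f}")
print(f"expected count, X = 1: {count_x1:.2f}")
print(f"ratio of expected counts (IRR): {irr:.2f}")
```

Whatever intercept is used, the ratio of the two expected counts reduces to exp(B1), which is why the IRR can be read off the coefficient alone.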
If you're more familiar with Hazard Ratios from other areas of survival analysis, note that an incidence rate ratio is a hazard ratio, just with a very particular set of assumptions to it - that the hazard is both proportional and constant. It can be interpreted the same way.
An answer to all four of your questions, preceded by a note:
It's not actually all that common for modern epidemiology studies to report an odds ratio from a logistic regression for a cohort study. It remains the regression technique of choice for case-control studies, but more sophisticated techniques are now the de facto standard for analysis in major epidemiology journals like Epidemiology, AJE or IJE. Odds ratios are more likely to show up in clinical journals reporting the results of observational studies. There is also some room for confusion because Poisson regression can be used in two contexts: the one you're referring to, wherein it's a substitute for a binomial regression model, and a time-to-event context, which is extremely common for cohort studies. More details in the particular question answers:
For a cohort study, not really, no. There are some extremely specific cases where, say, a piecewise logistic model may have been used, but these are outliers. The whole point of a cohort study is that you can directly measure the relative risk, or many related measures, and don't have to rely on an odds ratio. I will however make two notes: A Poisson regression is often estimating a rate, not a risk, and thus the effect estimate from it will often be noted as a rate ratio (mainly, in my mind, so you can still abbreviate it RR) or an incidence rate ratio or incidence density ratio (IRR or IDR). So make sure in your search you're actually looking for the right terms: there are many cohort studies using survival analysis methods. For these studies, Poisson regression makes some assumptions that are problematic, notably that the hazard is constant. As such it is much more common to analyze a cohort study using Cox proportional hazards models, rather than Poisson models, and report the ensuing hazard ratio (HR). If pressed to name a "default" method with which to analyze a cohort, I'd say epidemiology is actually dominated by the Cox model. This has its own problems, and some very good epidemiologists would like to change it, but there it is.
There are two things I might attribute the infrequency to - an infrequency I don't necessarily think exists to the extent you suggest. One is that yes - "epidemiology" as a field isn't exactly closed, and you get huge numbers of papers from clinicians, social scientists, etc. as well as epidemiologists of varying statistical backgrounds. The logistic model is commonly taught, and in my experience many researchers will turn to the familiar tool over the better tool.
The second is actually a question of what you mean by "cohort" study. Something like the Cox model, or a Poisson model, needs an actual estimate of person-time. It's possible to get a cohort study that follows a somewhat closed population for a particular period - especially in early "Intro to Epi" examples, where survival methods like Poisson or Cox models aren't so useful. The logistic model can be used to estimate an odds ratio that, with sufficiently low disease prevalence, approximates a relative risk. Other regression techniques that directly estimate it, like binomial regression, have convergence issues that can easily derail a new student. Keep in mind the Zou papers you cite are both using a Poisson regression technique to get around the convergence issues of binomial regression. But binomial-appropriate cohort studies are actually a small slice of the "cohort study pie".
Yes. Frankly, survival analysis methods should come up earlier than they often do. My pet theory is that the reason this isn't so is that methods like logistic regression are easier to code. Techniques that are easier to code, but come with much larger caveats about the validity of their effect estimates, are taught as the "basic" standard, which is a problem.
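The rare-outcome approximation mentioned in the answer to the second question (an odds ratio from a logistic model approximating a relative risk when the outcome is uncommon) can be checked with a little arithmetic. The sketch below uses made-up baseline risks and a fixed true risk ratio of 2; the function name is hypothetical.

```python
def risk_ratio_and_odds_ratio(risk_unexposed, risk_exposed):
    """Return (risk ratio, odds ratio) for two groups with the given risks."""
    rr = risk_exposed / risk_unexposed
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    odds_exposed = risk_exposed / (1 - risk_exposed)
    return rr, odds_exposed / odds_unexposed


# Hypothetical baseline risks; the exposed group always has twice the risk.
for baseline in (0.001, 0.01, 0.10, 0.30):
    rr, odds_ratio = risk_ratio_and_odds_ratio(baseline, 2 * baseline)
    print(f"baseline risk {baseline:>5.3f}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

With a baseline risk of 1% the odds ratio is close to 2, while at 30% it is 3.5, which is why the odds ratio is only a reasonable stand-in for the relative risk when the outcome is rare.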
You should be encouraging students and colleagues to use the appropriate tool. Generally for the field, I think you'd probably be better off suggesting a consideration of the Cox model over a Poisson regression, as most reviewers would (and should) swiftly bring up concerns about the assumption of a constant hazard. But yes, the sooner you can get them away from "How do I shoehorn my question into a logistic regression model?" the better off we'll all be. But yes, if you're looking at a study without time, students should be introduced to both binomial regression, and alternative approaches, like Poisson regression, which can be used in case of convergence problems.
Best Answer
In a cross-sectional study, you are almost always getting prevalence data, so as a first step, you could consider these prevalence ratios.
But it sounds like you are modeling the number of symptoms as your outcome based on some covariates, using Poisson or negative binomial models. So you have something like this model: $\log(E[\text{Symptom Count} \mid \text{Gender}]) = \beta_{0} + \beta_{1} \cdot \text{Gender}$. If we want to compare men (gender = 1, say) and women (gender = 0, say), we might be interested in the difference on the log scale: $\log(E[\text{Symptom Count} \mid \text{Male}]) - \log(E[\text{Symptom Count} \mid \text{Female}]) = (\beta_{0} + \beta_{1} \cdot 1) - (\beta_{0} + \beta_{1} \cdot 0) = \beta_{1}$.
When we exponentiate this comparison, the left-hand side is the ratio of the expected count of symptoms in men to the expected count of symptoms in women, and the right-hand side is $e^{\beta_{1}}$, which is the quantity you want to interpret. This is the ratio of the Average Symptom Count in men to the Average Symptom Count in women (and if there are other covariates, it is the adjusted Symptom Count ratio). Say you had a value like 1.2. You could interpret that as "on average, men in our study reported 20% more symptoms in the past 30 days than women, adjusting for the other covariates in the model".
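A small sketch of this interpretation, using made-up symptom counts (not data from the question): for a Poisson model with a log link and a single binary covariate, the maximum-likelihood fitted means equal the group sample means, so the coefficients can be written down directly without a fitting routine. With additional covariates you would need to fit the model in a statistics package instead.

```python
import math
from statistics import mean

# Made-up symptom counts, purely for illustration.
symptoms_women = [2, 3, 1, 4, 2, 3]  # gender = 0
symptoms_men = [3, 4, 2, 5, 2, 2]    # gender = 1

mean_women = mean(symptoms_women)
mean_men = mean(symptoms_men)

# Closed-form Poisson estimates for a single binary covariate:
# exp(beta0) is the mean among women, exp(beta0 + beta1) the mean among men.
beta0 = math.log(mean_women)
beta1 = math.log(mean_men) - math.log(mean_women)

count_ratio = math.exp(beta1)  # ratio of average symptom counts, men vs. women

print(f"mean symptom count, women: {mean_women:.2f}")
print(f"mean symptom count, men:   {mean_men:.2f}")
print(f"beta1 = {beta1:.4f}, exp(beta1) = {count_ratio:.2f}")
print(f"men report about {(count_ratio - 1) * 100:.0f}% more symptoms on average")
```

Here the exponentiated coefficient is simply the ratio of the two group means, which is why "mean count ratio" (or "symptom count ratio") is a natural name for it when there is no exposure time.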