Incidence Rate Ratio (IRR) in R from linear regression using log-transformed data

Tags: incidence-rate-ratio, least squares, poisson distribution

I was wondering whether it would make sense to calculate an IRR from OLS (not Poisson) regression when the OLS is fitted to log-transformed data.
I have a set of crude death rate data. I'm still debating whether they are count data (they are, after all, based on counts) or continuous data (since they are not integers). I've modeled them using Poisson regression, but I'm curious what would happen if I logged the crude rate and then performed a robust linear regression, and I'd like to compare the two approaches via the IRR.

Any suggestions are welcome, for example if I really shouldn't log the crude rate to begin with.
Thanks!

Best Answer

Well, if your numerator is directly interpreted as counts, then both the Poisson regression and the linear regression on the log-transformed outcome will be consistent for the same slope parameters, provided the error distribution on the log scale does not depend on the covariates (the intercepts will differ). The only discrepancy in this case is how the observations are weighted (see the next paragraph). If your outcome is a rate and you have measured (variable) denominators (such as 1-3 $\mu$g of biopsied tumor, or 1-20 cc of blood), you need to account for the differing denominators, which give the observations different weights, in both models. In both linear regression and Poisson regression, this comes about in the form of an offset (the log of the denominator). I'm curious whether this should be a consideration in your problem.
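A minimal sketch of the comparison, written in standard-library Python so it runs anywhere (in R the natural tools would be `glm(y ~ x + offset(log(t)), family = poisson)` and `lm(log(y / t) ~ x)`). The data are simulated under an assumed multiplicative, homoscedastic log-normal error; the exposure `t`, the true slope 0.3, and the helper names `wls_line`, `fit_poisson_offset` and `fit_log_ols` are hypothetical illustrations, not anything from the question. Under that assumption, exp(slope) from OLS on the log rate and from the Poisson fit with offset log(t) should estimate the same rate ratio, while the intercepts differ by roughly $\sigma^2/2$.

```python
import math
import random


def wls_line(x, z, w):
    """Weighted least squares of z on [1, x]; returns (intercept, slope)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * zi for wi, zi in zip(w, z))
    t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def fit_poisson_offset(x, y, t, iters=50, tol=1e-10):
    """Poisson GLM with log link and offset log(t), fitted by IRLS.

    Only the estimating equations are used, so y need not be integer
    (these are also the quasi-Poisson point estimates).
    """
    off = [math.log(ti) for ti in t]
    b0, b1 = math.log(sum(y) / sum(t)), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi + oi for xi, oi in zip(x, off)]
        mu = [math.exp(e) for e in eta]
        # Working response with the offset removed; working weights are mu.
        z = [e - oi + (yi - mi) / mi for e, oi, yi, mi in zip(eta, off, y, mu)]
        nb0, nb1 = wls_line(x, z, mu)
        done = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if done:
            break
    return b0, b1


def fit_log_ols(x, y, t):
    """Ordinary least squares of log(y / t) on x."""
    z = [math.log(yi / ti) for yi, ti in zip(y, t)]
    return wls_line(x, z, [1.0] * len(x))


random.seed(1)
n, a, b, sigma = 5000, -2.0, 0.3, 0.5
x = [random.uniform(0, 2) for _ in range(n)]
t = [random.uniform(1, 20) for _ in range(n)]  # hypothetical exposures
# Assumed data-generating model: log(y / t) = a + b*x + e, e ~ N(0, sigma^2).
y = [ti * math.exp(a + b * xi + random.gauss(0, sigma)) for xi, ti in zip(x, t)]

p0, p1 = fit_poisson_offset(x, y, t)
o0, o1 = fit_log_ols(x, y, t)
print(f"IRR (Poisson, offset log t): {math.exp(p1):.3f}")
print(f"exp(slope) (OLS on log rate): {math.exp(o1):.3f}")
print(f"intercept gap: {p0 - o0:.3f} (theory: sigma^2/2 = {sigma ** 2 / 2})")
```

If the log-scale error variance changed with x, the two slopes would generally no longer target the same quantity, which is why the assumption is worth checking before comparing the two IRRs.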

In OLS, the variance is assumed not to depend on the mean (under classical assumptions), so the fitted model minimizes the sum of squared residuals, which will be driven largely by the large counts. In the Poisson GLM, where the variance equals the mean, large counts are substantially downweighted by inverse-variance reweighting. Inspecting the distribution of the data with one or more scatterplots (depending on the number of adjustment variables) and fitted curves is a very important step here. You will certainly need to check for high-leverage / high-influence observations to validate the alternative modeling approaches you've proposed.
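To make the weighting contrast concrete, the estimating equations can be written side by side for a log-linear mean $\mu_i = \exp(x_i^\top\beta)$. This is standard GLM algebra rather than anything specific to the asker's data; "raw-scale least squares" here means nonlinear least squares on $y_i$ with the same log-linear mean.

$$
\begin{aligned}
\text{OLS on } \log y_i:&\quad \sum_i x_i\,(\log y_i - x_i^\top\beta) = 0 \\
\text{Raw-scale least squares:}&\quad \sum_i x_i\,\mu_i\,(y_i - \mu_i) = 0 \\
\text{Poisson GLM:}&\quad \sum_i x_i\,\frac{\mu_i}{V(\mu_i)}\,(y_i - \mu_i) = \sum_i x_i\,(y_i - \mu_i) = 0, \qquad V(\mu_i) = \mu_i
\end{aligned}
$$

Relative to raw-scale least squares, the Poisson score divides each observation's contribution by $V(\mu_i) = \mu_i$, which is exactly the inverse-variance downweighting of large counts.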

Using robust (sandwich) standard errors (one particular form of robust regression) does not require the variance to be independent of the mean, but the fit still uses such a constant-variance working probability model. So while the robust standard errors will be consistent, your point estimates will be less efficient (more variable), and your inference will have lower power than when you can assume a better working probability model for the data.
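A minimal sketch of the contrast between classical and robust (HC0 sandwich) standard errors for an OLS slope on a log-scale outcome, again in standard-library Python (in R, `sandwich::vcovHC(fit, type = "HC0")` is a common route). The simulation, with an error SD that grows with |x - 1|, is an assumption chosen so that the constant-variance working model is wrong; the helper name `ols_with_ses` is hypothetical.

```python
import math
import random


def ols_with_ses(x, y):
    """OLS of y on [1, x]; returns slope, classical SE and HC0 sandwich SE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes one common error variance for every observation.
    s2 = sum(r * r for r in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 sandwich SE for the slope: each squared residual keeps its own
    # (x_i - xbar)^2 weight instead of being pooled into s2.
    meat = sum((xi - xbar) ** 2 * r * r for xi, r in zip(x, resid))
    se_robust = math.sqrt(meat) / sxx
    return slope, se_classical, se_robust


random.seed(2)
n = 2000
x = [random.uniform(0, 2) for _ in range(n)]
# Hypothetical log-rate outcome whose error SD grows with distance from x = 1.
log_rate = [-2.0 + 0.3 * xi + random.gauss(0, 0.1 + abs(xi - 1)) for xi in x]

slope, se_c, se_r = ols_with_ses(x, log_rate)
print(f"slope {slope:.3f}, classical SE {se_c:.4f}, robust SE {se_r:.4f}")
```

Both standard errors belong to the same OLS point estimate: the sandwich corrects the uncertainty statement but does nothing to make the estimate itself more efficient, which is the trade-off described above.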

Although R warns you about non-integer counts in Poisson GLMs, there are plenty of sensible regression models, especially in, say, ecology, where non-integer Poisson-type outcomes arise, such as plankton concentration per cubic meter of water sampled from various watersheds, or mRNA concentration in biopsied tumor tissue assessed by flow cytometry.
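As a sketch of why non-integer outcomes are not a problem for the point estimates: the Poisson estimating equations $\sum_i x_i (y_i - \mu_i) = 0$ never use integrality, so a quasi-Poisson fit (in R, `family = quasipoisson` gives the same estimates without the non-integer warning) works on concentrations directly. The standard-library Python below uses made-up concentration values (hypothetical, not real plankton data) and a hypothetical helper `fit_poisson`. It also shows that changing the measurement unit (per liter vs per cubic meter) multiplies y by a constant, which shifts only the intercept by log(1000) and leaves the rate ratio exp(slope) unchanged.

```python
import math


def fit_poisson(x, y, iters=100, tol=1e-12):
    """Poisson/quasi-Poisson GLM with log link on [1, x], fitted by IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        # Weighted least squares of the working response z on [1, x],
        # with working weights mu.
        s0 = sum(mu)
        s1 = sum(m * xi for m, xi in zip(mu, x))
        s2 = sum(m * xi * xi for m, xi in zip(mu, x))
        t0 = sum(m * zi for m, zi in zip(mu, z))
        t1 = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = s0 * s2 - s1 * s1
        nb0 = (s2 * t0 - s1 * t1) / det
        nb1 = (s0 * t1 - s1 * t0) / det
        done = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if done:
            break
    return b0, b1


# Hypothetical non-integer concentrations (per liter) at five sites.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_per_liter = [0.8, 1.3, 1.9, 3.1, 4.6]
y_per_m3 = [1000 * yi for yi in y_per_liter]  # same data, different unit

b0_l, b1_l = fit_poisson(x, y_per_liter)
b0_m, b1_m = fit_poisson(x, y_per_m3)
print(f"rate ratio per unit x: {math.exp(b1_l):.4f} (per liter), "
      f"{math.exp(b1_m):.4f} (per m^3)")
print(f"intercept shift: {b0_m - b0_l:.4f}, log(1000) = {math.log(1000):.4f}")
```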