Solved – Interpretation of intercept term in poisson model with offset and covariates

offset, poisson distribution, r

I'm using an offset for the first time (on a colleague's recommendation) and have a couple of questions about interpreting my results. Our ultimate goal is to look at the effect of a population-level treatment on disease incidence (cases/population). We've decided to use Poisson models, though there are surely other ways to look at our data. My data look like this:

cases <- c(6216128, 3341110,  855105,  359371,  417393,  640434,  528914,  377166,  401556,  252832,  128458)
population <- c(54703334, 54252430, 55976643, 56630708, 57373529, 58025577, 58617708, 58921850, 59695818, 60466585, 60223458)
treat.count <- c(13389482, 17746954, 27974966, 27329972, 16534356, 10591797, 12740820, 11787687,  6780603,  5503181,  4446687) 
treat.percent <- c(0.24476537, 0.32711814, 0.49976141, 0.48259986, 0.28818789, 0.18253669, 0.21735446, 0.20005629, 0.11358590, 0.09101194, 0.07383646)
data <- cbind(cases, population, treat.count, treat.percent)
mydata <- as.data.frame(data)

I have two overarching questions:

  1. the interpretation of offset in these poisson models and
  2. the interpretation of the poisson model with offset and covariates added.

1) with the inclusion of offset and no covariates:

f1 <- glm(cases ~ offset(population), data=mydata, family=poisson)

is that the expected value of cases, divided by pop, is exp(intercept)…correct?

2) with the inclusion of offset and covariates:

f2 <- glm(cases ~ offset(population)+log(treat.percent), data=mydata, family=poisson)

is that the expected value of cases, divided by pop, is exp(intercept)…as the treat.percent increases?

There were similar questions posted before, but not quite this situation.

Best Answer

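I think that you want offset(log(population)) rather than offset(population) in your models above. The Poisson model uses a log link, so the offset has to be on the log scale for the model to describe cases per unit of population.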

The offset is just a term included in the linear predictor without estimating a coefficient for it (the coefficient is fixed at 1). Since the standard link function in Poisson regression is the log, you can think of including an offset of log(population) as a rough equivalent (though mathematically better) of using log(cases/population) as the response variable, so it adjusts for differences in population size. This means that the intercept is the log of the expected count when log(population) is 0 and every other term is 0, or in other words, when you have a population of 1; exp(intercept) is therefore the expected number of cases per person. In the second model that baseline is at log(treat.percent) = 0, i.e. treat.percent = 1, and the slope is the change in the log of the per-person rate for a one-unit increase in log(treat.percent). You could also use an offset like offset(log(population/1000)), and then the intercept would refer to a population of size 1,000 (change the 1,000 to whatever value is meaningful for you), which makes it easier to visualize. The slope does not change under that rescaling.
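As a minimal sketch of the points above, the models can be refit with the offset on the log scale, using the data from the question. The names f1 and f2 follow the question; rate_hat, pooled_rate and f2_per1000 are illustrative names that are not in the original post.

```r
# Data from the question
cases <- c(6216128, 3341110, 855105, 359371, 417393, 640434,
           528914, 377166, 401556, 252832, 128458)
population <- c(54703334, 54252430, 55976643, 56630708, 57373529, 58025577,
                58617708, 58921850, 59695818, 60466585, 60223458)
treat.percent <- c(0.24476537, 0.32711814, 0.49976141, 0.48259986, 0.28818789,
                   0.18253669, 0.21735446, 0.20005629, 0.11358590, 0.09101194,
                   0.07383646)
mydata <- data.frame(cases, population, treat.percent)

# Intercept-only model: the offset enters on the log scale
f1 <- glm(cases ~ offset(log(population)), data = mydata, family = poisson)

# exp(intercept) is the estimated number of cases per person ...
rate_hat <- unname(exp(coef(f1)[1]))
# ... which, with no covariates, is the pooled rate: total cases / total population
pooled_rate <- sum(mydata$cases) / sum(mydata$population)
all.equal(rate_hat, pooled_rate, tolerance = 1e-6)

# Model with the covariate from the question
f2 <- glm(cases ~ offset(log(population)) + log(treat.percent),
          data = mydata, family = poisson)
exp(coef(f2)[1])  # cases per person when log(treat.percent) = 0 (treat.percent = 1)
coef(f2)[2]       # change in log rate per unit increase in log(treat.percent)

# Rescaled offset: the intercept now refers to a population of 1,000
f2_per1000 <- glm(cases ~ offset(log(population / 1000)) + log(treat.percent),
                  data = mydata, family = poisson)
coef(f2_per1000) - coef(f2)  # intercept shifts by log(1000); slope is unchanged
```

Rescaling the offset only moves the intercept, so the choice of 1, 1,000 or 100,000 is purely about which unit is easiest to read.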

For most models beyond the simplest, it is often easier to interpret predictions from the model than individual coefficients. The Predict.Plot and TkPredict functions in the TeachingDemos package may help.
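A rough illustration of the prediction-based approach using only base R's predict(), as an alternative to the TeachingDemos functions. The grid of treat.percent values and the names newdata and cases_per_1000 are assumptions chosen for illustration; setting population to 1000 in the new data makes each prediction an expected count per 1,000 people.

```r
# Data from the question
cases <- c(6216128, 3341110, 855105, 359371, 417393, 640434,
           528914, 377166, 401556, 252832, 128458)
population <- c(54703334, 54252430, 55976643, 56630708, 57373529, 58025577,
                58617708, 58921850, 59695818, 60466585, 60223458)
treat.percent <- c(0.24476537, 0.32711814, 0.49976141, 0.48259986, 0.28818789,
                   0.18253669, 0.21735446, 0.20005629, 0.11358590, 0.09101194,
                   0.07383646)
mydata <- data.frame(cases, population, treat.percent)

f2 <- glm(cases ~ offset(log(population)) + log(treat.percent),
          data = mydata, family = poisson)

# Hypothetical grid of treatment proportions (an assumption, not from the post);
# population is fixed at 1000 so the offset is log(1000) for every row
newdata <- data.frame(treat.percent = c(0.1, 0.2, 0.3, 0.4, 0.5),
                      population = 1000)

# type = "response" returns expected counts, here cases per 1,000 people
newdata$cases_per_1000 <- predict(f2, newdata = newdata, type = "response")
newdata
```

Reading the fitted rate at a few treatment levels is usually more direct than reasoning about the coefficient on log(treat.percent).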
