I think that you want offset(log(population)) in your models above.
The offset is just a term included in the model without estimating a coefficient for it (the coefficient is fixed at 1). Since the standard link function in Poisson regression is the log, you can think of including the offset log(population) as a rough equivalent (though mathematically better) of using log(cases/population) as the response variable. So it adjusts for differences in population size. This means that, with an offset of log(population), the intercept predicts the log of the average count when log(population) is 0, or in other words, when you have a population of 1. The slope in the second model would then be the increase (on the log scale) for a population of size 1. You could also use an offset like offset(log(population/1000)), and then the interpretations would be for a population of size 1,000 (change the 1,000 to whatever value is meaningful for you), which makes the results easier to visualize.
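To make the effect of rescaling the offset concrete, here is a minimal sketch on simulated data. The variable names (cases, population, x) and the simulation settings are assumptions made up for illustration, not the original poster's data.

```r
# Simulated data: counts proportional to population, with a log-linear effect of x.
set.seed(1)
n <- 200
population <- round(runif(n, 500, 50000))
x <- rnorm(n)
cases <- rpois(n, lambda = population * exp(-6 + 0.3 * x))

# Same model, two offsets: per person and per 1,000 people.
fit_per_person <- glm(cases ~ x + offset(log(population)), family = poisson)
fit_per_1000   <- glm(cases ~ x + offset(log(population / 1000)), family = poisson)

coef(fit_per_person)
coef(fit_per_1000)

# Differences between the two fits: intercept and slope.
intercept_shift <- coef(fit_per_1000)[["(Intercept)"]] -
  coef(fit_per_person)[["(Intercept)"]]
slope_diff <- coef(fit_per_1000)[["x"]] - coef(fit_per_person)[["x"]]
```

In this sketch only the intercept moves (by log(1000)); the slope for x is the same in both fits, so rescaling the offset changes the reference population for the baseline rate, not the estimated effect.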
For most models beyond the simplest, it is often easier to interpret predictions from the model rather than individual coefficients. The Predict.Plot and TkPredict functions in the TeachingDemos package may help.
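As a base-R alternative for looking at predictions, one could call predict() with a chosen population size. This is a sketch on simulated data; cases, population, x and the reference population of 1,000 are hypothetical choices for illustration.

```r
# Simulated data (hypothetical): counts proportional to population.
set.seed(2)
n <- 200
population <- round(runif(n, 500, 50000))
x <- rnorm(n)
cases <- rpois(n, lambda = population * exp(-6 + 0.3 * x))

fit <- glm(cases ~ x + offset(log(population)), family = poisson)

# Expected number of cases in a population of 1,000 at three values of x.
# The offset is recomputed from the population column in newdata.
new_data <- data.frame(x = c(-1, 0, 1), population = 1000)
expected_cases <- predict(fit, newdata = new_data, type = "response")
cbind(new_data, expected_cases)
```

With type = "response" the predictions are on the count scale, so they read directly as "expected cases per 1,000 people" at each value of x.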
1) What is the best way of determining when to use neg. binom. vs. poisson?
A common way (not necessarily the best; what's 'best' depends on your criteria for bestness) to decide this would be to see if there's overdispersion in a Poisson model (e.g. by looking at the residual deviance, which should be close to its degrees of freedom if the Poisson model fits).
For example, look at summary(glm(count~spray,InsectSprays,family=poisson)): this has a residual deviance of 98.33 for 66 df. That's about 50% larger than we'd expect, so it's probably big enough that it could matter for your inference.
[If you want a formal test, use pchisq(98.33,66,lower.tail=FALSE), but formal testing of assumptions is generally answering the wrong question.]
So I'd be inclined to consider a negative binomial for that case.
More generally, if you're not reasonably confident that the Poisson makes sense, you could simply use negative binomial as a default, since it encompasses the Poisson as a limiting case.
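The check described above can be sketched as follows. InsectSprays is a built-in R dataset; using glm.nb from the MASS package for the negative binomial fit is one common choice, assumed here for illustration.

```r
library(MASS)  # provides glm.nb; MASS ships with standard R installations

# Poisson fit and a rough overdispersion check via the residual deviance.
pois_fit <- glm(count ~ spray, data = InsectSprays, family = poisson)
dev <- deviance(pois_fit)        # residual deviance
df  <- df.residual(pois_fit)     # residual degrees of freedom
dispersion_ratio <- dev / df     # near 1 would be consistent with Poisson

# The formal test mentioned above, for completeness.
p_value <- pchisq(dev, df, lower.tail = FALSE)

# Negative binomial alternative fitted to the same data.
nb_fit <- glm.nb(count ~ spray, data = InsectSprays)
nb_fit$theta                     # a very large theta would point back toward Poisson
AIC(pois_fit, nb_fit)
```

The ratio of residual deviance to degrees of freedom is only a rough guide; comparing the two fits (for example their standard errors) shows whether the choice actually matters for the inference of interest.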
2) Is this an appropriate instance to include sampling time in an offset term?
Yes, that's appropriate, and it would be my first instinct to include sampling time as an offset (rather than a predictor), since count would be expected to simply be proportional to the length of the sampling interval.
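One way to see the difference between an offset and a predictor is to fit both versions. The sketch below uses simulated data in which counts really are proportional to sampling time; n_events, sampling_time and x are hypothetical names chosen for illustration.

```r
# Simulated data (hypothetical): counts proportional to sampling time.
set.seed(3)
n <- 500
sampling_time <- runif(n, 1, 10)
x <- rnorm(n)
n_events <- rpois(n, lambda = sampling_time * exp(0.5 + 0.4 * x))

# Offset: the coefficient of log(sampling_time) is fixed at 1.
offset_fit <- glm(n_events ~ x + offset(log(sampling_time)), family = poisson)

# Predictor: the coefficient of log(sampling_time) is estimated from the data.
free_fit <- glm(n_events ~ x + log(sampling_time), family = poisson)
time_coef <- coef(free_fit)[["log(sampling_time)"]]
time_coef
```

If counts are proportional to sampling time, the estimated coefficient should be close to 1, which is what the offset assumes; an estimate far from 1 in real data would suggest that proportionality does not hold.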
Best Answer
Let's take a quick look at Wikipedia:
Exposure is the quantity you want to divide your counts by. Do you want to divide by unit area? By volume? It has nothing to do with Poisson regression as such; it's something you decide about your data.
Offset is a modelling technique in Poisson regression. If you don't want to use Poisson regression, you won't have an offset in your model. It's a simple trick in Poisson regression that allows you to model rates without a new statistical framework.
We use an offset in the Poisson regression model to adjust counts of events for differing time periods, areas, and volumes. For details on what exactly an offset is mathematically, go to:
When to use an offset in a Poisson regression?
Note how the offset goes to the right side of the equation. The offset is the log of exposure (because we're using the log link).
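For reference, the relationship between offset and exposure under the log link can be written out as below. The symbols (Y for the count, x for the predictors, \theta for the coefficients) are notation assumed here for illustration.

```latex
\log \mathrm{E}[Y \mid x] = \log(\text{exposure}) + \theta^{\top} x
\quad\Longleftrightarrow\quad
\log\!\left( \frac{\mathrm{E}[Y \mid x]}{\text{exposure}} \right) = \theta^{\top} x
```

The term \log(\text{exposure}) enters the linear predictor with its coefficient fixed at 1, which is why moving it to the left side turns the model for counts into a model for rates.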