When to use offset() in negative binomial/Poisson GLMs in R

I'm trying to detect relationships between species abundances (counts) and time (years) for many species, using either negative binomial or Poisson regression (depending on the degree of dispersion). Sampling time (minutes) is not the same for all collections, so my questions are:

1) What is the best way of determining when to use negative binomial vs. Poisson?

2) Is this an appropriate instance to include sampling time in an offset term? In most cases, sampling occurs for 10 minutes, but it is sometimes 15 or 20 minutes.

Any suggestions or advice would be appreciated.

Best Answer

1) What is the best way of determining when to use negative binomial vs. Poisson?

A common way to decide this (not necessarily the best; what's 'best' depends on your criteria for bestness) is to check whether a Poisson model shows overdispersion, e.g. by comparing its residual deviance with its residual degrees of freedom.

For example, look at summary(glm(count~spray,InsectSprays,family=poisson)): this has a residual deviance of 98.33 on 66 df. If the Poisson assumption held, the residual deviance would be roughly equal to its degrees of freedom, so this is about 50% larger than we'd expect (98.33/66 ≈ 1.49), which is probably large enough to matter for your inference.

[If you want a formal test, use pchisq(98.33,66,lower.tail=FALSE), but formal testing of assumptions generally answers the wrong question.]
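The deviance check and the optional formal test can be reproduced in R roughly as follows. This is a sketch using the built-in InsectSprays data from the example; the variable names (pois, dev, rdf, ratio, p_value) are illustrative choices, not something the answer specified.

```r
# Poisson GLM from the example, fitted to R's built-in InsectSprays data.
pois <- glm(count ~ spray, data = InsectSprays, family = poisson)

# Rough overdispersion check: residual deviance vs. residual degrees of freedom.
dev   <- deviance(pois)     # residual deviance (about 98.33)
rdf   <- df.residual(pois)  # 72 observations minus 6 spray coefficients = 66
ratio <- dev / rdf          # near 1 if the Poisson variance assumption is adequate

# Optional formal test: upper-tail chi-square probability of the deviance.
p_value <- pchisq(dev, rdf, lower.tail = FALSE)

cat(sprintf("deviance = %.2f on %d df, ratio = %.2f, p = %.4f\n",
            dev, rdf, ratio, p_value))
```

The ratio is the quantity to look at first; the p-value only says whether the excess is distinguishable from chance, not whether it is large enough to matter.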

So I'd be inclined to consider a negative binomial for that case.
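As a sketch of what considering a negative binomial might look like for the same data, the block below fits MASS::glm.nb (MASS is a recommended package distributed with R) next to the Poisson model. It assumes a single factor predictor, as in the example. In that setting both models place each spray's fitted mean at its sample mean, so the point estimates coincide and the practical difference shows up in the standard errors.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

pois <- glm(count ~ spray, data = InsectSprays, family = poisson)
nb   <- glm.nb(count ~ spray, data = InsectSprays)

# Estimated shape parameter theta (glm.nb uses Var(Y) = mu + mu^2 / theta).
theta_hat <- nb$theta

# With only a factor predictor, both fits reproduce the group means,
# so the coefficients agree up to numerical tolerance...
coef_diff <- max(abs(coef(nb) - coef(pois)))

# ...but the negative binomial standard errors are larger, which is where
# ignoring overdispersion would make inference look too precise.
se_pois <- summary(pois)$coefficients[, "Std. Error"]
se_nb   <- summary(nb)$coefficients[, "Std. Error"]

print(cbind(estimate = coef(nb), se_pois = se_pois, se_nb = se_nb))
cat("theta =", round(theta_hat, 2), "\n")
```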

More generally, if you're not reasonably confident that the Poisson makes sense, you could simply use the negative binomial as a default, since it encompasses the Poisson as a limiting case.
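To make "limiting case" concrete: under the parameterisation used by MASS::glm.nb (mean μ, shape θ), the negative binomial variance exceeds the mean by a term that vanishes as θ grows, and the distribution converges to Poisson(μ). This is the standard variance function for that parameterisation, stated here as background.

```latex
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta},
\qquad
\lim_{\theta \to \infty} \operatorname{Var}(Y) = \mu = \operatorname{Var}_{\text{Poisson}}(Y)
```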

2) Is this an appropriate instance to include sampling time in an offset term?

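Yes, that's appropriate. My first instinct would be to include sampling time as an offset (rather than as a predictor), since the expected count should simply be proportional to the length of the sampling interval. Because Poisson and negative binomial GLMs both use a log link by default, the offset is the log of sampling time, e.g. offset(log(minutes)) in the model formula, where minutes stands for your sampling-time variable.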
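A minimal sketch of the offset approach, using simulated data since the original survey data aren't available. Everything here is hypothetical: the column names count, year and minutes, the sampling-time mix (mostly 10 minutes, sometimes 15 or 20), the assumed 5% per-year decline, and the negative binomial shape size = 2. It fits the offset model with MASS::glm.nb and, for comparison, a model with log(minutes) as an ordinary predictor.

```r
library(MASS)  # glm.nb()

set.seed(1)

# Hypothetical simulated survey: one row per collection.
n       <- 1000
year    <- sample(1:20, n, replace = TRUE)
minutes <- sample(c(10, 15, 20), n, replace = TRUE, prob = c(0.8, 0.1, 0.1))

# Assumed truth: the count per minute declines about 5% per year, and the
# expected count is proportional to the number of minutes sampled.
rate_per_min <- exp(log(0.5) - 0.05 * year)
count <- rnbinom(n, mu = rate_per_min * minutes, size = 2)
dat <- data.frame(count, year, minutes)

# Offset model: log(minutes) enters with its coefficient fixed at 1.
fit_off <- glm.nb(count ~ year + offset(log(minutes)), data = dat)

# Alternative: log(minutes) as a free predictor; under proportionality
# its estimated coefficient should come out close to 1.
fit_pred <- glm.nb(count ~ year + log(minutes), data = dat)

# With the offset, predicted counts scale exactly with sampling time.
newdat <- data.frame(year = c(5, 5), minutes = c(10, 20))
pred <- predict(fit_off, newdata = newdat, type = "response")

slope_hat    <- unname(coef(fit_off)["year"])
log_min_coef <- unname(coef(fit_pred)["log(minutes)"])

cat("year slope (offset model):", round(slope_hat, 3), "\n")
cat("log(minutes) coefficient (predictor model):", round(log_min_coef, 2), "\n")
cat("prediction ratio, 20 vs 10 min:", unname(pred[2] / pred[1]), "\n")
```

With the offset, the year coefficient describes change in counts per unit of sampling effort, and a 20-minute collection is predicted to yield exactly twice a 10-minute one in the same year. Estimating log(minutes) freely costs a parameter and, with most collections at 10 minutes, is poorly determined; a coefficient near 1 is simply consistent with the proportionality that justifies the offset.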
