Solved – Very large theta values using glm.nb in R – alternative approaches

Tags: generalized-linear-model, overdispersion, r

While analysing the effect of environmental data on the activity of an animal species (the latter given as count data), I am fitting negative binomial GLMs with one predictor using the MASS package in R. Unfortunately, the data set is very small (n = 7 to 9).

In some cases, the theta value in glm.nb becomes very large (accompanied by the warning "iteration limit reached"), possibly indicating that there is no overdispersion and a Poisson GLM might be a better choice. Using a Poisson GLM, however, a residual deviance of e.g. 150 on 7 degrees of freedom indicates that there actually is overdispersion – or did I miss something?
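To make the symptom concrete, here is a minimal sketch (the data set and variable names are invented for illustration): fit both models and compare the Poisson residual deviance to its degrees of freedom, where a ratio well above 1 is the usual rough sign of overdispersion.

```r
## Hypothetical tiny data set (n = 8) with one predictor; values made up.
library(MASS)

dat <- data.frame(
  temp   = c(4, 7, 9, 12, 15, 18, 21, 24),
  counts = c(0, 2, 1, 8, 3, 25, 7, 60)
)

## Negative binomial fit: with so few, noisy points, theta can blow up
## and glm.nb may warn "iteration limit reached".
m_nb <- glm.nb(counts ~ temp, data = dat)
m_nb$theta

## Poisson fit: residual deviance relative to its degrees of freedom;
## a ratio far above 1 suggests overdispersion.
m_pois <- glm(counts ~ temp, family = poisson, data = dat)
deviance(m_pois) / df.residual(m_pois)
```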

A quasi-Poisson GLM works, but quasi-likelihood fits have no true likelihood, so I would lose ML-based measures such as AIC and the Vuong test for model comparison. Any suggestions for alternative approaches are greatly appreciated!
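One likelihood-based route (a suggestion of mine, not part of the original question; it assumes the AER package is available and reuses the same invented data as above) is to test overdispersion formally with AER::dispersiontest on the Poisson fit, and to compare the Poisson and negative binomial fits by AIC – both of which stay within maximum likelihood:

```r
library(AER)   # for dispersiontest()
library(MASS)  # for glm.nb()

## Same invented example data as before.
dat <- data.frame(
  temp   = c(4, 7, 9, 12, 15, 18, 21, 24),
  counts = c(0, 2, 1, 8, 3, 25, 7, 60)
)

m_pois <- glm(counts ~ temp, family = poisson, data = dat)
dispersiontest(m_pois)   # H0: equidispersion (variance = mean)

m_nb <- glm.nb(counts ~ temp, data = dat)
AIC(m_pois, m_nb)        # smaller AIC = preferred model
```

Bear in mind that with n = 7 to 9 any such test will have very little power, which is exactly the answer's point below.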

Best Answer

It doesn't necessarily mean that there is overdispersion (though it could), just that a saturated model may be a better fit. With only 7 to 9 observations, it will be very difficult to test reliably for overdispersion unless you have some values that are far out of line with what a Poisson model would predict.

Another option you might look into is keeping the Poisson model but using a transformed value of your predictor rather than a linear fit on the raw variable. If the larger values of the predictor are where the fitted counts are furthest off, you could try something like a squared term for the predictor; if it's the opposite, then maybe a log transform of the predictor.
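As a sketch of that idea (again with invented data and variable names): the transformed-predictor fits are still ordinary Poisson GLMs, so likelihood-based comparison via AIC applies directly.

```r
## Invented example data; temp must be positive for log().
dat <- data.frame(
  temp   = c(4, 7, 9, 12, 15, 18, 21, 24),
  counts = c(0, 2, 1, 8, 3, 25, 7, 60)
)

## Squared predictor vs log-transformed predictor, both Poisson GLMs.
m_sq  <- glm(counts ~ I(temp^2), family = poisson, data = dat)
m_log <- glm(counts ~ log(temp), family = poisson, data = dat)

AIC(m_sq, m_log)   # compare which transformation fits better
```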

Thinking about overdispersion in a count model is always a good idea, but it does introduce complexity into the model. With so few data points, your best approach might be to keep it as simple as possible.