Solved – Overdispersion in Model selection procedures (AIC)

aic, model selection, overdispersion

My question is pretty straightforward.

Does overdispersion matter when doing model selection and multi-model inference?

I understand that overdispersion affects the estimation of standard errors and, as a consequence, confidence intervals and p-values. But if I am bypassing classical inference and performing model selection based on AIC, do I need to worry about overdispersion?

For example, if I am modeling a count variable using a Poisson GLM, I can measure and discuss how much dispersion in the data my model is not accounting for, by calculating the residual variability and comparing it with the fixed scale parameter. Of course, if I use a negative binomial model I will end up with a much more flexible model. But does overdispersion mean anything for model selection inference?

Best Answer

I've been interested in this as well. This may not be a full answer, but as far as I understand it, it does appear to affect AIC.

The main reference I've got for this is the following paper: Anderson, D. R., Burnham, K. P. and White, G. C. (1994), AIC Model Selection in Overdispersed Capture-Recapture Data. http://onlinelibrary.wiley.com/doi/10.2307/1939637/full

In summary, it suggests that AIC and AICc are poor at selecting the 'true' model, judged by an RSS measure, in the presence of overdispersion. They generally select over-fitted models. The authors examine a couple of adjustments for it, directly to AIC and to a 'dimension-consistent criterion (CAIC)', the latter of which appears to under-fit the data when overdispersion is present.

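They suggest that applying quasi-likelihood theory allows an adjustment to AIC, AICc and CAIC, as follows:

$$QAIC = -\frac{2 \log \mathcal{L}(\theta)}{\hat c} + 2K$$

$$QAIC_c = QAIC + \frac{2(K + 1)(K + 2)}{n - K - 2}$$

$$QCAIC = -\frac{2 \log \mathcal{L}(\theta)}{\hat c} + K[\log(n) + 1]$$

Here K is the number of estimated parameters, n is the sample size, and $\hat c$ is the estimated overdispersion (variance inflation) factor. Note that it is the log-likelihood that is divided by $\hat c$, not the likelihood inside the logarithm.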
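
As a rough illustration of how these criteria behave, here is a minimal Python sketch that divides the maximized log-likelihood by ĉ and then adds the penalty terms exactly as written in the answer. The function names and the two candidate models (their log-likelihoods, parameter counts, and the sample size) are made up for illustration and are not taken from the paper.

```python
import math


def qaic(loglik, k, c_hat):
    """QAIC: -2 * log-likelihood / c_hat + 2K (c_hat = 1 gives plain AIC)."""
    return -2.0 * loglik / c_hat + 2.0 * k


def qaicc(loglik, k, n, c_hat):
    """QAICc, using the small-sample term as written in the answer."""
    return qaic(loglik, k, c_hat) + 2.0 * (k + 1) * (k + 2) / (n - k - 2)


def qcaic(loglik, k, n, c_hat):
    """QCAIC: -2 * log-likelihood / c_hat + K * (log(n) + 1)."""
    return -2.0 * loglik / c_hat + k * (math.log(n) + 1.0)


# Hypothetical candidate models: (maximized log-likelihood, number of parameters).
simple_model = (-520.0, 3)
complex_model = (-510.0, 6)

# With c_hat = 1 (no overdispersion) the criterion reduces to ordinary AIC.
aic_simple = qaic(*simple_model, c_hat=1.0)
aic_complex = qaic(*complex_model, c_hat=1.0)

# With a hypothetical c_hat = 4 the likelihood gain is down-weighted.
qaic_simple = qaic(*simple_model, c_hat=4.0)
qaic_complex = qaic(*complex_model, c_hat=4.0)

print("AIC :", aic_simple, aic_complex)
print("QAIC:", qaic_simple, qaic_complex)
```

In this made-up case plain AIC prefers the larger model, while QAIC with ĉ = 4 prefers the smaller one, which matches the paper's point that unadjusted AIC tends to over-fit under overdispersion.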

So it appears that the scale parameter estimated from a quasi-likelihood model fit can be used to rescale the AIC, giving better results with regard to under- and over-fitting.
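
For what it's worth, one common way to get that scale parameter is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes a Poisson variance function (variance equal to the mean) and, to stay self-contained, uses an intercept-only model whose fitted value is just the sample mean. The counts are invented for illustration, and `pearson_c_hat` is a hypothetical helper name, not a library function.

```python
def pearson_c_hat(y, mu, n_params):
    """Pearson chi-square / residual df, assuming Poisson variance V(mu) = mu."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical count data (not from the paper).
counts = [0, 1, 0, 7, 2, 12, 0, 3, 9, 1]

# For an intercept-only Poisson model the maximum-likelihood fitted value
# is the sample mean for every observation.
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

c_hat = pearson_c_hat(counts, fitted, n_params=1)
print("estimated c_hat:", round(c_hat, 2))
```

A value well above 1, as these made-up counts give, indicates more variability than the Poisson model allows. In practice ĉ would be computed from the fitted values of the most complex candidate model and then held fixed across all models being compared.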

If you want to use QAIC, I assume there are plenty of references online, but if you are an R user, the document on quasi-likelihood models by Ben Bolker in the 'bbmle' package is worth a read.