Solved – Overdispersion in Poisson GLM

generalized linear model, overdispersion, poisson distribution

When I calculate the dispersion as deviance / degrees of freedom, I get a value of 1.8. Is it absolutely necessary to fit the GLM using quasipoisson? What is deemed 'significantly overdispersed'?

Best Answer

The Poisson model assumes that the mean and variance are equal. If that doesn't hold, then the Poisson model isn't correct. Quasi-Poisson is one possibility when there is overdispersion. Others include:

Negative binomial regression (NBR) - similar to the Poisson model, but using the negative binomial distribution instead, which has a dispersion parameter. Available in the MASS package in R, and also built into Stata.

Hurdle regression - for circumstances with more 0s than would be expected from the Poisson/NB model. It combines a logit/probit model, which estimates y=0 vs y>0, with a truncated Poisson/NB model for the cases where y>0. Available in the pscl package in R, and as a separate Stata add-on whose name I cannot remember.

Zero-inflated models - zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZNB) models are similar to the hurdle model, but assume two mechanisms at work to generate 0s: never-takers, and potential takers who didn't take in this instance. Available in the pscl package in R, and in Stata.
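For reference, the variance assumptions of the Poisson, quasi-Poisson, and negative binomial models can be written side by side. The symbols φ and θ are notation introduced here rather than taken from the post; θ follows the parameterization used by glm.nb in MASS, and all three models share the mean E(Y) = μ.

$$
\text{Poisson: } \operatorname{Var}(Y) = \mu
\qquad
\text{quasi-Poisson: } \operatorname{Var}(Y) = \phi\,\mu
\qquad
\text{negative binomial: } \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}
$$

Under this notation, the deviance / degrees of freedom value in the question (1.8) is a rough estimate of φ, and the Poisson model is the special case φ = 1 (or θ → ∞).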

If your data are overdispersed, try NBR and compare the two models' log-likelihoods (e.g. via AIC or BIC); the NBR fit also gives you the statistical significance of the dispersion parameter. From what I can tell, there is no disadvantage to using the NBR model rather than the Poisson model, so when I have overdispersion (in my limited experience, this has always been the case!), I simply use NBR and don't think twice. I could be wrong and would welcome others' thoughts. One of the downsides of quasi-Poisson is that it doesn't give you likelihood-based statistics such as AIC/BIC. NBR is fitted by maximum likelihood, so it does.
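The comparison described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the answerer's own workflow: the counts are made up, the model is intercept-only (no covariates), the dispersion is the Pearson statistic rather than the deviance-based one from the question, and the helper names (poisson_loglik, negbin_loglik, fit_theta) are hypothetical. It uses only the Python standard library.

```python
import math

# Made-up counts for illustration only (not data from the question).
y = [0, 1, 0, 3, 7, 2, 0, 12, 1, 4, 0, 9, 2, 0, 5]
n = len(y)

# For an intercept-only model the sample mean is the ML estimate of the
# mean under both the Poisson and the negative binomial model.
mu = sum(y) / n

# Pearson dispersion: chi-square statistic over residual degrees of freedom.
# Values well above 1 suggest overdispersion relative to Poisson.
dispersion = sum((k - mu) ** 2 / mu for k in y) / (n - 1)


def poisson_loglik(y, mu):
    """Log-likelihood of counts y under a Poisson with common mean mu."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in y)


def negbin_loglik(y, mu, theta):
    """Negative binomial log-likelihood with Var = mu + mu^2 / theta."""
    return sum(
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
        for k in y
    )


def fit_theta(y, mu, lo=-5.0, hi=10.0, iters=80):
    """Maximize the NB log-likelihood over log(theta) by golden-section search."""
    g = (math.sqrt(5) - 1) / 2

    def f(t):
        return negbin_loglik(y, mu, math.exp(t))

    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2)


theta_hat = fit_theta(y, mu)

# AIC = 2 * (number of parameters) - 2 * log-likelihood.
aic_poisson = 2 * 1 - 2 * poisson_loglik(y, mu)             # parameters: mu
aic_negbin = 2 * 2 - 2 * negbin_loglik(y, mu, theta_hat)    # parameters: mu, theta

print(f"Pearson dispersion: {dispersion:.2f}")
print(f"theta estimate:     {theta_hat:.2f}")
print(f"AIC Poisson:        {aic_poisson:.1f}")
print(f"AIC neg. binomial:  {aic_negbin:.1f}")
```

A lower AIC for the negative binomial fit indicates that the extra dispersion parameter is earning its keep. With real data and covariates, the same comparison would more typically be done in R by fitting with glm() and glm.nb() from MASS and calling AIC() on both fits.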

This is a wonderful reference walking through how these models are used; it's an R vignette, but even if you don't use R it should be very useful.