Solved – Overdispersion in poisson glm

generalized linear model, overdispersion, poisson distribution

When I calculate the dispersion as deviance / degrees of freedom, I get a value of 1.8. Is it absolutely necessary to fit the GLM using quasipoisson? What is deemed 'significantly overdispersed'?

Best Answer

The Poisson model assumes equal mean and variance. If that doesn't hold, then the Poisson model isn't correct. Quasi-Poisson is one possibility when there is overdispersion. Others include:

Negative binomial regression (NBR) - similar to the Poisson model, but using the negative binomial distribution instead, which has a dispersion parameter. Available in the MASS package in R, also integrated into Stata.

Hurdle regression - for circumstances with more 0s than would be expected from the Poisson/NB model. It combines a logit/probit with Poisson/NB, where the logit/probit is used to estimate y=0 vs y>0, and a truncated Poisson/NB is used to estimate the cases where y>0. Available in the pscl package. Also available as a separate Stata add-on whose name I cannot remember.

Zero-inflated - zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZNB) models are similar to the hurdle model, but assume two mechanisms at work to generate 0s: never-takers, and potential takers who didn't take in this instance. Available in the pscl package, and available in Stata.

If your data are over-dispersed, try NBR and compare the fit with the Poisson model using the log-likelihoods (e.g. via AIC/BIC). You can also get the statistical significance of the dispersion parameter from the NBR. From what I can tell, there is no disadvantage of using the NBR model relative to the Poisson model, so when I have overdispersion (in my limited experience, this has always been the case!), I simply use NBR and don't think twice. I could be wrong and would welcome others' thoughts. One of the downsides to the quasi-Poisson is that it doesn't allow you to get likelihood-based stats, like the AIC/BIC. NBR is fitted by maximum likelihood, so it does.
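The comparison described above can be sketched in R roughly as follows. This is only an illustration on simulated data: the variable names, the seed and the simulation settings are all made up for the example, and the halved p-value in the likelihood-ratio test is the usual adjustment for testing a parameter on the boundary, not something taken from the answer.

```r
# Minimal sketch on simulated data; all names and settings are illustrative.
library(MASS)  # provides glm.nb()

set.seed(42)
n  <- 500
x  <- rnorm(n)
mu <- exp(0.5 + 0.8 * x)
# Negative binomial counts: Var(y) = mu + mu^2 / size, so overdispersed
y  <- rnbinom(n, mu = mu, size = 1.5)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)
fit_nb    <- glm.nb(y ~ x)

# Two common dispersion estimates from the Poisson fit;
# values well above 1 point to overdispersion
disp_deviance <- deviance(fit_pois) / df.residual(fit_pois)
disp_pearson  <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)

# The quasi-Poisson summary reports the Pearson-based estimate
disp_quasi <- summary(fit_quasi)$dispersion

# Likelihood-based comparison; the quasi-Poisson fit has no likelihood,
# so its AIC is NA
aic_table <- AIC(fit_pois, fit_nb)
aic_quasi <- AIC(fit_quasi)

# Likelihood-ratio test of Poisson against negative binomial.
# The Poisson model sits on the boundary of the NB parameter space,
# so the chi-square p-value is halved (a common convention).
lr_stat <- as.numeric(2 * (logLik(fit_nb) - logLik(fit_pois)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE) / 2

print(c(deviance = disp_deviance, pearson = disp_pearson))
print(aic_table)
print(p_value)
```

The quasi-Poisson and Poisson fits give the same coefficients; only the standard errors differ, scaled by the square root of the estimated dispersion.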

This is a wonderful reference walking through how these models are used; it's an R vignette, but even if you don't use R it should be very useful.

Related Solutions

Poisson vs Quasi-Poisson Regression – Handling Overdispersion in Count Data

When trying to determine what sort of glm equation you want to estimate, you should think about plausible relationships between the expected value of your target variable given the right hand side (rhs) variables and the variance of the target variable given the rhs variables. Plots of the residuals vs. the fitted values from your Normal model can help with this. With Poisson regression, the assumed relationship is that the variance equals the expected value; rather restrictive, I think you'll agree. With a "standard" linear regression, the assumption is that the variance is constant regardless of the expected value. For a quasi-poisson regression, the variance is assumed to be a linear function of the mean; for negative binomial regression, a quadratic function.
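Written out, the mean-variance relationships named above look like this. The symbols are notation chosen for this summary and do not come from the answer: \mu is the conditional mean, \sigma^2 the constant Normal variance, \phi the quasi-Poisson dispersion and \theta the negative binomial shape parameter (in the common "NB2" parameterization, which is assumed here).

```latex
\begin{align*}
\text{Normal (linear regression):} \quad & \operatorname{Var}(Y \mid x) = \sigma^2 \\
\text{Poisson:}                    \quad & \operatorname{Var}(Y \mid x) = \mu \\
\text{Quasi-Poisson:}              \quad & \operatorname{Var}(Y \mid x) = \phi\,\mu \\
\text{Negative binomial (NB2):}    \quad & \operatorname{Var}(Y \mid x) = \mu + \mu^2/\theta
\end{align*}
```

With \phi = 1 the quasi-Poisson variance reduces to the Poisson one, and as \theta grows large the negative binomial variance approaches it as well.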

However, you aren't restricted to these relationships. The specification of a "family" (other than "quasi") determines the mean-variance relationship. I don't have The R Book, but I imagine it has a table that shows the family functions and corresponding mean-variance relationships. For the "quasi" family you can specify any of several mean-variance relationships, and you can even write your own; see the R documentation. It may be that you can find a much better fit by specifying a non-default value for the mean-variance function in a "quasi" model.
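As a rough illustration of choosing the mean-variance function in a "quasi" model, the sketch below fits the same simulated counts with two of the variance options that R's quasi() family accepts. The data, seed and object names are invented for the example.

```r
# Minimal sketch on simulated data; all names and settings are illustrative.
set.seed(1)
n  <- 300
x  <- runif(n)
mu <- exp(3 + x)
y  <- rnbinom(n, mu = mu, size = 5)  # variance grows faster than the mean

# Variance proportional to the mean: same estimating equations as quasipoisson
fit_mu  <- glm(y ~ x, family = quasi(link = "log", variance = "mu"))
fit_qp  <- glm(y ~ x, family = quasipoisson)

# Variance proportional to the squared mean
fit_mu2 <- glm(y ~ x, family = quasi(link = "log", variance = "mu^2"))

# Estimated dispersion under each assumed variance function
disp_mu  <- summary(fit_mu)$dispersion
disp_mu2 <- summary(fit_mu2)$dispersion

# Pearson residuals are scaled by the assumed variance function, so a plot of
# their absolute values against the fitted values should show no trend when
# the variance function is adequate, e.g.:
# plot(fitted(fit_mu),  abs(residuals(fit_mu,  type = "pearson")))
# plot(fitted(fit_mu2), abs(residuals(fit_mu2, type = "pearson")))

print(rbind(mu = coef(fit_mu), mu2 = coef(fit_mu2)))
print(c(disp_mu = disp_mu, disp_mu2 = disp_mu2))
```

Because quasi fits have no likelihood, the two variance functions cannot be compared by AIC; residual plots like the ones commented out above are one informal way to judge them.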

You also should pay attention to the range of the target variable; in your case it's nonnegative count data. If you have a substantial fraction of low values - 0, 1, 2 - the continuous distributions probably won't fit well, but if you don't, there's not much value in using a discrete distribution. It's rare that you'd consider Poisson and Normal distributions as competitors.

Solved – How to get dispersion parameter from a binomial mixed model

You may be able to use this function attributed to D. Bates to get the scale parameter:

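dispersion_glmer <- function(modelglmer) {
    # Estimated scale for a binomial model, following D. Bates:
    # that quantity is the square root of the penalized residual sum of
    # squares divided by n, the number of observations.
    # Note: resid() is used here in place of the modelglmer@resid slot, on the
    # assumption that the model comes from lme4 >= 1.0, where that slot no
    # longer exists; modelglmer@u holds the spherical random effects.
    res <- resid(modelglmer)
    n <- length(res)

    sqrt(sum(c(res, modelglmer@u)^2) / n)
}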

This is a link to more information on the scale parameter.
