The appropriate model for underdispersed count data

beta-binomial distribution, negative-binomial-distribution, poisson distribution, r, underdispersion

I am trying to model count data in R that are apparently underdispersed (dispersion parameter ~ 0.40). This is probably why neither a glm with family = poisson nor a negative binomial model (glm.nb) gives a significant result. When I look at the descriptives of my data, I don't see the typical skew of count data, and the residuals in my two experimental conditions are homogeneous, too.
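For readers who want to reproduce a figure like the dispersion parameter quoted above, here is a minimal sketch of one common way to compute it: the Pearson chi-square statistic of a Poisson fit divided by the residual degrees of freedom. The data frame `d`, its columns `count` and `condition`, and the simulated values are assumptions for illustration only, not the data from the question.

```r
# Hypothetical data: two experimental conditions with counts that vary
# less than a Poisson variable would (binomial counts have variance < mean).
set.seed(1)
n <- 100
d <- data.frame(condition = factor(rep(c("control", "treatment"), each = n)))
d$count <- rbinom(2 * n, size = 10,
                  prob = ifelse(d$condition == "control", 0.4, 0.5))

# Ordinary Poisson regression of the count on the condition.
fit_pois <- glm(count ~ condition, family = poisson, data = d)

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom.
# Values near 1 are consistent with Poisson; values well below 1
# suggest underdispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion
```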

So my questions are:

  1. Do I even have to use special regression analyses for my count data if the data don't really behave like count data? I sometimes face non-normality (usually due to kurtosis), but I used the percentile bootstrap method for comparing trimmed means (Wilcox, 2012) to account for it. Can methods for count data be replaced by any of the robust methods suggested by Wilcox and implemented in the WRS package?

  2. If I have to use regression analyses for count data, how do I account for the underdispersion? The Poisson and the negative binomial distributions assume a higher dispersion, so they shouldn't be appropriate, right? I was thinking about applying the quasi-Poisson distribution, but that's usually recommended for overdispersion. I read that beta-binomial models, which seem to be able to account for over- as well as underdispersion, are available in the VGAM package for R. The authors, however, seem to recommend a tilted Poisson distribution, but I can't find it in the package.
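As a point of reference for the quasi-Poisson option raised in question 2, the sketch below shows what `family = quasipoisson` does in base R when the estimated dispersion is below 1: the coefficients match the Poisson fit, while the standard errors are multiplied by the square root of the estimated dispersion and therefore shrink. The simulated data and the names `d`, `count`, and `condition` are assumptions for illustration.

```r
# Hypothetical underdispersed counts in two conditions.
set.seed(2)
n <- 100
d <- data.frame(condition = factor(rep(c("control", "treatment"), each = n)))
d$count <- rbinom(2 * n, size = 10,
                  prob = ifelse(d$condition == "control", 0.4, 0.5))

fit_pois  <- glm(count ~ condition, family = poisson,      data = d)
fit_quasi <- glm(count ~ condition, family = quasipoisson, data = d)

# Dispersion estimated by the quasi-Poisson fit (fixed at 1 for Poisson).
phi <- summary(fit_quasi)$dispersion

# Same coefficients, but standard errors rescaled by sqrt(phi).
se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
cbind(se_pois, se_quasi, ratio = se_quasi / se_pois)
```

With underdispersed data the plain Poisson standard errors are too large, which is one plausible reason a Poisson fit can look non-significant.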

Can anyone recommend a procedure for underdispersed data and maybe provide some example R code for it?

Best Answer

The best, and standard, way to handle underdispersed Poisson data is to use a generalized Poisson model, or perhaps a hurdle model. Three-parameter count models can also be used for underdispersed data, e.g. Faddy-Smith, Waring, Famoye, Conway-Maxwell and other generalized count models. The only drawback with these is interpretability. But for general underdispersed data the generalized Poisson should be used. It plays the role that the negative binomial plays for overdispersed data. I discuss this in some detail in two of my books, Modeling Count Data (2014) and Negative Binomial Regression, 2nd edition (2011), both from Cambridge University Press.

In R, the VGAM package allows for generalized Poisson (GP) regression. Negative values of the dispersion parameter indicate adjustment for underdispersion. You can use the GP model for overdispersed data as well, but generally the NB model is better. When it comes down to it, it's best to determine the cause of the underdispersion and then select the most appropriate model to deal with it.
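To make the role of the GP dispersion parameter concrete, here is a minimal base-R sketch that fits a generalized Poisson regression by maximum likelihood with `optim()`. It is not the VGAM implementation and not code from the books cited above. It assumes Consul's parameterization, P(Y = y) = theta * (theta + lambda * y)^(y - 1) * exp(-theta - lambda * y) / y!, with mean mu = theta / (1 - lambda) and variance mu / (1 - lambda)^2, so that lambda < 0 corresponds to underdispersion. The function name `gp_negloglik`, the simulated data, and the log link are assumptions for illustration. Note that for negative lambda the support is truncated and the probabilities do not sum exactly to one, so this is an approximation.

```r
# Hypothetical data: binomial counts are underdispersed relative to Poisson.
set.seed(3)
n <- 250
x <- rep(c(0, 1), each = n)                       # two conditions
y <- rbinom(2 * n, size = 10, prob = ifelse(x == 0, 0.4, 0.5))

# Negative log-likelihood of the generalized Poisson with a log link.
# par = (intercept, slope, lambda).
gp_negloglik <- function(par, y, x) {
  lambda <- par[3]
  mu     <- exp(par[1] + par[2] * x)              # conditional mean
  theta  <- mu * (1 - lambda)
  rate   <- theta + lambda * y
  # Outside the valid parameter region, return a large penalty value.
  if (lambda >= 1 || lambda <= -1 || any(rate <= 0)) return(1e10)
  -sum(log(theta) + (y - 1) * log(rate) - rate - lgamma(y + 1))
}

# Start from the ordinary Poisson estimates with lambda = 0 (plain Poisson).
start <- c(coef(glm(y ~ x, family = poisson)), 0)
fit <- optim(start, gp_negloglik, y = y, x = x,
             method = "Nelder-Mead", control = list(maxit = 2000))

lambda_hat <- unname(fit$par[3])
# Implied variance-to-mean ratio; below 1 indicates underdispersion.
implied_dispersion <- 1 / (1 - lambda_hat)^2
c(lambda = lambda_hat, dispersion = implied_dispersion)
```

A negative `lambda_hat` here is the "negative value of the dispersion parameter" described above. For real analyses a maintained package implementation is preferable to this hand-rolled likelihood.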