Solved – How to account for overdispersion in a glm with negative binomial distribution

generalized-linear-model, overdispersion, r

I'm analysing count data with a generalised linear model in R. I started with a Poisson family, but then realized that the data were clearly overdispersed. I therefore switched to a GLM with a negative binomial distribution (using the function glm.nb() from the MASS package). Interestingly, forward and backward stepwise selection both arrive at the same best model, which is:

library(MASS)  # provides glm.nb()
m.step2 <- glm.nb(round(N.FLOWERS) ~ Hs_obs + RELATEDNESS + CLONALITY + PRODUCTION, data = flower[c(-12, -17), ])

Then to test for fixed effects I use the anova() function, which gives:

anova(m.step2, test = "Chi")
Analysis of Deviance Table
Model: Negative Binomial(1.143), link: log
Response: round(N.FLOWERS)
Terms added sequentially (first to last)
            Df Deviance Resid. Df Resid. Dev   Pr(>F)
NULL                           15     40.674
Hs_obs       1   9.5978        14     31.076 0.001948 **
RELATEDNESS  1   9.4956        13     21.581 0.002060 **
CLONALITY    1   3.0411        12     18.540 0.081181 .
PRODUCTION   1   3.7857        11     14.754 0.051693 .
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning messages:
1: In anova.negbin(m.step2, test = "F") : tests made without re-estimating 'theta'

However, if there were overdispersion (even with the negative binomial), these p-values should be corrected, shouldn't they? In my case, the residual deviance (obtained from summary(m.step2)) is 14.754 on 11 residual degrees of freedom. Thus, the estimated overdispersion is 14.754/11 = 1.34.
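For reference, here is a rough sketch of how that ratio can be computed directly from a fitted model, next to the Pearson-based version that is often reported alongside it. The data are simulated purely for illustration (the names sim, x, y and fit are hypothetical, since the flower data are not reproduced here), so only the last ratio corresponds to the numbers in this question.

```r
library(MASS)  # provides glm.nb()

# Hypothetical simulated counts; NOT the flower data from the question.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
sim <- data.frame(y = y, x = x)

fit <- glm.nb(y ~ x, data = sim)

# Ratio used in the question: residual deviance over residual degrees of freedom.
deviance_ratio <- deviance(fit) / df.residual(fit)

# Pearson-based alternative: sum of squared Pearson residuals over residual df.
pearson_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# The same arithmetic applied to the values reported in the question.
question_ratio <- 14.754 / 11

print(c(deviance = deviance_ratio, pearson = pearson_ratio, question = question_ratio))
```

With only 11 residual degrees of freedom, either ratio is a noisy estimate, so a value such as 1.34 is at best a rough indication.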

How do I correct the p-values to account for the small amount of overdispersion detected in this negative binomial model?

Best Answer

I'm not sure how to correct the p-values. However, you can typically check the mean-variance assumption of a negative binomial regression by looking at a plot of the residuals versus the fitted values.

If this plot of residuals versus fitted values is not (roughly) an amorphous, random cloud of data points, then you can try using quasi-Poisson regression. Another alternative is to construct your own mean-variance relationship using quasi-likelihood.
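A minimal sketch of both steps, assuming simulated data in place of the real data set (sim, x, y, nb_fit and qp_fit are hypothetical names used only for illustration): first the residuals versus fitted values plot for the negative binomial fit, then a quasi-Poisson fit whose estimated dispersion is used in F tests.

```r
library(MASS)  # provides glm.nb()

# Hypothetical simulated counts; replace with your own data frame.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
sim <- data.frame(y = y, x = x)

# Negative binomial fit and its residuals versus fitted values plot.
nb_fit <- glm.nb(y ~ x, data = sim)
plot(fitted(nb_fit), residuals(nb_fit, type = "pearson"),
     xlab = "Fitted values", ylab = "Pearson residuals")
abline(h = 0, lty = 2)  # reference line at zero

# Quasi-Poisson alternative: variance is assumed proportional to the mean,
# and the proportionality constant (dispersion) is estimated from the data.
qp_fit <- glm(y ~ x, family = quasipoisson(link = "log"), data = sim)
dispersion <- summary(qp_fit)$dispersion

# F tests take the estimated dispersion into account.
qp_table <- anova(qp_fit, test = "F")

print(dispersion)
print(qp_table)
```

Note that the quasi-Poisson model assumes the variance grows linearly with the mean, whereas the negative binomial assumes a quadratic relationship, so the plot is what should guide the choice between them.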

Hope this helps!