I'm analysing count data with a generalised linear model in R. I started with a Poisson family, but then realised that the data were clearly overdispersed. I therefore fitted a GLM with a negative binomial distribution instead (using the function glm.nb() from the MASS package). Interestingly, forward and backward stepwise selection both arrive at the same best model, which is:
m.step2 <- glm.nb(round(N.FLOWERS) ~ Hs_obs+RELATEDNESS+CLONALITY+PRODUCTION, data = flower[c(-12, -17), ])
Then to test for fixed effects I use the anova() function, which gives:
anova(m.step2, test = "Chi")
Analysis of Deviance Table
Model: Negative Binomial(1.143), link: log
Response: round(N.FLOWERS)
Terms added sequentially (first to last)
            Df Deviance Resid. Df Resid. Dev   Pr(>F)
NULL                           15     40.674
Hs_obs       1   9.5978        14     31.076 0.001948 **
RELATEDNESS  1   9.4956        13     21.581 0.002060 **
CLONALITY    1   3.0411        12     18.540 0.081181 .
PRODUCTION   1   3.7857        11     14.754 0.051693 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning messages:
1: In anova.negbin(m.step2, test = "F") : tests made without re-estimating 'theta'
However, if overdispersion remained (even with the negative binomial), these p-values should be corrected, shouldn't they? In my case, the residual deviance (obtained from summary(m.step2)) is 14.754 on 11 residual degrees of freedom. Thus, the dispersion ratio is 14.754/11 = 1.34.
How do I correct the p-values to account for the small amount of overdispersion detected in this negative binomial model?
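For reference, the ratio above can be computed directly from a fitted model. The sketch below is only illustrative: the data are simulated (the flower data are not available here), and the names sim, fit, dev_ratio and pearson_ratio are made up for the example. It shows the deviance-based ratio used above next to the Pearson-based ratio, which is another commonly used check.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
# Simulated stand-in data (an assumption, NOT the flower data from the question)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
sim <- data.frame(y = y, x = x)

fit <- glm.nb(y ~ x, data = sim)

# Deviance-based ratio: residual deviance / residual degrees of freedom
dev_ratio <- deviance(fit) / df.residual(fit)

# Pearson-based ratio: sum of squared Pearson residuals / residual df
pearson_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

print(c(deviance = dev_ratio, pearson = pearson_ratio))

# The arithmetic from the question, for comparison
print(14.754 / 11)
```

Values close to 1 suggest the negative binomial variance is roughly adequate; how far from 1 is "too far" is a judgment call, especially with only 11 residual degrees of freedom.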
Best Answer
I'm not sure how to correct the p-values. However, you can usually check the mean-variance assumption of a negative binomial regression by looking at a plot of the residuals versus the fitted values.
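A minimal sketch of such a plot, assuming simulated data in place of the flower data (the names sim, fit, mu_hat and res are hypothetical). It uses Pearson residuals, which are scaled by the variance the model assumes, so a trend or funnel shape would hint that the assumed mean-variance relationship is off.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
# Simulated stand-in data (an assumption, NOT the flower data from the question)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
sim <- data.frame(y = y, x = x)

fit <- glm.nb(y ~ x, data = sim)

mu_hat <- fitted(fit)                      # fitted means on the response scale
res    <- residuals(fit, type = "pearson") # residuals scaled by the model variance

plot(mu_hat, res,
     xlab = "Fitted values", ylab = "Pearson residuals",
     main = "Residuals vs fitted (negative binomial)")
abline(h = 0, lty = 2)                     # reference line at zero
lines(lowess(mu_hat, res), col = "red")    # smoother to reveal any trend
```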
If this plot of residuals versus fitted values is not (roughly) an amorphous, random cloud of data points, then you can try using quasi-Poisson regression. Another alternative is to construct your own mean-variance relationship using quasi-likelihood.
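As a rough illustration of the quasi-Poisson route, here is a sketch on simulated data (again an assumption, not the flower data; sim, qp and phi are hypothetical names). With a quasi family, R estimates a dispersion parameter, and anova() with test = "F" uses that estimate, so the reported p-values already reflect the overdispersion.

```r
set.seed(1)
# Simulated overdispersed counts (an assumption, NOT the flower data)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
sim <- data.frame(y = y, x = x)

# Quasi-Poisson: variance = phi * mean, with phi estimated from the data
qp <- glm(y ~ x, family = quasipoisson(link = "log"), data = sim)

# Estimated dispersion parameter (Pearson chi-square / residual df)
phi <- summary(qp)$dispersion
print(phi)

# F tests are the usual choice when the dispersion is estimated
tab <- anova(qp, test = "F")
print(tab)
```

Note that the quasi-Poisson variance grows linearly with the mean, whereas the negative binomial variance grows quadratically, so the two models can weight observations quite differently.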
Hope this helps!