Solved – How to assess overdispersion in Poisson GLMM, lmer( )

Tags: glmm, overdispersion, r, underdispersion

I have a GLMM with a Poisson distribution and a random spatial block. My experimental design is a 2×2 factorial with 4 blocks, giving 16 data points in total. Here is the specification of the model in R using the lme4 package.

glmer(rich ~ morph*caged + (1|block),
      family=poisson, data=bexData)

When I call summary() on this object, the relevant part of the output is

   AIC   BIC logLik deviance
 18.58 22.44 -4.288    8.576
Random effects:
 Groups Name        Variance Std.Dev.
 block  (Intercept)  0        0      
Number of obs: 16, groups: block, 4

I have left out the fixed effect parameter tests and correlations for brevity.

Here are my primary questions:

  1. Can you use this output to calculate overdispersion?

    • I have read that overdispersion can be calculated as the residual deviance divided by the residual degrees of freedom. Is that 8.576 / (16 – 4)? (Zuur et al., Mixed Effects Models)
  2. If this calculation is correct, the dispersion estimate is phi = 8.576 / 12 ≈ 0.715, which indicates that there is no overdispersion in my data (see the code sketch after the glm() call below).

    • Does this indicate that there is underdispersion?
    • Is this a problem?
    • Can anybody offer advice as to thresholds for over/underdispersion at which corrections to the models should be made? Zuur has said in one book that 5 is a common cutoff. Do people agree with that?
    • How can such corrections be made?
  3. I've also noticed here that the variance for the random effect is 0.

    • Does this mean that there are precisely no error correlations between data points within my blocking factor?
    • If this is so, why would a generalised linear model of the form shown at the bottom have a substantially higher AIC, around 55?
    • Is AIC a reasonable method for choosing a GLMM over a GLM (as suggested by Zuur)?


glm(rich ~ morph*caged, data=bexData,
    family=poisson)
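
For concreteness, here is a minimal sketch of how the dispersion statistic and the AIC comparison from question 3 might be computed in R, assuming the bexData frame above; the object names m_glmm and m_glm are placeholders, and note that df.residual() also counts the random-effect variance as a parameter, so the denominator here is 11 rather than the 12 used above.

library(lme4)

# Fit the Poisson GLMM and the fixed-effects-only GLM from the question
m_glmm <- glmer(rich ~ morph*caged + (1|block),
                family=poisson, data=bexData)
m_glm  <- glm(rich ~ morph*caged, family=poisson, data=bexData)

# Dispersion: residual deviance (or Pearson chi-square) over residual df
rdf      <- df.residual(m_glmm)
phi_dev  <- deviance(m_glmm) / rdf
phi_pear <- sum(residuals(m_glmm, type="pearson")^2) / rdf
c(deviance_based = phi_dev, pearson_based = phi_pear)

# Question 3: AIC comparison of GLMM vs GLM (interpret cautiously when the
# random-effect variance is estimated at 0, which sits on the boundary)
AIC(m_glmm, m_glm)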

Best Answer

My general preference, when comparing a more complex model (here, the negative binomial) with a less complex one (here, the Poisson), is not to rely on any statistical test, but to fit both and see whether the predicted values are substantially different. (What 'substantially' means depends on the field you are working in.) If they are, prefer the more complex model; if not, the simpler one.

This allows us not to rely on arbitrary cutoffs; it requires us to employ judgement. Those are, in my opinion, good things.
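
As a rough sketch of that workflow with the data from the question (assuming the bexData frame; the model names below are placeholders, and glmer.nb() may be unstable with only 16 observations):

library(lme4)

# Poisson GLMM vs negative binomial GLMM with the same structure
m_pois <- glmer(rich ~ morph*caged + (1|block),
                family=poisson, data=bexData)
m_nb   <- glmer.nb(rich ~ morph*caged + (1|block), data=bexData)

# Compare fitted values on the response scale; what counts as
# 'substantially different' is a judgement call for your field
cbind(poisson = fitted(m_pois), negbin = fitted(m_nb))
summary(fitted(m_pois) - fitted(m_nb))

If the two columns of fitted values are essentially the same, this reasoning favours keeping the simpler Poisson model.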