Solved – Resolving heteroscedasticity in Poisson GLMM

glmmheteroscedasticitypoisson distributionr

I have long-term collection data, and I'd like to test, whether the number of animals collected is influenced by weather effects. My model looks like below:

glmer(SumOfCatch ~ I(pc.act.1^2) +I(pc.act.2^2) + I(pc.may.1^2) + I(pc.may.2^2) + 
                   SampSize + as.factor(samp.prog) + (1|year/month), 
      control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=1e9,npt=5)), 
      family="poisson", data=a2)

Explanation of the used variables:

  • SumOfCatch: number of animals collected
  • pc.act.1, pc.act.2: axes of a principal component representing weather conditions during sampling
  • pc.may.1, pc.may.2: axes of a PC representing weather conditions in May
  • SampSize: number of pitfall traps, or collecting transects of standard lengths
  • samp.prog: method of sampling
  • year: year of sampling (from 1993 to 2002)
  • month: month of sampling (from Aug to Nov)

The fitted model's residuals show considerable inhomogeneity (heteroscedasticity?) when plotted against fitted values (see Fig.1):

Residuals vs. Fitted values - Main model

My main question is: is this a problem making the reliability of my model questionable? If so, what can I do to resolve it?

So far I have tried the followings:

  • control for overdispersion by defining observation-level random effects, i.e. using a unique ID for each observation, and applying this ID variable as random effect; although my data do show considerable overdispersion, this did not help as the residuals became even more ugly (see Fig. 2)

Residuals vs. Fitted values - Model with OD control

  • I fitted models without random effects, with quasi-Poisson glm and glm.nb; also yielded similar residual vs. fitted plots to the original model

As far as I know, there might be ways for the estimation of heteroscedasticity-consistent standard errors, but I have failed to find any such method for Poisson (or any other kind of) GLMMs in R.


In response to @FlorianHartig: the number of observations in my dataset is N=554, I think this is a fair obs. number for such a model, but of course, the more the merrier. I post two figures, first of which is the DHARMa scaled residual plot (suggested by Florian) of the main model.

enter image description here

The second figure is from a second model, in which the only difference is that it contains the observation-level random effect (the first does not).

enter image description here

UPDATE

Figure of the relationship between a weather-variable (as predictor, i.e. x-axis) and sampling success (response):

Weather-PC and sampling success

UPDATE II.

Figures showing predictor values vs. residuals:

Predictors vs. Residuals

Best Answer

It is difficult to assess the fit of the Poisson (or any other integer-valued GLM for that matter) with Pearson or deviance residuals, because also a perfectly fitting Poisson GLMM will exhibit inhomogeneous deviance residuals.

This is especially so if you do GLMMs with observation-level REs, because the dispersion created by OL-REs is not considered by the Pearson residuals.

To demonstrate the issue, the following code creates overdispersed Poisson data, that is then fitted with a perfect model. The Pearson residuals look very much like your plot - hence, it may be that there is no problem at all.

This problem is solved by the DHARMa R package, which simulates from the fitted model to transform the residuals of any GL(M)M into a standardized space. Once this is done, you can visually assess / test residual problems, such as deviations from the distribution, residual dependency on a predictor, heteroskedasticity or autocorrelation in the normal way. See the package vignette for worked-through examples. You can see in the lower plot that the same model now looks fine, as it should.

If you still see heteroscedasticity after plotting with DHARMa, you will have to model dispersion as a function of something, which is not a big problem, but would likely require you to move to JAGs or another Bayesian software.

library(DHARMa)
library(lme4)

testData = createData(sampleSize = 200, overdispersion = 1, randomEffectVariance = 1, family = poisson())

fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) + (1|ID), family = "poisson", data = testData, control=glmerControl(optCtrl=list(maxfun=20000) ))

# standard Pearson residuals
plot(fittedModel, resid(., type = "pearson") ~ fitted(.) , abline = 0)

# DHARMa residuals
plot(simulateResiduals(fittedModel))

enter image description here

enter image description here

Related Question