Solved – R: GLMM for unbalanced zero-inflated data (glmmTMB)

Tags: generalized linear model, glmmtmb, r, regression, zero inflation

Study design:

I have snail counts recorded on many dates at sites that are nested in localities.
So each locality contributes counts from several different sites, each sampled repeatedly on different dates.

Goal:

Test if snail counts differ between localities, and test influence of environmental factors (e.g. water pH)


Things to account for:

A: All in all, about 33% of the dates have a count of zero,
which makes me think the data are zero-inflated.
[Figure: histogram of snail counts]
B: Sites in localities might show variation in intercepts due to higher initial snail abundance

C: Sampling duration differed (5-33 minutes), which will most likely influence counts

D: The number of sites per locality is unbalanced (one locality with only 1 site, overall range per locality: 1-9)

E: The total number of sampling dates per locality is unbalanced (overall range per locality: 19-35)


Steps so far:

A: Use glmmTMB to account for the zero inflation

B: Include site as a random intercept, to account for variation in counts between the sites

C: Include sampling duration as an offset, to account for differences in sampling effort

The current model:

library(glmmTMB)

# locality * pH expands to locality + pH + locality:pH
model <- glmmTMB(snail_count ~ locality * pH + (1 | site) + offset(log(duration)),
                 data = df,
                 ziformula = ~1,   # constant zero-inflation probability
                 family = poisson)

Questions:

  1. Have I specified the model correctly?
  2. How do I account for D: and E: (the unbalanced design)?

Best Answer

A1: "All in all, I have about 33% of the dates having counts of zero, which makes me think the data is zero inflated." -> This is a common misconception: zero-inflation != lots of zeros. Zero-inflation means you have more zeros than you would expect given your fitted model. Without having fit a model, you can't know how many zeros to expect. The DHARMa R package (disclaimer: I develop this package) has a zero-inflation test for GLMMs, including glmmTMB models, that you can use to check your model. However, see the notes on zero-inflation in the vignette: when fitting GLMMs with variable dispersion, zero-inflation often shows up as underdispersion, so the most reliable test is usually a model selection between a zero-inflated Poisson (ZIP) and the standard model.
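As a rough illustration of A1, the sketch below fits a plain Poisson GLMM and a ZIP to simulated data, runs DHARMa's zero-inflation test on the plain model, and compares the two fits by AIC. The data frame is a made-up stand-in for the asker's `df` (the column names follow the question, the values and effect sizes are assumptions), so only the workflow carries over, not the numbers.

```r
library(glmmTMB)
library(DHARMa)

set.seed(1)

# Simulated stand-in for the asker's `df` (hypothetical values throughout):
# 12 sites nested in 3 localities, with ~30% structural zeros added on purpose.
n        <- 300
site_id  <- sample(1:12, n, replace = TRUE)
site_eff <- rnorm(12, sd = 0.3)
df <- data.frame(
  site     = factor(site_id),
  locality = factor(c("A", "B", "C")[(site_id - 1) %/% 4 + 1]),
  pH       = runif(n, 6, 9),
  duration = runif(n, 5, 33)
)
lambda <- df$duration * exp(-2 + 0.1 * df$pH + site_eff[site_id])
df$snail_count <- rpois(n, lambda) * rbinom(n, 1, 0.7)

# Standard Poisson GLMM (no zero-inflation term)
m_pois <- glmmTMB(snail_count ~ locality * pH + (1 | site) + offset(log(duration)),
                  data = df, family = poisson)

# Same model with a constant zero-inflation probability (ZIP)
m_zip <- glmmTMB(snail_count ~ locality * pH + (1 | site) + offset(log(duration)),
                 data = df, family = poisson, ziformula = ~1)

# Simulation-based residuals and zero-inflation test for the plain model
res     <- simulateResiduals(m_pois)
zi_test <- testZeroInflation(res, plot = FALSE)
print(zi_test)

# Model selection: ZIP against the standard model
aic_tab <- AIC(m_pois, m_zip)
print(aic_tab)
```

On real data, the AIC comparison is the part to lean on, with the test serving as a check on the model that was chosen.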

A2: When running a Poisson GLMM on count data, you absolutely have to check for overdispersion! Fitting a Poisson without that check is a big no-no. It would be very uncommon for your data not to be overdispersed, so your Poisson is likely not appropriate and you should move to a negative binomial or a Poisson with an observation-level random effect (OLRE). DHARMa has a dispersion test that works with glmmTMB.
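A minimal sketch of the check in A2, again on simulated data: test the Poisson fit for overdispersion with DHARMa, then fit the two alternatives mentioned (negative binomial, and Poisson with an OLRE). The simulated counts are deliberately overdispersed; the `obs` column and all simulated values are assumptions made for illustration, and `nbinom2` is just one of the negative binomial parameterisations glmmTMB offers.

```r
library(glmmTMB)
library(DHARMa)

set.seed(2)

# Simulated stand-in for the asker's `df` (hypothetical values throughout);
# counts are drawn from a negative binomial, so they are overdispersed.
n        <- 300
site_id  <- sample(1:12, n, replace = TRUE)
site_eff <- rnorm(12, sd = 0.3)
df <- data.frame(
  site     = factor(site_id),
  locality = factor(c("A", "B", "C")[(site_id - 1) %/% 4 + 1]),
  pH       = runif(n, 6, 9),
  duration = runif(n, 5, 33)
)
mu <- df$duration * exp(-2 + 0.1 * df$pH + site_eff[site_id])
df$snail_count <- rnbinom(n, mu = mu, size = 1)

# Poisson GLMM and its dispersion test
m_pois <- glmmTMB(snail_count ~ locality * pH + (1 | site) + offset(log(duration)),
                  data = df, family = poisson)
disp_test <- testDispersion(simulateResiduals(m_pois), plot = FALSE)
print(disp_test)

# Alternative 1: negative binomial (quadratic mean-variance relationship)
m_nb <- glmmTMB(snail_count ~ locality * pH + (1 | site) + offset(log(duration)),
                data = df, family = nbinom2)

# Alternative 2: Poisson with an observation-level random effect (OLRE)
df$obs <- factor(seq_len(nrow(df)))
m_olre <- glmmTMB(snail_count ~ locality * pH + (1 | site) + (1 | obs) +
                    offset(log(duration)),
                  data = df, family = poisson)

aic_tab <- AIC(m_pois, m_nb, m_olre)
print(aic_tab)
```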

A3: At the end of the DHARMa vignette, there is an example of analysing and checking zero-inflated count data (Owl dataset).

B/C: Both make sense.

D/E: Not generally a problem, but especially in this case you should put a random effect (RE) on locality as well (nested: locality/site).
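One way the nested random effect could be written in glmmTMB is `(1 | locality/site)`, which expands to a random intercept for locality plus one for site within locality. The sketch below assumes locality is moved out of the fixed part, since the same factor cannot sensibly be estimated as both a fixed and a random effect; whether that fits the goal of comparing localities is a separate modelling decision. The data are simulated and unbalanced on purpose, and all values are hypothetical.

```r
library(glmmTMB)

set.seed(3)

# Simulated stand-in for the asker's `df` (hypothetical values throughout):
# 24 sites nested in 8 localities, sampled an unequal number of times.
n        <- 400
site_id  <- sample(1:24, n, replace = TRUE, prob = runif(24, 0.5, 2))
loc_id   <- (site_id - 1) %/% 3 + 1
loc_eff  <- rnorm(8,  sd = 0.5)
site_eff <- rnorm(24, sd = 0.3)
df <- data.frame(
  site     = factor(site_id),
  locality = factor(loc_id),
  pH       = runif(n, 6, 9),
  duration = runif(n, 5, 33)
)
lambda <- df$duration * exp(-2 + 0.1 * df$pH + loc_eff[loc_id] + site_eff[site_id])
df$snail_count <- rpois(n, lambda)

# Random intercepts for locality and for site nested within locality
m_nested <- glmmTMB(snail_count ~ pH + (1 | locality/site) + offset(log(duration)),
                    data = df, family = poisson)

# Two variance components are expected: locality, and site within locality
vc <- VarCorr(m_nested)$cond
print(vc)
```

Mixed models handle unequal numbers of sites and dates per locality directly, which is why the imbalance in D/E needs no special treatment beyond the random-effect structure.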