Solved – R: GLMM for unbalanced zero-inflated data (glmmTMB)

Tags: generalized linear model, glmmTMB, R, regression, zero inflation

Study design:

I have count data of snails per date, counted over many dates at sites, nested in localities.
So, in each locality the snail counts come from several different sites, repeatedly sampled on different dates.

Goal:

Test if snail counts differ between localities, and test influence of environmental factors (e.g. water pH)

Things to account for:

A: All in all, I have about 33% of the dates having counts of zero,
which makes me think the data is zero inflated. (The histogram of counts is not reproduced here.)

B: Sites in localities might show variation in intercepts due to higher initial snail abundance

C: Sampling duration differed (5-33 minutes), which will most likely influence counts

D: The number of sites per locality is unbalanced (one locality with only 1 site, overall range per locality: 1-9)

E: The total number of sampling dates per locality is unbalanced (overall range per locality: 19-35)

Steps so far:

A: Use glmmTMB to account for the zero inflation

B: Include site as a random intercept, to account for variation in counts between the sites

C: Include sampling duration as an offset, to account for differences in sampling effort

The current model:

library(glmmTMB)

model <- glmmTMB(snail_count ~ (1|site) + locality + pH + locality*pH + offset(log(duration)),
                 data = df,
                 ziformula = ~1,
                 family = poisson)

Questions:

Have I specified the model correctly?
How do I account for D: and E: (the unbalanced design)?

Best Answer

A1: "All in all, I have about 33% of the dates having counts of zero, which makes me think the data is zero inflated." -> This is a common misconception: zero-inflation != lots of zeros. Zero-inflation means you have more zeros than you would expect given your fitted model. Without having fit a model, you can't know what to expect. The DHARMa R package (disclaimer: I develop this package) has a zero-inflation test for GLMMs, including glmmTMB, that you can use to test your model. However, see the notes in the vignette about zero-inflation: when fitting GLMMs with variable dispersion, zero-inflation often shows up as underdispersion, so the most reliable test is usually model selection between a zero-inflated model (e.g. ZIP) and the corresponding standard model.
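As a rough sketch of that check, assuming simulated stand-in data rather than the snail data (the data frame `sim`, its columns, and all parameter values are made up for illustration, and locality is left out for brevity), one could fit the Poisson GLMM with and without a zero-inflation term, run DHARMa's test on the plain model, and compare the two fits by AIC:

```r
library(glmmTMB)
library(DHARMa)

set.seed(42)
# Simulated stand-in data (hypothetical): 10 sites, 30 sampling dates each
sim <- data.frame(
  site     = factor(rep(1:10, each = 30)),
  pH       = runif(300, 6, 9),
  duration = runif(300, 5, 33)
)
site_eff <- rnorm(10, sd = 0.5)
mu <- exp(-3 + 0.2 * sim$pH + site_eff[as.integer(sim$site)]) * sim$duration
# Poisson counts, with roughly 25% structural zeros added on top
sim$snail_count <- rpois(300, mu) * rbinom(300, 1, 0.75)

# Standard Poisson GLMM without a zero-inflation term
fit_pois <- glmmTMB(snail_count ~ pH + (1 | site) + offset(log(duration)),
                    data = sim, family = poisson)

# Same model with a constant zero-inflation probability (ZIP)
fit_zip <- glmmTMB(snail_count ~ pH + (1 | site) + offset(log(duration)),
                   data = sim, ziformula = ~1, family = poisson)

# Simulation-based test: observed zeros vs. zeros expected under the fitted model
res <- simulateResiduals(fit_pois)
testZeroInflation(res)

# Model selection between the two candidates
AIC(fit_pois, fit_zip)
```

The point of the comparison is that the zero-inflation term is judged against what the fitted model predicts, not against the raw share of zeros in the data.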

A2: When running a Poisson GLMM with count data, you absolutely have to check for overdispersion! Fitting a Poisson without this check is a big no-no. It would be very uncommon for your data not to be overdispersed, so your Poisson is likely not appropriate, and you should move to a negative binomial or a Poisson with an observation-level random effect (OLRE). DHARMa has a dispersion test that works with glmmTMB.
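A minimal sketch of the dispersion check and the two alternatives, again on simulated stand-in data (the data frame `sim`, the `obs` column, and all parameter values are assumptions made for illustration):

```r
library(glmmTMB)
library(DHARMa)

set.seed(1)
# Simulated stand-in data (hypothetical): overdispersed counts at 10 sites
sim <- data.frame(
  site     = factor(rep(1:10, each = 30)),
  pH       = runif(300, 6, 9),
  duration = runif(300, 5, 33)
)
site_eff <- rnorm(10, sd = 0.5)
mu <- exp(-3 + 0.2 * sim$pH + site_eff[as.integer(sim$site)]) * sim$duration
sim$snail_count <- rnbinom(300, mu = mu, size = 1)  # negative binomial noise
sim$obs <- factor(seq_len(nrow(sim)))               # one level per row, for the OLRE

fit_pois <- glmmTMB(snail_count ~ pH + (1 | site) + offset(log(duration)),
                    data = sim, family = poisson)

# DHARMa dispersion test on the simulated residuals of the Poisson fit
testDispersion(simulateResiduals(fit_pois))

# Option 1: negative binomial (variance = mu + mu^2 / theta)
fit_nb <- update(fit_pois, family = nbinom2)

# Option 2: Poisson with an observation-level random effect
fit_olre <- glmmTMB(snail_count ~ pH + (1 | site) + (1 | obs) + offset(log(duration)),
                    data = sim, family = poisson)

AIC(fit_pois, fit_nb, fit_olre)
```

Whichever alternative is chosen, it is worth rerunning the DHARMa residual checks on the new fit.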

A3: At the end of the DHARMa vignette, there is an example of analysing and checking zero-inflated count data (Owl dataset).

B/C: These make sense.

D/E: Not generally a problem, but especially in this case you should put a random effect (RE) on locality as well (nested: locality/site).
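One way the nested term could be written, sketched on a simulated unbalanced layout (the layout, column names, and parameter values are hypothetical, and locality enters here only as a random effect, which is an assumption about how the suggestion would be applied):

```r
library(glmmTMB)

set.seed(7)
# Hypothetical unbalanced layout: 8 localities with 1 to 9 sites each
sites_per_loc <- c(1, 2, 3, 4, 5, 6, 7, 9)
layout <- data.frame(
  locality = factor(rep(seq_along(sites_per_loc), times = sites_per_loc)),
  site     = factor(seq_len(sum(sites_per_loc)))  # site labels unique across localities
)
# 10 sampling dates per site
sim <- layout[rep(seq_len(nrow(layout)), each = 10), ]
sim$pH       <- runif(nrow(sim), 6, 9)
sim$duration <- runif(nrow(sim), 5, 33)

loc_eff  <- rnorm(nlevels(sim$locality), sd = 0.8)
site_eff <- rnorm(nlevels(sim$site), sd = 0.5)
mu <- exp(-3 + 0.2 * sim$pH +
          loc_eff[as.integer(sim$locality)] +
          site_eff[as.integer(sim$site)]) * sim$duration
sim$snail_count <- rpois(nrow(sim), mu)

# (1 | locality/site) expands to a locality intercept plus a site-within-locality intercept
fit_nested <- glmmTMB(snail_count ~ pH + (1 | locality/site) + offset(log(duration)),
                      data = sim, family = poisson)
VarCorr(fit_nested)
```

Mixed models handle unequal numbers of sites and dates per locality without special treatment, which is why the imbalance itself needs no extra term.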

Related Solutions

Random Effects – Interactions Between Random Effects in Mixed Models

Have you tried it? That sounds like it should be fine.

set.seed(101)
## generate fully crossed design:
d <- expand.grid(Year=2000:2010,Site=1:30)
## sample 70% of the site/year combinations to induce lack of balance
d <- d[sample(1:nrow(d),size=round(0.7*nrow(d))),]
## now get Poisson-distributed number of obs per site/year
library(plyr)
d <- ddply(d,c("Site","Year"),transform,rep=seq(rpois(1,lambda=10)))
library(lme4)
## simulate counts with unit standard deviation for each random effect
d$ticks <- simulate(~1+(1|Year)+(1|Site)+(1|Year:Site),
                    family=poisson,newdata=d,
                    newparams=list(beta=2, ## mean(log(ticks))=2
                                   theta=c(1,1,1)))[[1]]
mm <- glmer(ticks~1+(1|Year)+(1|Site)+(1|Year:Site),
            family=poisson,data=d)
summary(mm)

We get out approximately what we put in -- equal variances at each level:

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: ticks ~ 1 + (1 | Year) + (1 | Site) + (1 | Year:Site)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##  12487.3  12510.2  -6239.7  12479.3     2267 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9944 -0.6842 -0.0726  0.6010  3.8532 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Year:Site (Intercept) 1.0818   1.0401  
##  Site      (Intercept) 1.0490   1.0242  
##  Year      (Intercept) 0.9787   0.9893  
## Number of obs: 2271, groups:  Year:Site, 231 Site, 30 Year, 11
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.1952     0.3593   6.109    1e-09 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You may want to include an observation-level random effect to allow for overdispersion (see the "grouse ticks" example in http://rpubs.com/bbolker/glmmchapter).
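A minimal sketch of such an observation-level random effect, using a small simulated design rather than the one above (the `obs` column, the design size, and the parameter values are all assumptions for illustration):

```r
library(lme4)

set.seed(101)
# Small stand-in design (hypothetical): 20 sites x 8 years, 5 observations each
d2 <- expand.grid(Year = factor(2000:2007), Site = factor(1:20), rep = 1:5)
site_eff <- rnorm(20, sd = 1)
year_eff <- rnorm(8, sd = 1)
mu <- exp(1 + site_eff[as.integer(d2$Site)] + year_eff[as.integer(d2$Year)])
# Negative binomial noise gives extra-Poisson variation
d2$ticks <- rnbinom(nrow(d2), mu = mu, size = 2)

# One factor level per row: the observation-level random effect
d2$obs <- factor(seq_len(nrow(d2)))

mm_olre <- glmer(ticks ~ 1 + (1 | Year) + (1 | Site) + (1 | obs),
                 family = poisson, data = d2)
VarCorr(mm_olre)
```

The variance estimated for `obs` absorbs the row-level variation that a plain Poisson model cannot represent.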

Solved – Help with zero-inflated generalized linear mixed models with random factor in R

No, zeroinfl() currently does not support random effects. So the formula you specified actually means something different: You use a fixed treatment effect in the count part and a fixed site effect in the zero-inflation part. See vignette("countreg", package = "pscl") for more details.
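To illustrate how zeroinfl() reads a two-part formula, here is a small sketch using the bioChemists data that ships with pscl. It is only a stand-in, and its variables have nothing to do with the treatment and site variables in the question:

```r
library(pscl)

# bioChemists is bundled with pscl; used here purely as a stand-in data set
data("bioChemists", package = "pscl")

# Left of "|": fixed effects for the count part.
# Right of "|": fixed effects for the zero-inflation part (not a grouping factor).
fit <- zeroinfl(art ~ fem + mar | ment, data = bioChemists, dist = "poisson")

# Coefficient names carry a "count_" or "zero_" prefix for the two parts
names(coef(fit))
```

So the `|` in a zeroinfl() formula separates the two model parts, unlike the `(1 | group)` syntax of lme4 or glmmTMB, where it marks a random effect.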
If you want random effects, then no. If you use fixed interaction effects instead, you could still try to find a suitable model with zeroinfl(). But with your number of observations this is probably not the best solution.
As the model is not the one you would want to fit, this is not relevant here.
For zeroinfl() there would be, and I suppose that for glmmADMB there are as well, but I'm not an expert on that.
You could employ effect plots for the covariate effects or rootograms for the goodness of fit. It depends on what you really want to show.