Can you use glmmTMB to simultaneously model offsets and zero-inflation?

glmm, glmmtmb, mixed model, overdispersion, zero inflation

I'm currently modelling microbial data, with multiple samples and groups of samples. Two problems arise with my data: 1) the data are zero-inflated and overdispersed (large variation); 2) each sample has a different total number of counts, so results are biased by sample-count size. To account for these problems, I tried fitting a zero-inflated negative binomial (ZINB) model with the glmmTMB package, using an offset to model total sample count and specifying a 'ziformula' based on the structure of my data. However, the manual states that "Offset terms will automatically be dropped from the conditional effects formula when using ~.".

Is there any way to account for zero-inflation, overdispersion, and unequal sample depth using a ZINB within glmmTMB, or should I look into other methods? My model currently has two random effects and one fixed effect.

Best Answer

tl;dr as far as I can tell at this point,

 glmmTMB(formula=<...>+offset(log(sampling_effort)),
         ziformula = ~.,
         family=nbinom2,
         data=<...>)

should do what you want. (1) ziformula specifies zero-inflation. (2) The offset term in the conditional model (formula) adjusts for sampling effort; as I understand your context, you shouldn't need any adjustment for effort/sample depth in the zero-inflation part of the model (since it describes structural zeros, which will always be observed as zero regardless of sampling effort). (3) family=nbinom2 takes care of other sources of overdispersion. [You might want to alternately consider nbinom1, which specifies $\textrm{Var} \propto \mu$ rather than $\textrm{Var} = \mu + \mu^2/k$.]
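As a rough illustration of the call above, here is a minimal simulated sketch. Everything in it is an assumption made for the example rather than part of the original question: the variable names (y, x, group, sampling_effort), the parameter values, the single random intercept, and the simplified ziformula = ~ x (used instead of ~ . to keep the toy fit simple).

```r
library(glmmTMB)

set.seed(101)
n_groups <- 20
n_per <- 25
n <- n_groups * n_per

# Hypothetical data: one covariate, one grouping factor, unequal sampling depth
dd <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per)),
  x = rnorm(n),
  sampling_effort = round(runif(n, 1000, 20000))
)

# Conditional (count) part: mean proportional to effort, with a group-level intercept
b_group <- rnorm(n_groups, sd = 0.5)
mu <- dd$sampling_effort * exp(-6 + 0.5 * dd$x + b_group[as.integer(dd$group)])

# Zero-inflation part: structural zeros, independent of effort in this simulation
structural_zero <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * dd$x))
dd$y <- ifelse(structural_zero == 1, 0, rnbinom(n, mu = mu, size = 2))

# Offset only in the conditional formula; the zero-inflation formula is written out
fit <- glmmTMB(y ~ x + (1 | group) + offset(log(sampling_effort)),
               ziformula = ~ x,
               family = nbinom2,
               data = dd)

fixef(fit)  # fixed effects for the conditional and zero-inflation components
```

The offset does not show up as an estimated coefficient; it enters the conditional linear predictor with its coefficient fixed at 1.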


This is an interesting question both from the statistical and the implementation point of view.

Implementation: you can add offsets to zero-inflation terms, you just can't do it via ~. For example, something like

 glmmTMB(y ~ x,
         family = nbinom2,
         ziformula = ~ x + offset(log(w)),
         data = <...>)

should work fine. It's only if you try to use ziformula = ~ . to match the conditional formula in a lazy way that the offset gets dropped.

Statistics: I question a couple of your premises.

  • First of all, it's not immediately obvious that different numbers of counts should lead to biased results; it's important to know what the source of the differences is - i.e. large natural variations in density, or variations in searching/capture effort?
  • Second, you have to think carefully about the form of the offset. glmmTMB's zero-inflation term uses a logit link (the only choice at present), so using log(effort) as the offset will mean that the odds of an observation being a structural zero are proportional to the effort, i.e. $\textrm{logit}(p_z) = \ldots + \log(e) \to p_z/(1-p_z) \propto e$ (so $p_z \propto e$ only approximately, when $p_z$ is small). In general a complementary log-log link (with log(effort) as the offset) is more appropriate for detection probabilities, as that makes the hazard of finding something proportional to effort. However, if you're really trying to model structural zeros I question whether search effort should influence this part of the model at all ...
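A quick numerical check of how a log(effort) offset behaves under a logit link (which, as far as I know, is what glmmTMB uses for the zero-inflation component) versus a complementary log-log link. This uses base R only; the effort values and the linear predictor eta are made-up numbers for illustration.

```r
effort <- c(1, 2, 4, 8)   # hypothetical sampling efforts
eta <- -3                 # hypothetical linear predictor, excluding the offset

# Logit link with log(effort) offset: the odds scale with effort
p_logit <- plogis(eta + log(effort))
odds <- p_logit / (1 - p_logit)
odds_per_effort <- odds / effort        # constant across efforts, equal to exp(eta)

# Complementary log-log link with log(effort) offset:
# the cumulative hazard -log(1 - p) scales with effort
p_cloglog <- 1 - exp(-exp(eta + log(effort)))
hazard <- -log(1 - p_cloglog)
hazard_per_effort <- hazard / effort    # constant across efforts, equal to exp(eta)

# The probability itself is not exactly proportional to effort under either link
p_logit / effort
p_cloglog / effort
```

In both cases the quantity that scales exactly with effort is on the link's natural scale (odds or hazard), not the probability itself.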

Based on the comments, I think this question may be based on a (reasonable) misunderstanding of ?glmmTMB, which currently reads

ziformula: ... Offset terms will automatically be dropped from the conditional effects formula when using ‘~.’

This warning applies only to the zero-inflation formula: the conditional formula (the formula argument) isn't modified at all; only the zero-inflation formula derived from it is affected (and, as discussed above, you can include an offset in the zero-inflation part of the model if you really want to, by writing out the formula explicitly). Thus if you follow the fairly standard procedure of adding + offset(log(sampling_effort)) to formula, then the conditional mean number of counts will be proportional to effort (assuming a [default] log-link model for the counts).
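To spell out the proportionality claim: write $\mu$ for the conditional mean of the count component, $e$ for sampling effort, and $\eta$ for the rest of the linear predictor (fixed plus random effects; $\eta$ is my notation, not glmmTMB's). With a log link and an offset of $\log(e)$,

$$\log(\mu) = \eta + \log(e) \quad\Longrightarrow\quad \mu = e \cdot \exp(\eta),$$

so doubling the effort doubles the expected count from the negative binomial part of the model, all else being equal. Any zero-inflation is applied on top of this and is unaffected by the offset unless one is written into ziformula explicitly.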

In hopes of clarifying this I've tried editing this statement: is the following clearer?

Specifying ~. will set the right-hand side of the zero-inflation formula identical to the right-hand side of the main (conditional effects) formula; terms can also be added or subtracted. Offset terms will automatically be dropped from the conditional effects formula when using ~.

(Hmmm, now that I read this over it doesn't seem much better ...) Feel free to comment (or suggest clearer wording) here, or at https://github.com/glmmTMB/glmmTMB/issues ...
