Solved – Is splitting one hurdle model into two GLM/GAM models a valid approach?

count-data, zero-inflation

I came across several publications dealing with overdispersed, zero-inflated count data that "simply" modelled presence/absence in one model and then positive counts in a second model. This led to two separate models with two separate sets of results. The authors stated they were using "hurdle models".

As I understand it, hurdle models do not work like that; they "integrate" both parts in one step and produce a single final output. Is this right?

Is the two-model approach therefore "wrong"?

Best Answer

I just randomly came across this question. Sorry it's so late. It is fine to model the two parts of a hurdle model separately. This paper covers both hurdle and zero-inflated models: https://journal.r-project.org/archive/2017/RJ-2017-066/index.html

Here is an example using the Salamanders data from the glmmTMB package. You could fit the data with either a zero-inflated model (zinb below, where zeros can come from either the negative binomial or the zero-inflation component) or a hurdle model (hnb below, where all zeros come from the zero component and the counts follow a zero-truncated negative binomial). We can then see that hnb is statistically equivalent to the combination of a binomial model for the zeros (zeros below) and a zero-truncated model for the positive counts (pos below).

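> library(glmmTMB)  # provides glmmTMB() and the Salamanders data
> 
> # Zero-inflated NB: zeros may come from the NB or from the zero-inflation part
> zinb = glmmTMB(count ~ spp * mined + (1|site), zi = ~spp * mined, data = Salamanders, family = nbinom2)
> 
> # Hurdle NB: zi models P(count == 0); positive counts are zero-truncated NB
> hnb = glmmTMB(count ~ spp * mined + (1|site), zi = ~spp * mined, data = Salamanders, family = truncated_nbinom2)
> 
> # Part 1 fitted on its own: binomial model for zero vs. non-zero
> zeros = glmmTMB(count < .5 ~ spp * mined, data = Salamanders, family = binomial)
> 
> # Part 2 fitted on its own: zero-truncated NB on the positive counts only
> pos = glmmTMB(count ~ spp * mined + (1|site), data = subset(Salamanders, count > 0), family = truncated_nbinom2)
> 
> logLik(pos)
'log Lik.' -491.5107 (df=16)
> 
> logLik(zeros)
'log Lik.' -315.2394 (df=14)
> # The df printed for the sum is carried over from pos; the two parts together use 16 + 14 = 30 parameters
> logLik(pos)+logLik(zeros)
'log Lik.' -806.7501 (df=16)
> 
> logLik(hnb)
'log Lik.' -806.7501 (df=30)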

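If you run it, you can also compare the coefficients: the conditional-model estimates of hnb should match those of pos, and its zero-inflation estimates should match those of zeros.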
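The matching log-likelihoods are not a coincidence. As a sketch (the notation here is introduced for illustration and is not from the original answer), write \pi_i for the probability that observation i is zero and f for the negative binomial probability mass function with mean \mu_i and dispersion \theta. The hurdle log-likelihood then splits into two sums that share no parameters:

```latex
% pi_i = P(y_i = 0), modelled by the zi formula
% f(y | mu_i, theta) = negative binomial pmf, modelled by the conditional formula
\begin{align*}
P(Y_i = y) &=
\begin{cases}
\pi_i, & y = 0,\\[4pt]
(1-\pi_i)\,\dfrac{f(y \mid \mu_i,\theta)}{1 - f(0 \mid \mu_i,\theta)}, & y > 0,
\end{cases}\\[6pt]
\ell &= \underbrace{\sum_{i}\Bigl[\mathbb{1}(y_i = 0)\log\pi_i
        + \mathbb{1}(y_i > 0)\log(1-\pi_i)\Bigr]}_{\text{binomial model for zero vs. non-zero}}
      \;+\; \underbrace{\sum_{i:\,y_i>0}\log
        \frac{f(y_i \mid \mu_i,\theta)}{1 - f(0 \mid \mu_i,\theta)}}_{\text{zero-truncated model for the positive counts}}
\end{align*}
```

Because the first sum depends only on the zero-part coefficients and the second only on the count-part parameters, maximizing the two sums separately gives the same estimates as maximizing them jointly. In the example, the random site intercept enters only the count part, so integrating it out affects only the second term and the split still holds. This is why -491.5107 + (-315.2394) equals the hurdle model's -806.7501.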

Related Solutions

Solved – Enormous SEs in zero-inflated negative binomial regression

(Hope this is an appropriate way to answer my own question; maybe this should be a comment?)

I figured out the problem thanks in part to a fantastic answer on count regression diagnostics. The offset was the culprit causing the huge SEs, and the residuals and other goodness-of-fit diagnostics were accordingly strange. The offset term may have been a problem because one of the countries (China) had a huge population but comparatively few cases.

For comparison, here is the ZINB without the offset:

[image: output of the ZINB model without the offset]

Hope this helps anyone else experiencing similar problems.

Solved – Trouble finding good model fit for count data with mixed effects – ZINB or something else

This post is four years old, but I wanted to follow up on what fsociety said in a comment. Diagnosing residuals in GLMMs is not straightforward, since standard residual plots can show non-normality, heteroscedasticity, etc., even if the model is correctly specified. There is an R package, DHARMa, specifically suited to diagnosing this type of model.

The package is based on a simulation approach to generate scaled residuals from fitted generalized linear mixed models and generates different easily interpretable diagnostic plots. Here is a small example with the data from the original post and the first fitted model (m1):

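library(DHARMa)
# m1 is the first fitted model from the original post; simulate 1000 responses per observation
sim_residuals <- simulateResiduals(fittedModel = m1, n = 1000)
# plot() replaces plotSimulatedResiduals(), which is deprecated in recent DHARMa versions
plot(sim_residuals)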

The plot on the left shows a QQ plot of the scaled residuals to detect deviations from the expected distribution, and the plot on the right represents residuals vs predicted values while performing quantile regression to detect deviations from uniformity (red lines should be horizontal and at 0.25, 0.50 and 0.75).
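To make "scaled residuals" concrete, here is a rough sketch of the idea in formula form. The symbols are introduced for illustration, and the exact way DHARMa randomizes ties for integer responses may differ between versions: y_i is the observed response, \tilde{y}_i^{(k)} is the k-th of n responses simulated from the fitted model for observation i (n = 1000 in the call above), and u_i is a uniform random draw.

```latex
% y_i: observed response
% \tilde{y}_i^{(k)}: k-th of n responses simulated from the fitted model
% u_i ~ Uniform(0,1): randomization used to break ties for integer-valued data
\[
r_i \;=\; \frac{1}{n}\sum_{k=1}^{n}\mathbb{1}\bigl(\tilde{y}_i^{(k)} < y_i\bigr)
      \;+\; u_i\,\frac{1}{n}\sum_{k=1}^{n}\mathbb{1}\bigl(\tilde{y}_i^{(k)} = y_i\bigr)
\]
```

If the model is correctly specified, each r_i is approximately Uniform(0, 1), which is why the QQ plot is drawn against a uniform distribution and the quantile lines are expected at 0.25, 0.50 and 0.75.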

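Additionally, the package also has specific functions for testing for over/underdispersion and zero inflation, among others (in recent DHARMa versions, testDispersion() replaces the deprecated testOverdispersionParametric() used here):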

testOverdispersionParametric(m1)

Chisq test for overdispersion in GLMMs

data:  poisson
dispersion = 0.18926, pearSS = 11.35600, rdf = 60.00000, p-value = 1
alternative hypothesis: true dispersion greater 1

testZeroInflation(sim_residuals)

DHARMa zero-inflation test via comparison to expected zeros with 
simulation under H0 = fitted model


data:  sim_residuals
ratioObsExp = 0.98894, p-value = 0.502
alternative hypothesis: more
