Is splitting one hurdle model into two GLM/GAM models a valid approach?

count-data, r, zero-inflation

I came across several publications dealing with overdispersed, zero-inflated count data that "simply" modelled presence/absence in one model and then the positive counts in a second model. This led to two models with two different outcomes. The authors stated they were using "hurdle models".

My understanding is that hurdle models do not work like that, but instead "integrate" the results of both parts into a single final output in one step. Is this right?

Is the two-model approach therefore "wrong"?

Best Answer

I just came across this question by chance; sorry for the late answer. It is fine to model the two parts of a hurdle model separately. This paper discusses both hurdle and zero-inflated models: https://journal.r-project.org/archive/2017/RJ-2017-066/index.html
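A short derivation shows why the split works. The notation here is introduced for illustration only: π_i is the probability that observation i is non-zero, and f(y; μ_i) is the untruncated count distribution (negative binomial, Poisson, ...). A hurdle model defines the distribution below, and its log-likelihood separates into a binary term and a zero-truncated term:

```latex
P(Y_i = 0) = 1 - \pi_i, \qquad
P(Y_i = y) = \pi_i \, \frac{f(y;\mu_i)}{1 - f(0;\mu_i)}, \quad y = 1, 2, \dots

\ell(\gamma, \beta)
= \underbrace{\sum_{i} \Big[ \mathbf{1}\{y_i = 0\} \log(1 - \pi_i)
  + \mathbf{1}\{y_i > 0\} \log \pi_i \Big]}_{\text{binomial model for } y_i > 0}
+ \underbrace{\sum_{i:\, y_i > 0} \log \frac{f(y_i;\mu_i)}{1 - f(0;\mu_i)}}_{\text{zero-truncated count model}}
```

Assuming π_i depends only on its own parameters γ (e.g. logit π_i = z_iᵀγ) and μ_i only on β (e.g. log μ_i = x_iᵀβ), with nothing shared between the two parts (including random effects), each term can be maximized on its own. The two separate fits then give the same estimates and the same total log-likelihood as the joint hurdle fit. The same factorization does not hold for a zero-inflated model such as zinb, where a zero can come from either component.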

Here's an example using the Salamanders data from the glmmTMB package. You could fit the data either with a zero-inflated model (zinb below, where zeros can come from either the negative binomial or the zero-inflation component) or with a hurdle model (hnb below). We can then see that hnb has the same log-likelihood as the combination of a binomial model for the zeros and a zero-truncated negative binomial model for the positive counts.

> library(glmmTMB)   # provides glmmTMB() and the Salamanders data
> # zero-inflated NB: zeros can come from the NB or from the zero-inflation part
> zinb = glmmTMB(count ~ spp * mined + (1|site), zi = ~spp * mined, data = Salamanders, family = nbinom2)
> # hurdle NB: zi models P(count == 0), the truncated NB models the positive counts
> hnb = glmmTMB(count ~ spp * mined + (1|site), zi = ~spp * mined, data = Salamanders, family = truncated_nbinom2)
> # the same two parts fitted separately
> zeros = glmmTMB(count < .5 ~ spp * mined, data = Salamanders, family = binomial)
> pos = glmmTMB(count ~ spp * mined + (1|site), data = subset(Salamanders, count > 0), family = truncated_nbinom2)
> logLik(pos)
'log Lik.' -491.5107 (df=16)
> logLik(zeros)
'log Lik.' -315.2394 (df=14)
> logLik(pos) + logLik(zeros)
'log Lik.' -806.7501 (df=16)
> logLik(hnb)
'log Lik.' -806.7501 (df=30)

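If you compare the coefficients, they should match as well (up to optimizer tolerance): the zi coefficients of hnb against those of zeros, and the conditional coefficients of hnb against those of pos. The df=16 printed for the summed log-likelihood is simply carried over from logLik(pos), because R arithmetic keeps the attributes of the first operand; the two separate models together have 16 + 14 = 30 parameters, the same as hnb.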
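If you want to check the same equivalence without glmmTMB, here is a minimal base-R sketch under simplifying assumptions: simulated data, a single covariate x, a Poisson instead of a negative binomial count part, and no random effects. The helper names (rztpois1, ztp_nll, hurdle_nll) are made up for this illustration. It fits the binary part with glm(), the zero-truncated part by maximizing its likelihood with optim(), and the full hurdle likelihood in a single optim() call. Note that the binary part here models P(Y > 0), so its signs are flipped relative to a model for P(count == 0) such as zeros above.

```r
# Sketch only: simulated data, one covariate, Poisson hurdle, no random effects.
set.seed(42)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                       # design matrix shared by both parts

# Simulate a hurdle process: "present or not", then a zero-truncated Poisson count.
p_true  <- plogis(0.3 + 0.8 * x)       # P(Y > 0)
mu_true <- exp(0.5 + 0.4 * x)          # mean of the untruncated Poisson
rztpois1 <- function(m) {              # one zero-truncated Poisson draw (rejection)
  repeat { k <- rpois(1, m); if (k > 0) return(k) }
}
present <- rbinom(n, 1, p_true) == 1
y   <- ifelse(present, vapply(mu_true, rztpois1, numeric(1)), 0)
pos <- y > 0

# Part 1: binomial (logit) model for zero vs. non-zero.
fit_bin <- glm(as.numeric(pos) ~ x, family = binomial)
ll_bin  <- as.numeric(logLik(fit_bin))

# Part 2: zero-truncated Poisson for the positive counts only.
ztp_nll <- function(beta, y, X) {
  mu <- exp(drop(X %*% beta))
  # log f(y) - log(1 - f(0)); for the Poisson, f(0) = exp(-mu)
  -sum(dpois(y, mu, log = TRUE) - log1p(-exp(-mu)))
}
fit_pos <- optim(c(0, 0), ztp_nll, y = y[pos], X = X[pos, , drop = FALSE],
                 method = "BFGS", control = list(reltol = 1e-12))
ll_pos <- -fit_pos$value

# Joint hurdle model: both parts in one likelihood, four parameters.
hurdle_nll <- function(theta, y, X) {
  p  <- plogis(drop(X %*% theta[1:2]))   # P(Y > 0)
  mu <- exp(drop(X %*% theta[3:4]))      # untruncated Poisson mean
  ll <- ifelse(y > 0,
               log(p) + dpois(y, mu, log = TRUE) - log1p(-exp(-mu)),
               log1p(-p))
  -sum(ll)
}
fit_joint <- optim(rep(0, 4), hurdle_nll, y = y, X = X,
                   method = "BFGS", control = list(reltol = 1e-12))
ll_joint <- -fit_joint$value

# The joint maximum should equal the sum of the two separate maxima,
# and the estimates should coincide up to optimizer tolerance.
cat("binomial + truncated:", ll_bin + ll_pos, "\n")
cat("joint hurdle:        ", ll_joint, "\n")
print(rbind(separate = c(coef(fit_bin), fit_pos$par), joint = fit_joint$par))
```

Fitting everything jointly only becomes necessary when the two parts share parameters (for example a common random effect), because the likelihood then no longer factorizes.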
