Solved – R: hurdle models with date as non-nested random effect

glmm, r, random variable

I have little experience with GLMMs and I need to use hurdle models for the first time, but I'm really confused about the random effect part.

I have a dataframe with counts of flies caught in traps with 2 different luring products (same number of traps for each product). The traps were emptied every 3-4 days for a few months. The sex and morph of the flies were determined. The df looks like this:

Date       value  morph sex product
2016-04-05     5 Winter   M     ACV
2016-04-05     1 Summer   M     ACV
2016-04-05    18 Winter   F     ACV
2016-04-05     3 Summer   F     ACV
2016-04-05     0 Winter   M     FRA
2016-04-05     0 Summer   M     FRA
2016-04-05     0 Winter   F     FRA
2016-04-05     0 Summer   F     FRA
2016-04-08     0 Winter   M     ACV
2016-04-08     0 Summer   M     ACV
...

I need to add date as a random effect, so I use a GLMM. Because I have a lot of zeros, a standard GLMM doesn't work. I read about hurdle models and I think they suit the data better than zero-inflated models, because I can't have 0's after "taking the hurdle" (based on this post: What is the difference between zero-inflated and hurdle distributions (models)?).

I've come up with this so far:

# binary part
hurP1 <- glmmadmb(value ~ product * sex * morph + (1 | Date), 
                  data = data2, family = "binomial")

# truncated at 0 part
hurP2 <- glmmadmb(value ~ product * sex * morph + (1 | Date),
                  data = subset(data2, value > 0), family = "truncnbinom1")

In the example of the glmmADMB package,
http://glmmadmb.r-forge.r-project.org/glmmADMB.html
they use a transformed response (nz) in the binary part of the model and I don't understand why. They only take Y > 0, but isn't this the part of the model that checks the 0's? Why remove them?

EDIT Niek answered my question about the response data. But the random part of my model is still not correct, I get this error:

Error in Droplevels(eval(parse(text = x), data)) : 
  all grouping variables in random effects must be factors

So this question is still standing:

Also, I'm not sure if my random effect part is correct like this? I don't have nesting and I see nesting or blocking factors in every example I encounter.

EDIT2 Forgot to run the script lines that turn Date into a factor… All problems solved now!
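For readers who hit the same error, the fix in EDIT2 presumably comes down to a single conversion before fitting. A minimal sketch, assuming a data frame named data2 as in the model calls above and using a few rows copied from the table in the question:

```r
# A few rows copied from the table in the question.
data2 <- data.frame(
  Date    = as.Date(c("2016-04-05", "2016-04-05", "2016-04-08", "2016-04-08")),
  value   = c(5, 0, 0, 0),
  morph   = c("Winter", "Winter", "Winter", "Summer"),
  sex     = c("M", "M", "M", "M"),
  product = c("ACV", "FRA", "ACV", "ACV")
)

# The grouping variable of a random effect has to be a factor.
# A Date (or character) column is not one, so convert it first.
data2$Date <- factor(data2$Date)

is.factor(data2$Date)   # TRUE
nlevels(data2$Date)     # 2, one level per trapping date
```

After this conversion, (1 | Date) gives each trapping date its own random intercept, which does not require any nesting structure.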

Best Answer

This is my understanding of hurdle (truncated) models. It's a two-step approach. First, one model describes the counts for all values larger than 0; this is your truncated model (looks good to me). This model is not 'checking the 0's'; it leaves the 0's out altogether. That is why it assumes a truncated negative binomial distribution: responses must be larger than 0, so the zero part of the distribution is cut off.

The second model is a binomial model for the probability that the response is not zero ('checking the zeros'). It complements the first: it models whether a response belongs to the part you cut off (the 0's) or to the part that goes into the first model (the >0's). The response of this model should not be value but as.numeric(data2$value>0), which returns 1 if value is bigger than 0 and 0 otherwise.

From the glmmADMB package "In contrast to zero-inflated models, hurdle models treat zero-count and non-zero outcomes as two completely separate categories, rather than treating the zero-count outcomes as a mixture of structural and sampling zeros."
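To make the "two separate categories" idea concrete, the resulting probability mass function can be written out in a few lines of base R. This is only a sketch: the parameter values are made up for illustration, and it uses the mean/size parameterisation of dnbinom() rather than the "truncnbinom1" family from the question.

```r
# Hypothetical parameter values, chosen only for illustration.
p_nonzero <- 0.4   # probability of clearing the hurdle (count > 0)
mu        <- 3     # mean of the untruncated negative binomial
size      <- 1.5   # dispersion parameter of dnbinom()

# Hurdle probability mass function:
#   P(Y = 0) = 1 - p_nonzero
#   P(Y = y) = p_nonzero * f(y) / (1 - f(0))   for y > 0
# where f is the untruncated negative binomial pmf.
hurdle_pmf <- function(y, p_nonzero, mu, size) {
  f0 <- dnbinom(0, size = size, mu = mu)
  ifelse(y == 0,
         1 - p_nonzero,
         p_nonzero * dnbinom(y, size = size, mu = mu) / (1 - f0))
}

probs <- hurdle_pmf(0:1000, p_nonzero, mu, size)
probs[1]     # mass at zero comes only from the binary part
sum(probs)   # close to 1; the tail beyond 1000 is negligible here
```

All of the probability at zero comes from the binary part, and the count distribution is rescaled so that it only covers the positive values.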

So the two models answer two distinct but related questions:

1) Can we predict if the outcome is bigger than 0? (binomial, response is nz <- as.numeric(data2$value>0))

2) If it is bigger than 0, can we predict the value? (truncated negative binomial, response is data2$value[which(data2$value>0)])
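The two responses can be built from the same data frame. A minimal sketch, using a few rows copied from the table in the question; the name data_pos is hypothetical and only used here for illustration:

```r
# A few rows copied from the table in the question.
data2 <- data.frame(
  Date    = factor(c("2016-04-05", "2016-04-05", "2016-04-05",
                     "2016-04-05", "2016-04-08")),
  value   = c(5, 1, 0, 0, 0),
  morph   = c("Winter", "Summer", "Winter", "Summer", "Winter"),
  sex     = c("M", "M", "M", "M", "M"),
  product = c("ACV", "ACV", "FRA", "FRA", "ACV")
)

# 1) Binary response: 1 if any flies were caught, 0 otherwise.
#    Storing it as a column lets a model call with data = data2 find it.
data2$nz <- as.numeric(data2$value > 0)

# 2) Positive counts only: the data for the zero-truncated model.
data_pos <- subset(data2, value > 0)

data2$nz          # 1 1 0 0 0
nrow(data_pos)    # 2
```

The binary model is fitted to all rows of data2, while the truncated model only sees the rows in data_pos.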

Hope this helps. With nz defined as in point 1, your binomial model should be

hurP1 <- glmmadmb(nz ~ product * sex * morph + (1 | Date), 
                  data = data2, family = "binomial")

P.S.: The two models do not have to use identical predictor variables; the process that determines 0 vs. >0 can differ from the process that determines the value given that it is larger than 0.
