Solved – Justification for using a zero-inflated negative binomial regression

modeling, negative-binomial-distribution, regression, zero inflation

I'm trying to describe in words why I used a zero-inflated negative binomial regression instead of a negative binomial regression:

To model my data I used a negative binomial regression. However, because my response variable included a high proportion of zeros (more than would be expected under the negative binomial distribution), the negative binomial regression did not fit my data well. More specifically, because the negative binomial regression was attempting to account for the high number of zeros and the counts simultaneously, the predicted values were overly biased towards zero and the residual variation was high. In an attempt to correct these issues, I used a zero-inflated negative binomial regression, which specifies a model for the zeros and a model for the counts. This model reduced the residual variation because the zeros were modelled separately from the counts, so the predicted values for the counts were not weighted too heavily in favour of the zeros.
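For reference, the mixture usually meant by a zero-inflated negative binomial model can be written as below. The symbols are notation assumed here, not taken from the post: $\pi_i$ is the probability that observation $i$ is a certain zero, and $f_{\mathrm{NB}}(y; \mu_i, \theta)$ is the negative binomial probability of count $y$ with mean $\mu_i$ and dispersion $\theta$.

$$
P(Y_i = 0) = \pi_i + (1 - \pi_i)\, f_{\mathrm{NB}}(0; \mu_i, \theta), \qquad
P(Y_i = y) = (1 - \pi_i)\, f_{\mathrm{NB}}(y; \mu_i, \theta), \quad y = 1, 2, \dots
$$

In this form a zero can come from either component, so the count component still accounts for some of the observed zeros.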

Could people comment on/edit/correct my justification?

Best Answer

I think you're on the right track. Zero-inflated models let you accommodate both values that happen to be zero (but could plausibly have taken other values) and certain zeros, which are fixed at zero. You may want to provide specific examples of how both situations apply to your data. For example,

"Some employees may have taken zero sick days because they were not ill. However, since some entry-level positions do not provide paid sick leave, zero used days of sick leave were also reported for these employees. Zero inflation allows the model to...."

Adding these specifics helps justify your choice of model beyond "well, it kinda fits better." You want to convince people that your data would be well-fit by a negative binomial model if you could somehow magically remove the certain zeros.
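To make the "certain zeros" idea concrete, here is a minimal sketch of how mixing in a share of certain zeros raises the probability of observing a zero above what a plain negative binomial would give. The parameter values (`mu`, `theta`, `pi`) and the function names are hypothetical, chosen only for illustration, and it uses the standard library rather than any particular modelling package.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta
    (variance = mu + mu**2 / theta)."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, theta):
    """Zero-inflated NB: with probability pi the value is a certain zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base


# Hypothetical values: 25% of observations are certain zeros.
mu, theta, pi = 3.0, 1.5, 0.25

p0_nb = nb_pmf(0, mu, theta)          # zeros expected from the counts alone
p0_zinb = zinb_pmf(0, pi, mu, theta)  # zeros once certain zeros are mixed in

print(f"P(Y = 0) under NB:   {p0_nb:.3f}")
print(f"P(Y = 0) under ZINB: {p0_zinb:.3f}")
```

Comparing the observed share of zeros in your data with the share implied by the fitted negative binomial is one simple way to back up the claim that there are more zeros than that distribution expects.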