Zero-inflated count models in R: what is the real advantage?

Tags: poisson-distribution, r, zero-inflation

For analysing zero-inflated bird counts I'd like to apply zero-inflated count models using the R package pscl. However, having looked at the example in the documentation for one of the main functions (?zeroinfl), I have begun to doubt what the real advantage of these models is. Following the sample code given there, I fitted standard Poisson, quasi-Poisson and negative binomial models, simple zero-inflated Poisson and negative binomial models, and zero-inflated Poisson and negative binomial models with regressors for the zero component. Then I inspected histograms of the observed and the fitted data. (Here's the code to replicate that.)

library(pscl)
library(MASS)  # provides glm.nb()
data("bioChemists", package = "pscl")

## standard count data models
fm_pois  <- glm(art ~ .,    data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ .,    data = bioChemists, family = quasipoisson)
fm_nb    <- glm.nb(art ~ ., data = bioChemists)

## with simple inflation (no regressors for zero component)
fm_zip  <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## inflation with regressors
fm_zip2  <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + 
                     ment, data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + 
                     ment, data = bioChemists, dist = "negbin")

## histograms
breaks <- seq(-0.5,20.5,1)
par(mfrow=c(4,2))
hist(bioChemists$art,  breaks=breaks)
hist(fitted(fm_pois),  breaks=breaks)
hist(fitted(fm_qpois), breaks=breaks)
hist(fitted(fm_nb),    breaks=breaks)
hist(fitted(fm_zip),   breaks=breaks)
hist(fitted(fm_zinb),  breaks=breaks)
hist(fitted(fm_zip2),  breaks=breaks)
hist(fitted(fm_zinb2), breaks=breaks)

[Figure: histograms of observed and fitted data]

I can't see any fundamental difference between the models (apart from the fact that the example data don't look very "zero-inflated" to me); in fact, none of the models gives a halfway reasonable estimate of the number of zeros. Can anyone explain what the advantage of the zero-inflated models is? I suppose there must have been a reason to choose this data set as the example for the function.

Best Answer

I think this is a poorly chosen data set for exploring the advantages of zero-inflated models because, as you note, there isn't that much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.
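Part of the reason the histograms look so alike is that fitted() returns each observation's expected mean, not a predicted count, so a histogram of fitted values will hardly ever show a spike at zero. A fairer check, sketched below, is to compare the observed number of zeros with the number each model expects. This is only a minimal sketch: it assumes predict() on a zeroinfl fit with type = "prob" returns a matrix whose first column is P(Y = 0), and the object names are my own.

```r
library(pscl)
data("bioChemists", package = "pscl")

# Refit two of the models so this block runs on its own
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_zip2 <- zeroinfl(art ~ fem + mar + kid5 + phd + ment |
                      fem + mar + kid5 + phd + ment, data = bioChemists)

# Observed number of zero counts
obs_zero <- sum(bioChemists$art == 0)

# Expected zeros under Poisson: sum over observations of P(Y = 0 | mu_i)
exp_zero_pois <- sum(dpois(0, lambda = fitted(fm_pois)))

# Expected zeros under ZIP: first column of the probability matrix is P(Y = 0)
exp_zero_zip <- sum(predict(fm_zip2, type = "prob")[, 1])

round(c(observed = obs_zero, poisson = exp_zero_pois, zip = exp_zero_zip), 1)
```

If the zero-inflated model is doing its job, its expected number of zeros should sit closer to the observed count than the Poisson one does, even when the fitted means of the two models are nearly the same.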

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
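To see this, one can simulate data with a lot of structural zeros and fit both models. The sketch below is illustrative only: the data-generating process (40% structural zeros, a single covariate x, the chosen coefficients) and all object names are made up for the example.

```r
library(pscl)

set.seed(1)
n <- 1000
x <- rnorm(n)

# Hypothetical data-generating process: 40% structural zeros,
# otherwise Poisson counts with log-mean 1 + 0.5 * x
structural_zero <- rbinom(n, size = 1, prob = 0.4)
y <- ifelse(structural_zero == 1, 0, rpois(n, lambda = exp(1 + 0.5 * x)))
sim <- data.frame(y = y, x = x)

m_pois <- glm(y ~ x, data = sim, family = poisson)
m_zip  <- zeroinfl(y ~ x | 1, data = sim)

# Observed versus expected number of zeros under each model
zero_counts <- c(observed = sum(sim$y == 0),
                 poisson  = sum(dpois(0, lambda = fitted(m_pois))),
                 zip      = sum(predict(m_zip, type = "prob")[, 1]))
round(zero_counts, 1)

# Information criteria: lower is better
AIC(m_pois, m_zip)
```

With this much zero inflation the Poisson model should clearly under-predict the zeros and have a much higher AIC than the ZIP model.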

Another way to compare the fit of the models is to compare the size of residuals:

boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB (values below zero mark observations where the Poisson residual is the smaller one in absolute value). If you have some idea of the magnitude of residual that is really problematic, you can check what proportion of the residuals in each model exceed it. E.g. if being off by more than 1 were unacceptable,

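sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better, with fewer large residuals (20 fewer in the original run). Note that resid() returns deviance residuals for a glm fit and Pearson residuals for a zeroinfl fit by default, so pass the same type to both (e.g. type = "response" or type = "pearson") to compare like with like.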

Then the question is whether the added complexity of the models is worth it to you.
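One way to put a number on that trade-off is to compare information criteria, which penalise the extra parameters. The following is a rough sketch rather than a recommendation of any particular criterion; it refits four of the models from the question and also calls vuong() from pscl, a test for non-nested models whose use for zero inflation is debated.

```r
library(pscl)
library(MASS)  # provides glm.nb()
data("bioChemists", package = "pscl")

fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_nb   <- glm.nb(art ~ ., data = bioChemists)
fm_zip  <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# AIC table: df is the number of estimated parameters, lower AIC is better
aic_tab <- AIC(fm_pois, fm_nb, fm_zip, fm_zinb)
aic_tab

# Vuong test comparing the plain Poisson with the zero-inflated Poisson
vuong(fm_pois, fm_zip)
```

If the zero-inflated model does not lower the AIC by a meaningful amount, the simpler model is probably good enough for that data set.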