For analysing zero-inflated bird counts I'd like to apply zero-inflated count models using the R package pscl. However, after looking at the example provided in the documentation for one of the main functions (?zeroinfl), I began to doubt what the real advantage of these models is. Following the sample code given there, I fitted standard Poisson, quasi-Poisson and negative binomial models, simple zero-inflated Poisson and negative binomial models, and zero-inflated Poisson and negative binomial models with regressors for the zero component. Then I inspected the histograms of the observed and the fitted data. (Here's the code for replicating that.)
library(pscl)
library(MASS)  # provides glm.nb()
data("bioChemists", package = "pscl")
## standard count data models
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
fm_nb <- glm.nb(art ~ ., data = bioChemists)
## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")
## inflation with regressors
fm_zip2 <- zeroinfl(art ~ fem + mar + kid5 + phd + ment |
                      fem + mar + kid5 + phd + ment,
                    data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ fem + mar + kid5 + phd + ment |
                       fem + mar + kid5 + phd + ment,
                     data = bioChemists, dist = "negbin")
## histograms
breaks <- seq(-0.5,20.5,1)
par(mfrow=c(4,2))
hist(bioChemists$art, breaks=breaks)
hist(fitted(fm_pois), breaks=breaks)
hist(fitted(fm_qpois), breaks=breaks)
hist(fitted(fm_nb), breaks=breaks)
hist(fitted(fm_zip), breaks=breaks)
hist(fitted(fm_zinb), breaks=breaks)
hist(fitted(fm_zip2), breaks=breaks)
hist(fitted(fm_zinb2), breaks=breaks)
I can't see any fundamental difference between the models (apart from the fact that the example data don't appear very "zero-inflated" to me…); in fact, none of the models yields a halfway reasonable estimate of the number of zeros. Can anyone explain what the advantage of the zero-inflated models is? I suppose there must have been a reason to choose this as the example for the function.
Best Answer
I think this is a poorly chosen data set for exploring the advantages of zero-inflated models because, as you note, there isn't that much zero inflation.
A comparison of the fitted values from the different models shows that the predicted values are almost identical.
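One possible way to make that comparison concrete is sketched below. It refits two of the models from the question and compares their fitted means. It also computes the expected number of zeros from the predicted probabilities, because `fitted()` returns expected means rather than simulated counts, so a histogram of fitted values cannot show a spike at zero. The choice of models to compare and the object names `fit_cor` and `zeros` are assumptions for illustration, not the code from the original answer.

```r
library(pscl)  # zeroinfl() and the bioChemists data

data("bioChemists", package = "pscl")

# Refit two of the models from the question
# ("art ~ . | ." uses all regressors in both the count and the zero part)
fm_pois  <- glm(art ~ ., data = bioChemists, family = poisson)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

# How similar are the fitted means of the two models?
fit_cor <- cor(fitted(fm_pois), fitted(fm_zinb2))
fit_cor

# Expected number of zeros implied by each model, next to the observed count.
# For the Poisson GLM, P(Y = 0) is dpois(0, mu); for zeroinfl objects,
# predict(type = "prob") returns one column per count, starting at 0.
zeros <- c(
  observed = sum(bioChemists$art == 0),
  poisson  = sum(dpois(0, fitted(fm_pois))),
  zinb     = sum(predict(fm_zinb2, type = "prob")[, 1])
)
round(zeros)
```

Comparing `zeros` across models is usually more informative about the zero component than histograms of fitted means.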
In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
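To illustrate the point about stronger zero inflation, here is a small simulation sketch. The data-generating process (40% structural zeros, one covariate `x`, the coefficients) and all object names are hypothetical choices made only for this example.

```r
library(pscl)  # zeroinfl()

set.seed(1)
n <- 1000
x <- rnorm(n)

# Hypothetical process: 40% structural zeros, Poisson counts otherwise
structural_zero <- rbinom(n, size = 1, prob = 0.4)
y <- ifelse(structural_zero == 1, 0L, rpois(n, lambda = exp(1 + 0.5 * x)))
sim <- data.frame(y = y, x = x)

# Plain Poisson versus zero-inflated Poisson (intercept-only zero component)
sim_pois <- glm(y ~ x, data = sim, family = poisson)
sim_zip  <- zeroinfl(y ~ x | 1, data = sim)

# Observed zeros versus the number of zeros each model expects
sim_zeros <- c(
  observed = sum(sim$y == 0),
  poisson  = sum(dpois(0, fitted(sim_pois))),
  zip      = sum(predict(sim_zip, type = "prob")[, 1])
)
round(sim_zeros)

# Information criteria for the two fits (lower is better)
c(poisson = AIC(sim_pois), zip = AIC(sim_zip))
```

With data like these, the plain Poisson model should expect far fewer zeros than were observed, while the zero-inflated model should come close.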
Another way to compare the fit of the models is to compare the size of the residuals.
Doing so shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of a magnitude of the residual that is really problematic, you can see what proportion of the residuals in each model are above that. E.g. if being off by more than 1 was unacceptable, counting the residuals that exceed that threshold
shows the latter is a bit better: 20 fewer large residuals.
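A residual comparison along these lines could be sketched as follows. This is not the code from the original answer: it assumes response-scale residuals (observed minus fitted mean) for both models, because the default residual types differ between `glm` and `zeroinfl` objects, so the counts it produces need not reproduce the figure quoted above. The names `res_pois`, `res_zinb` and `threshold` are illustrative.

```r
library(pscl)  # zeroinfl() and the bioChemists data

data("bioChemists", package = "pscl")

fm_pois  <- glm(art ~ ., data = bioChemists, family = poisson)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

# Use the same residual type for both models: observed minus fitted mean
res_pois <- residuals(fm_pois,  type = "response")
res_zinb <- residuals(fm_zinb2, type = "response")

# Overall size of the residuals
summary(abs(res_pois))
summary(abs(res_zinb))

# Number of observations that are off by more than the chosen threshold
threshold <- 1
large <- c(
  poisson = sum(abs(res_pois) > threshold),
  zinb    = sum(abs(res_zinb) > threshold)
)
large
```

The threshold of 1 follows the example in the text; any value that is meaningful for the application can be substituted.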
Then the question is whether the added complexity of the models is worth it to you.