I think this is a poorly chosen data set for exploring the advantages of zero-inflated models because, as you note, there isn't much zero inflation.
plot(fitted(fm_pois), fitted(fm_zinb))
shows that the predicted values are almost identical.
In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
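One quick way to judge how much zero inflation a data set has is to compare the number of observed zeros with the number the fitted Poisson model predicts. Below is a minimal base-R sketch on simulated data. The data, the 30% structural-zero rate and the names x, y and fit_pois are all hypothetical. With real data you would apply the last two computations to your own Poisson fit.

```r
# Sketch: compare observed zeros with the zeros a Poisson GLM expects.
# The data are simulated (hypothetical), with 30% structural zeros.
set.seed(1)
n <- 500
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
structural_zero <- runif(n) < 0.3
y <- ifelse(structural_zero, 0, rpois(n, mu))

fit_pois <- glm(y ~ x, family = poisson)

observed_zeros <- sum(y == 0)
# Under a Poisson model, P(Y_i = 0) = dpois(0, mu_i) = exp(-mu_i)
expected_zeros <- sum(dpois(0, fitted(fit_pois)))

c(observed = observed_zeros, expected = round(expected_zeros, 1))
```

If the two counts are close, a zero-inflated model has little extra to explain.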
Another way to compare the fit of the models is to compare the size of residuals:
boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))
plots, for each observation, the absolute Poisson residual minus the absolute ZINB residual (values below zero mean the Poisson residual is smaller). It shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of what residual size is really problematic, you can see what proportion of the residuals in each model exceed it. For example, if being off by more than 1 were unacceptable:
sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)
shows that the ZINB is a bit better, with 20 fewer large residuals.
The question, then, is whether the added complexity of the ZINB is worth it to you.
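The placement of abs() matters when counting large residuals: abs(resid(m) > 1) takes the absolute value of a TRUE/FALSE vector, so it counts only residuals above +1 and misses large negative ones. A tiny base-R illustration with made-up residuals; with the models above you would pass resid(fm_pois) and resid(fm_zinb) to the helper, whose name count_large() is hypothetical.

```r
# Hypothetical residuals, chosen only to show how parenthesis placement
# changes what gets counted
r <- c(-2.5, -0.3, 0.8, 1.4, -1.1, 0.2)

sum(abs(r > 1))   # abs() of a logical vector: counts only r > 1, i.e. 1
sum(abs(r) > 1)   # magnitude first, then compare: counts |r| > 1, i.e. 3
mean(abs(r) > 1)  # proportion of "too large" residuals: 0.5

# Small helper (hypothetical name) applying the same cutoff to any model
count_large <- function(res, cutoff = 1) {
  c(count = sum(abs(res) > cutoff), proportion = mean(abs(res) > cutoff))
}
count_large(r)
```

Reporting the proportion as well as the count makes the comparison fair when two fits are evaluated on different numbers of observations.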
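The choice is based on (informed) model comparison. You are trying to account for overdispersion:
No extra zeros:
- Poisson: var(x) ≈ mu
- Negative binomial: var(x) > mu
"Extra" zeros:
- ZIP (zero-inflated Poisson): var(x) ≈ mu, apart from the extra zeros
- ZINB (zero-inflated negative binomial): var(x) > mu, apart from the extra zeros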
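The variance relations in the list follow from the standard moment formulas. As a sketch, assuming the usual mean/dispersion parameterization of the negative binomial (with $\theta$ the dispersion parameter, as reported by pscl) and writing $\pi$ for the probability of an extra zero:

$$
\begin{aligned}
\text{Poisson:}\quad & \operatorname{E}[Y] = \mu, && \operatorname{Var}(Y) = \mu \\
\text{NB:}\quad & \operatorname{E}[Y] = \mu, && \operatorname{Var}(Y) = \mu + \mu^2/\theta \\
\text{ZIP:}\quad & \operatorname{E}[Y] = (1-\pi)\mu, && \operatorname{Var}(Y) = (1-\pi)\mu\,(1 + \pi\mu) \\
\text{ZINB:}\quad & \operatorname{E}[Y] = (1-\pi)\mu, && \operatorname{Var}(Y) = (1-\pi)\mu\,(1 + \pi\mu + \mu/\theta)
\end{aligned}
$$

So even a ZIP has a marginal variance above its mean whenever $\pi > 0$, because the extra zeros themselves create overdispersion. The ZIP entry in the list refers to the count component.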
One package you can use is pscl (install.packages("pscl"), then library(pscl)).
You can then fit a number of models, such as a hurdle model that uses a negative binomial for the positive counts and a binomial model for the probability of a zero. This would be written something like:
fit <- hurdle(Admissions ~ Temperature + Humidity, dist = "negbin", data = data)
summary(fit)
Note that the output has two sets of coefficients: one for the zero hurdle component and one for the count component. It also provides an estimate of theta, the overdispersion parameter of the negative binomial.
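The hurdle output has two sets of coefficients because the hurdle likelihood factorizes into a binary part (zero vs. positive) and a zero-truncated count part, so the two can even be estimated separately. Here is a base-R sketch on simulated data; it is not pscl's implementation. To keep it short, it swaps in a zero-truncated Poisson for the truncated negative binomial that dist = "negbin" would use, and the names temp, zero_part and count_part are hypothetical.

```r
# Sketch: a hurdle model's likelihood splits into two independent parts.
# Simulated, hypothetical data; the count part is a zero-truncated Poisson.
set.seed(42)
n <- 1000
temp <- rnorm(n)
p_cross <- plogis(0.2 + 0.8 * temp)   # P(Y > 0), the hurdle part
lambda  <- exp(1 + 0.3 * temp)        # rate of the untruncated Poisson
crossed <- runif(n) < p_cross
# Zero-truncated Poisson draws: invert the CDF above P(Y = 0)
y <- ifelse(crossed, qpois(runif(n, dpois(0, lambda), 1), lambda), 0)

# Part 1 (zero hurdle): binomial GLM for whether the count is positive
zero_part <- glm(as.integer(y > 0) ~ temp, family = binomial)

# Part 2 (count): zero-truncated Poisson fitted to the positive counts only
yp <- y[y > 0]
xp <- temp[y > 0]
trunc_nll <- function(beta) {
  lam <- exp(beta[1] + beta[2] * xp)
  # log f(y | y > 0) = log dpois(y) - log P(Y > 0)
  -sum(dpois(yp, lam, log = TRUE) -
         ppois(0, lam, lower.tail = FALSE, log.p = TRUE))
}
count_part <- optim(c(0, 0), trunc_nll, method = "BFGS")

list(zero_hurdle = coef(zero_part), count = count_part$par)
```

Here zero_part plays the role of the zero hurdle block in summary(fit), and count_part$par that of the count block. hurdle() fits both (plus theta for the negative binomial) in one call.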
Alternatively, you may want to fit a zero-inflated model:
fit1 <- zeroinfl(Admissions ~ Temperature + Humidity, data = data, dist = "negbin", link = "logit")
These models can be compared with AIC (you should also compare them with your Poisson model):
AIC(fit,fit1)
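AIC() works on any fitted model that has a logLik() method, which is why the hurdle, zeroinfl and glm fits can be compared directly, provided they are fitted to the same response. To make the arithmetic behind the comparison visible, here is a base-R sketch that fits a zero-inflated Poisson by maximum likelihood on simulated data and computes AIC = 2k - 2 log L by hand. It stands in for, and is not, pscl's zeroinfl(); the data and all names are hypothetical.

```r
# Sketch: AIC for a Poisson GLM versus a zero-inflated Poisson (ZIP)
# fitted with optim(). Simulated, hypothetical data with 25% structural
# zeros and a constant zero-inflation probability.
set.seed(7)
n <- 800
x <- rnorm(n)
lambda <- exp(0.8 + 0.4 * x)
y <- ifelse(runif(n) < 0.25, 0, rpois(n, lambda))

fit_pois <- glm(y ~ x, family = poisson)

zip_nll <- function(par) {
  lam <- exp(par[1] + par[2] * x)
  p0  <- plogis(par[3])   # zero-inflation probability (logit link)
  ll <- ifelse(y == 0,
               log(p0 + (1 - p0) * exp(-lam)),          # extra or sampling zero
               log(1 - p0) + dpois(y, lam, log = TRUE))  # positive count
  -sum(ll)
}
zip <- optim(c(0, 0, 0), zip_nll, method = "BFGS")

# optim() minimized -log L, so 2 * zip$value equals -2 log L
aic_zip <- 2 * length(zip$par) + 2 * zip$value
c(poisson = AIC(fit_pois), zip = aic_zip)   # lower is better
```

Both likelihoods include the log(y!) terms, so the two AIC values are on the same scale and can be compared directly.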
I only know what I've read, but I believe the difference is that excess zeros are zeros where no event could have occurred, while true zeros occur where an event could have occurred but didn't. For example, consider people coming into a bank. During business hours there might be periods when no customers enter (true zeros), but when the bank is closed you will always get zeros (excess zeros), and since the bank is closed more than it is open, you will get a lot of excess zeros.
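The bank example can be simulated to show how a mix of excess and true zeros produces more zeros than a single Poisson would predict. A base-R sketch; the opening hours, the arrival rate and the 30-day span are made up for illustration.

```r
# Sketch of the bank example with simulated, hypothetical numbers:
# hourly customer counts over 30 days, bank open from 9:00 to 17:00.
set.seed(123)
days <- 30
hour <- rep(0:23, times = days)
is_open <- hour >= 9 & hour < 17
# While open, arrivals are Poisson with a hypothetical rate of 2 per hour;
# while closed, the count is always 0 (an excess, or structural, zero)
customers <- ifelse(is_open, rpois(length(hour), 2), 0)

true_zeros   <- sum(customers == 0 & is_open)   # customers were possible
excess_zeros <- sum(customers == 0 & !is_open)  # no customer was possible
# Zeros a single Poisson with the same overall mean would predict
poisson_zeros <- length(customers) * dpois(0, mean(customers))

c(true = true_zeros, excess = excess_zeros,
  poisson_expected = round(poisson_zeros))
```

Every closed hour contributes a zero, so the observed zeros far exceed what one Poisson rate fitted to all hours allows, which is the pattern a zero-inflated model is built to capture.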