Solved – Zero inflated models – “true zero” vs. “excess zero”

poisson distribution, zero inflation

I am trying to decide whether a zero-inflated Poisson model or a Poisson hurdle model is more appropriate for my data.

In background reading on the two, I ran across a statement that a zero-inflated model attempts to distinguish between true zeros and excess zeros. I'm having trouble understanding the difference between those two kinds of zeros.

Can anyone explain what those two types of zeros mean in the context of zero inflated modeling?

Best Answer

I only know what I've read, but I believe the difference is that excess zeros (often called structural zeros) are zeros where there could not have been any events, while true zeros (often called sampling zeros) occur where an event could have happened but none did. For example, consider people coming into a bank. During business hours, there might be a period of time when zero customers enter the bank (true zeros). When the bank is closed, you will always get zeros (excess zeros), and since the bank is closed more than it is open, you will get a lot of excess zeros.
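The bank example can be simulated to make the two kinds of zeros concrete. This is only a sketch in base R: the share of closed hours (`p_closed`), the arrival rate while open (`mu_open`), and the number of hours are made-up values chosen for illustration, not anything from the question.

```r
set.seed(42)

n_hours  <- 10000   # hypothetical number of observed hours
p_closed <- 0.6     # assumed share of hours when the bank is closed
mu_open  <- 2       # assumed mean arrivals per open hour

# TRUE marks hours in which no customer could possibly arrive
closed <- rbinom(n_hours, size = 1, prob = p_closed) == 1

# Closed hours always give 0; open hours give an ordinary Poisson count
arrivals <- ifelse(closed, 0L, rpois(n_hours, lambda = mu_open))

zeros_closed <- sum(closed)                  # "excess" zeros (bank closed)
zeros_open   <- sum(!closed & arrivals == 0) # "true" zeros (open, nobody came)

# Observed share of zeros vs. the share a plain Poisson with the same mean implies
obs_zero_share  <- mean(arrivals == 0)
pois_zero_share <- dpois(0, lambda = mean(arrivals))

print(c(closed = zeros_closed, open = zeros_open))
print(c(observed = obs_zero_share, poisson = pois_zero_share))
```

In the observed data both kinds of zeros look identical; only the simulation knows which hours were closed. A zero-inflated model tries to estimate that split from the counts and covariates.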

Related Solutions

Solved – Zero-inflated count models in R: what is the real advantage

I think this is a poorly chosen data set for exploring the advantages of zero inflated models, because, as you note, there isn't that much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
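One rough way to judge how much zero inflation a data set has is to compare the number of observed zeros with the number a fitted Poisson model expects. The sketch below uses simulated data, because the data behind `fm_pois` is not shown here; the covariate `x`, the coefficients, and the 30% share of always-zero observations are all assumptions for illustration.

```r
set.seed(1)

n  <- 2000
x  <- rnorm(n)                         # hypothetical covariate
mu <- exp(0.5 + 0.3 * x)               # mean of the Poisson count process
always_zero <- rbinom(n, 1, 0.3) == 1  # assumed 30% of observations are always zero
y  <- ifelse(always_zero, 0L, rpois(n, mu))

# Plain Poisson regression that ignores the zero inflation
fit_pois <- glm(y ~ x, family = poisson)

observed_zeros <- sum(y == 0)
# Under the Poisson fit, observation i has P(Y = 0) = exp(-fitted mean i)
expected_zeros <- sum(dpois(0, lambda = fitted(fit_pois)))

print(c(observed = observed_zeros, expected = round(expected_zeros)))
```

When the observed count is far above the expected count, as it should be in this simulation, a zero-inflated or hurdle model is worth trying. When the two are close, the Poisson fit is probably already adequate for the zeros.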

Another way to compare the fit of the models is to compare the size of residuals:

boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of the residual magnitude that would be really problematic, you can count how many residuals in each model exceed it. For example, if being off by more than 1 were unacceptable:

sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better - 20 fewer large residuals.

Then the question is whether the added complexity of the models is worth it to you.

Zero-Inflated Poisson Model – Comprehensive Understanding

The criterion is based on (informed) model comparisons. You are trying to account for over-dispersion.

Poisson: var(x) = mu

Negative binomial: var(x) > mu

With "extra" zeros:

ZIP: count component has var(x) = mu

ZINB: count component has var(x) > mu
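
The variance statements for the zero-inflated models are most naturally read as describing the count component. Once the extra zeros are mixed in, the marginal variance of a ZIP variable exceeds its marginal mean: with zero-inflation probability pi and Poisson mean mu, the standard results are E[Y] = (1 - pi) * mu and Var[Y] = (1 - pi) * mu * (1 + pi * mu). The sketch below checks this by simulation; `pi_zero` and `mu` are arbitrary illustrative values.

```r
set.seed(123)

n       <- 200000
pi_zero <- 0.4   # assumed probability of an excess (always-zero) observation
mu      <- 3     # assumed mean of the Poisson count component

y <- ifelse(rbinom(n, 1, pi_zero) == 1, 0L, rpois(n, mu))

# Theoretical ZIP moments
zip_mean <- (1 - pi_zero) * mu
zip_var  <- (1 - pi_zero) * mu * (1 + pi_zero * mu)

print(c(sample_mean = mean(y), theory_mean = zip_mean))
print(c(sample_var  = var(y),  theory_var  = zip_var))
```

So extra zeros alone already produce over-dispersion relative to a plain Poisson. The ZINB adds further over-dispersion within the count component.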

One active package that you can use is pscl, which you can install with install.packages("pscl"). You can then fit a number of models, such as a hurdle model that uses a negative binomial for the counts and a binomial model for the probability of zeros. This would be written something like:
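```
library(pscl)
# Hurdle model: negative binomial for the positive counts, binomial for zero vs. non-zero
fit <- hurdle(Admissions ~ Temperature + Humidity, dist = "negbin", data = data)

summary(fit)
```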

Note that the output will have two sets of coefficients: one for the zero hurdle component and one for the count component. The output also provides an estimate of the theta parameter (overdispersion) of the negative binomial.

Or you may want to look at the zero-inflated model:

fit1 <- zeroinfl(Admissions ~ Temperature + Humidity, data = data, dist = "negbin", link = "logit")

These models can be compared using AIC (also compare them to your Poisson model): AIC(fit, fit1)
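
Since the admissions data is not available here, the following sketch runs the same comparison on the `bioChemists` data set that ships with pscl. The choice of data set and the `art ~ .` formula are assumptions for illustration, not the original poster's model. The block is skipped if pscl is not installed.

```r
# Assumes the pscl package is installed; the comparison is skipped otherwise
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")

  # Plain Poisson, hurdle negative binomial, zero-inflated negative binomial
  fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
  fm_hnb  <- pscl::hurdle(art ~ ., data = bioChemists, dist = "negbin")
  fm_zinb <- pscl::zeroinfl(art ~ ., data = bioChemists, dist = "negbin")

  # Lower AIC indicates a better trade-off between fit and complexity
  aic_table <- AIC(fm_pois, fm_hnb, fm_zinb)
  print(aic_table[order(aic_table$AIC), ])
}
```

The `df` column of the table shows how many parameters each model spends, which helps when deciding whether the extra complexity is worth the improvement in AIC.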
