Zero-Inflation – Troubleshooting ‘zeroinfl’ Functionality Issues in R

model, negative-binomial-distribution, r, regression, zero inflation

I tried to fit a zero-inflated negative binomial model with zeroinfl (package pscl):

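library(pscl)     # provides zeroinfl()
library(ggplot2)  # provides ggplot()

model.zinb <- zeroinfl(formula = y ~ x1 + x2 + x3, data = data, dist = "negbin")

# Stack the observed counts and the fitted values, labelled by source
a <- data.frame(count = data$y)
b <- data.frame(count = fitted(model.zinb))
a$colour <- "data"
b$colour <- "fitted"
hist <- rbind(a, b)

# Overlay the two density histograms
ggplot(hist, aes(count, fill = colour)) +
      geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = 'identity', bins = 31)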

However, when I plot the count data y against the fitted values fitted(model.zinb), the fitted model has almost no counts between 0 and 1 (see plot). It looks like zeroinfl didn't work. Since I'm a beginner in this field, I'm hoping to get some advice. Thanks!

[Plot: overlaid density histograms of the observed counts ("data") and the fitted values ("fitted")]

Best Answer

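The fitted() method for zeroinfl objects returns the fitted mean $\hat \mu$ for each observation. This mean can be pretty far from some of the counts $y$ that nevertheless have substantial probability $f_\mathrm{zeroinfl}(y, \hat \mu)$ under the model. In particular, the fitted mean is always positive, so a histogram of fitted means shows no mass at zero even when the model assigns a high probability to zero counts. This is explained and illustrated in the answers to: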

Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros?
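To make the difference between fitted means and expected probabilities concrete, here is a minimal sketch. It assumes the bioChemists data shipped with pscl as a stand-in for your data (the model formula and the object names are illustrative, not taken from the question) and uses predict() with type = "prob" to get the expected probability of each count:

```r
library(pscl)

# Stand-in data and model (assumption: not the questioner's data)
data("bioChemists", package = "pscl")
fit <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# Fitted means: one positive real number per observation, never exactly 0
mu_hat <- fitted(fit)
range(mu_hat)

# Expected probabilities: one row per observation, one column per count 0, 1, 2, ...
p_hat  <- predict(fit, type = "prob")
counts <- 0:(ncol(p_hat) - 1)

# Expected frequency of each count (summed over observations) vs. observed frequency
expected <- colSums(p_hat)
observed <- as.vector(table(factor(bioChemists$art, levels = counts)))
head(data.frame(count = counts, observed = observed, expected = round(expected, 1)))
```

Comparing the observed and expected columns, rather than a histogram of mu_hat, shows whether the model reproduces the zeros.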

Moreover, rather than overlaying histograms of the observed counts and the expected probabilities (not the means), it is easier to judge deviations in a so-called hanging rootogram. See: Kleiber & Zeileis (2016). The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590. An R implementation is available in the countreg package on R-Forge (successor to the pscl implementation) and illustrated in:

Confused on how to interpret ZINB and Hurdle models

https://stackoverflow.com/questions/43075911/examining-residuals-and-visualizing-zero-inflated-poission-r/43584320
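If you just want to see what a hanging rootogram contains, the following is a hand-rolled sketch in base graphics. It is not the countreg implementation referred to above, and it again assumes the bioChemists data from pscl as a stand-in. Bars of height sqrt(observed) hang from the curve of sqrt(expected), so any misfit shows up as a bar ending above or below the zero line:

```r
library(pscl)

# Stand-in data and model (assumption: not the questioner's data)
data("bioChemists", package = "pscl")
fit <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# Observed and expected frequencies for each count 0, 1, 2, ...
p_hat    <- predict(fit, type = "prob")
counts   <- 0:(ncol(p_hat) - 1)
expected <- colSums(p_hat)
observed <- as.vector(table(factor(bioChemists$art, levels = counts)))

# Square-root scale: bars hang down from sqrt(expected) by sqrt(observed)
top    <- sqrt(expected)
bottom <- top - sqrt(observed)

plot(counts, top, type = "n", ylim = range(c(bottom, top, 0)),
     xlab = "Count", ylab = "sqrt(Frequency)")
rect(counts - 0.45, bottom, counts + 0.45, top, col = "grey80", border = "grey40")
lines(counts, top, type = "b", pch = 19, col = "red")  # expected frequencies
abline(h = 0, lty = 2)                                 # reference line for misfit
```

A bar that stops above zero means the model overpredicts that count, and a bar that drops below zero means it underpredicts.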
