I've tried to fit a zero-inflated negative binomial model with zeroinfl (package pscl):
library(pscl)     # zeroinfl()
library(ggplot2)  # ggplot(), geom_histogram()
model.zinb <- zeroinfl(formula = y ~ x1 + x2 + x3, data = data, dist = "negbin")
# Stack observed counts and fitted values so they can share one histogram
a = data.frame(count = data$y)
b = data.frame(count = fitted(model.zinb))
a$colour = "data"
b$colour = "fitted"
hist = rbind(a, b)
ggplot(hist, aes(count, fill = colour)) +
  geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity', bins = 31)
However, when plotting the count data y and the fitted data fitted(model.zinb), there are somehow almost no counts between 0 and 1 in the fitted model (see plot). It looks like zeroinfl didn't work. Since I'm a beginner in this field, I'm hoping to get some advice. Thanks!
Best Answer
The fitted() method for zeroinfl objects returns the fitted mean $\hat \mu$ for each observation. This mean can be pretty far from some of the counts $y$ that nevertheless have substantial probability $f_\mathrm{zeroinfl}(y, \hat \mu)$; in particular, a fitted mean is almost never close to zero even when zeros are very likely. This is explained and illustrated in the answers to: Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros?
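To make the difference between the fitted mean and the fitted probabilities concrete, here is a minimal sketch in base R. The parameter values (pi_zero, mu, theta) are hypothetical and chosen only for illustration; they are not estimates from the model in the question.

```r
# Hypothetical parameter values, for illustration only
pi_zero <- 0.4  # probability of a structural (excess) zero
mu      <- 5    # mean of the negative binomial count component
theta   <- 2    # negative binomial dispersion ("size") parameter

# Mean of the zero-inflated distribution: the kind of value fitted() reports
zinb_mean <- (1 - pi_zero) * mu

# Probabilities of the counts 0, 1, ..., 10 under the same distribution
counts <- 0:10
zinb_prob <- (1 - pi_zero) * dnbinom(counts, size = theta, mu = mu)
zinb_prob[1] <- zinb_prob[1] + pi_zero  # add the excess zeros to P(Y = 0)

zinb_mean            # the mean is 3
round(zinb_prob, 3)  # yet P(Y = 0) is the largest single probability
```

With these values the mean is 3, while a zero occurs with probability of roughly 0.45. A histogram of fitted means would therefore show nothing near zero even though the model expects many zeros.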
Moreover, rather than overlaying histograms of the observed counts and the expected probabilities (not the means), it is easier to judge deviations in a so-called hanging rootogram. See: Kleiber & Zeileis (2016). The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590. An R implementation is available in the countreg package on R-Forge (successor to the pscl implementation) and illustrated in: Confused on how to interpret ZINB and Hurdle models
https://stackoverflow.com/questions/43075911/examining-residuals-and-visualizing-zero-inflated-poission-r/43584320
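As a rough sketch of the comparison a rootogram is built on, the expected frequencies can be obtained from predict(..., type = "prob") in pscl and set against the observed frequencies. The example below assumes the bioChemists data shipped with pscl rather than the data from the question, and the square-root difference column is only a hand-computed stand-in for what a rootogram displays graphically.

```r
library(pscl)  # provides zeroinfl() and the bioChemists example data

data("bioChemists", package = "pscl")
fit <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# n x (max(y) + 1) matrix: row i holds P(Y_i = 0), P(Y_i = 1), ...
prob <- predict(fit, type = "prob")

# Expected frequency of each count = sum of its probabilities over observations
max_count <- ncol(prob) - 1
expected  <- colSums(prob)
observed  <- as.vector(table(factor(bioChemists$art, levels = 0:max_count)))

# Compare on the square-root scale, as a rootogram does
comparison <- data.frame(
  count     = 0:max_count,
  observed  = observed,
  expected  = round(expected, 1),
  sqrt_diff = round(sqrt(observed) - sqrt(expected), 2)
)
head(comparison)
```

Applied to the model in the question, the same idea would replace fitted(model.zinb) with colSums(predict(model.zinb, type = "prob")), so that observed and expected frequencies of each count are compared instead of counts and means.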