Zero-Inflation – Troubleshooting ‘zeroinfl’ Functionality Issues in R

Tags: model, negative-binomial-distribution, r, regression, zero-inflation

I've tried to fit a zero-inflated negative binomial model with zeroinfl (package pscl):

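library(pscl)     # zeroinfl()
library(ggplot2)  # ggplot(), geom_histogram()

model.zinb <- zeroinfl(formula = y ~ x1 + x2 + x3, data = data, dist = "negbin")

# Stack observed counts and fitted means into one data frame for plotting
a = data.frame(count = data$y)
b = data.frame(count = fitted(model.zinb))  # closing parenthesis was missing
a$colour = "data"
b$colour = "fitted"
hist = rbind(a, b)

# after_stat(density) replaces the deprecated ..density.. notation
ggplot(hist, aes(count, fill = colour)) +
      geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = 'identity', bins = 31)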

However, when I plot the count data y together with the fitted values fitted(model.zinb), the fitted model shows almost no counts between 0 and 1 (see plot). It looks like zeroinfl didn't work. Since I'm a beginner in this field, I'm hoping to get some advice. Thanks!

Best Answer

The fitted() method for zeroinfl objects returns the fitted mean $\hat \mu$ for each observation, not a predicted count. That mean can be quite far from counts $y$ that nevertheless have substantial probability $f_\mathrm{zeroinfl}(y, \hat \mu)$ under the model, zero in particular. This is explained and illustrated in the answers to:

Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros?
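As a rough numerical illustration of this point (the parameter values below are made up, not estimates from the question's data), a zero-inflated negative binomial can put more than half of its probability on zero while its mean is well above 1. A histogram of fitted means therefore shows almost nothing between 0 and 1, even when the model reproduces the zeros well:

```r
# Hypothetical ZINB parameters, chosen only for illustration
pi0   <- 0.4   # probability of a structural (excess) zero
mu    <- 3     # mean of the negative binomial count component
theta <- 1.5   # negative binomial dispersion (size) parameter

# P(Y = 0): structural zeros plus zeros from the count component
p_zero <- pi0 + (1 - pi0) * dnbinom(0, size = theta, mu = mu)

# E[Y]: the kind of quantity fitted() reports for each observation
mean_y <- (1 - pi0) * mu

# Simulate from the same distribution to check both quantities empirically
set.seed(1)
n <- 1e5
is_structural_zero <- rbinom(n, size = 1, prob = pi0) == 1
y <- ifelse(is_structural_zero, 0, rnbinom(n, size = theta, mu = mu))

c(p_zero = p_zero, simulated = mean(y == 0))
c(mean_y = mean_y, simulated = mean(y))
```

With these values the mean is 1.8 while roughly 52% of the observations are zero.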

Moreover, rather than overlaying histograms of the observed counts and the expected probabilities (not the means), it is easier to judge deviations in a so-called hanging rootogram. See: Kleiber & Zeileis (2016). The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590. An R implementation is available in the countreg package on R-Forge (the successor to the pscl implementation) and is illustrated in:

Confused on how to interpret ZINB and Hurdle models

https://stackoverflow.com/questions/43075911/examining-residuals-and-visualizing-zero-inflated-poission-r/43584320
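A minimal sketch of the comparison described above, assuming the pscl package is installed. Because the question's data are not available, it uses the bioChemists data set that ships with pscl as a stand-in, and the model formula is only an example. The idea is to ask predict() for probabilities (type = "prob") rather than means, and to compare the expected frequency of each count value with the observed one:

```r
library(pscl)

# Example data shipped with pscl; stands in for the question's `data`
data("bioChemists", package = "pscl")

# ZINB with all covariates in the count part and an intercept-only zero part
model.zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# n x (max(y) + 1) matrix with P(Y_i = k) for k = 0, ..., max(y)
probs <- predict(model.zinb, type = "prob")

# Expected vs. observed frequencies for each count value
counts   <- 0:(ncol(probs) - 1)
expected <- colSums(probs)
observed <- as.vector(table(factor(bioChemists$art, levels = counts)))

data.frame(count = counts, observed = observed, expected = round(expected, 1))

# With countreg installed from R-Forge, a hanging rootogram could be drawn with:
# install.packages("countreg", repos = "https://R-Forge.R-project.org")
# countreg::rootogram(model.zinb)
```

If the expected and observed columns agree closely at zero and at the small counts, the zero-inflation part is doing its job, whatever the histogram of fitted means looks like.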

Related Solutions

Negative Binomial Distribution – How to Interpret Zeroinfl Results from Emmeans

If I understand correctly what you did, your emm.ZINB.count.lin object estimates the log of the count component of the model, without the zero inflation. It does not give the offsets. Please note that offsets are fixed quantities; they are not estimated. If you look at emm.ZINB2.count.lin@grid, you will see extra columns named .wgt. and .offset., and the latter gives the (fixed) offset for each node in the reference grid. Since these offsets do not depend on Species, they are constant across the grid.

Meanwhile, your emm.ZINB2.lin does estimate the mean counts, accounting for zero inflation. So to get rates, you may do

offs <- emm.ZINB2.count.lin@grid[1, ".offset."]  # this is on the log scale
contrast(emm.ZINB2.lin, method="tukey", infer=TRUE, scale = exp(-offs))

which will divide each contrast by exp(offs), so that the contrasts are expressed as rates rather than counts.
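The rescaling itself is plain arithmetic. A small sketch with made-up numbers (both the offset and the contrast value are hypothetical) shows that multiplying by exp(-offs) is the same as dividing by exp(offs):

```r
# Hypothetical values, for illustration only
offs        <- log(10)  # log-scale offset, i.e. an exposure of 10 units
diff_counts <- 5        # a contrast between two mean counts

# Multiplying by exp(-offs) divides the contrast by the exposure
diff_rates <- diff_counts * exp(-offs)

diff_rates                                     # about 0.5 per unit of exposure
all.equal(diff_rates, diff_counts / exp(offs)) # TRUE
```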
