Poisson Regression – How to Simulate from a Zero-Inflated Poisson Distribution

poisson-regression, r, simulation, zero-inflation

I am trying to simulate data from a zero-inflated Poisson (ZIP) regression model that I fit to observed data. I fit the model in R using zeroinfl() from the package pscl, but I am having trouble figuring out how to derive the ZIP distribution from the coefficient estimates.

I know how to derive the predicted counts from these coefficient estimates (more information here: http://www.ats.ucla.edu/stat/stata/faq/predict_zip.htm), but can anyone help me understand how to find/derive estimates for my distribution parameters (i.e. lambda for the Poisson distribution, p for the Bernoulli distribution) that I can then sample from?

Best Answer

You can get the probability of zero-inflation by

p <- predict(object, ..., type = "zero")

and the mean of the count distribution by

lambda <- predict(object, ..., type = "count")

See Appendix C of vignette("countreg", package = "pscl") for a few more details.
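If you want to see how these two quantities relate to the coefficient estimates, the following is a minimal base-R sketch. It assumes the default links of zeroinfl() (log for the count part, logit for the zero part). The coefficient values, the covariate x, and the design matrix X are made up for illustration; with a fitted model the coefficients would come from coef(object, model = "count") and coef(object, model = "zero").

```r
# Hypothetical coefficient estimates (not from any real fit)
beta  <- c("(Intercept)" = 0.5, x = 0.3)    # count part, log link
gamma <- c("(Intercept)" = -1.0, x = 0.8)   # zero part, logit link

# Hypothetical covariate values and the matching design matrix
x <- c(-1, 0, 1, 2)
X <- cbind("(Intercept)" = 1, x = x)

lambda <- as.vector(exp(X %*% beta))      # Poisson mean per observation
p      <- as.vector(plogis(X %*% gamma))  # probability of a structural zero

# Expected count of the zero-inflated distribution per observation
mu <- (1 - p) * lambda

data.frame(x, lambda, p, mu)
```

Here the same regressors are used in both parts; if the zero part has its own regressors, it needs its own design matrix.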

To simulate the distribution, you can either do it manually with

ifelse(rbinom(n, size = 1, prob = p) > 0, 0, rpois(n, lambda = lambda))

or you can use rzipois() from the VGAM package

library("VGAM")
rzipois(n, lambda = lambda, pstr0 = p)

which essentially also does an ifelse() as above but adds a few sanity checks etc.
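As a quick sanity check of the manual approach, here is a small sketch with made-up scalar values for p and lambda (in practice both can be vectors of length n taken from predict()). It compares the simulated share of zeros and the simulated mean with the theoretical ZIP values.

```r
set.seed(1)
n <- 1e5
p <- 0.3        # hypothetical probability of a structural zero
lambda <- 2.5   # hypothetical Poisson mean

# Draw a structural zero with probability p, otherwise a Poisson count
y <- ifelse(rbinom(n, size = 1, prob = p) > 0, 0, rpois(n, lambda = lambda))

# Theoretical zero probability and mean of a ZIP distribution
p0_theory   <- p + (1 - p) * exp(-lambda)
mean_theory <- (1 - p) * lambda

c(observed = mean(y == 0), theory = p0_theory)
c(observed = mean(y), theory = mean_theory)
```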

Related Solutions

Solved – Zero-inflated count models in R: what is the real advantage

I think this is a poorly chosen data set for exploring the advantages of zero-inflated models, because, as you note, there isn't that much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
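To illustrate what "more zero-inflation" looks like, here is a base-R sketch on simulated data (all parameter values are made up). It fits a plain Poisson GLM to strongly zero-inflated counts and compares the observed share of zeros with the share the Poisson fit implies.

```r
set.seed(42)
n <- 5000
x <- rnorm(n)
lambda <- exp(1 + 0.5 * x)  # hypothetical count-model mean
p <- 0.4                    # hypothetical structural-zero probability
y <- ifelse(rbinom(n, 1, p) > 0, 0, rpois(n, lambda))

# Plain Poisson regression that ignores the excess zeros
fm <- glm(y ~ x, family = poisson)

obs_zero  <- mean(y == 0)
pois_zero <- mean(dpois(0, fitted(fm)))  # zero share implied by the Poisson fit

c(observed = obs_zero, poisson = pois_zero)
```

A large gap between the two numbers suggests that a zero-inflated (or hurdle) model may be worth the extra parameters.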

Another way to compare the fit of the models is to compare the size of residuals:

boxplot(abs(resid(fm_pois) - resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of a magnitude of the residual that is really problematic, you can see what proportion of the residuals in each model are above that. For example, if being off by more than 1 were unacceptable:

sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better - 20 fewer large residuals.
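Counting residuals beyond a cutoff can be wrapped in a small helper. The sketch below uses a hypothetical function large_resid() and made-up residual vectors; with real fits you would pass in resid() from each model. The default residual type may differ between model classes, so it is probably worth requesting the same type for both.

```r
# Hypothetical helper: count and share of residuals whose absolute value exceeds a cutoff
large_resid <- function(r, cutoff = 1) {
  big <- abs(r) > cutoff
  c(count = sum(big), share = mean(big))
}

r_a <- c(-2.1, 0.3, 1.4, -0.2, 0.9)  # made-up residuals, model A
r_b <- c(-0.8, 0.1, 1.2, -0.4, 0.5)  # made-up residuals, model B

large_resid(r_a)
large_resid(r_b)
```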

Then the question is whether the added complexity of the models is worth it to you.

Solved – not able to fit a zero inflated poisson distribution

This works:

fit <- fitdist(vect, "ZIP", start=list(mu=2.4, sigma=0.1),
      lower=c(-Inf, 0.001), upper=c(Inf, 1), optim.method="L-BFGS-B")

which gives a log-likelihood of -7853.122.

So @Ben Bolker is correct.

It doesn't work if I set the lower bound for sigma to 0, because the optimizer would then try to evaluate the density at sigma = 0, which is not supported for ZIP.
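The same idea can be sketched in base R without fitdist(): write the ZIP negative log-likelihood by hand and maximize it with box constraints that keep sigma strictly inside (0, 1). The data are simulated, zip_nll() is a hypothetical helper, and the parameter names mu and sigma simply mirror the call above. Bounding mu away from 0 is my own assumption, not part of the original fit.

```r
set.seed(123)
n <- 2000
y <- ifelse(rbinom(n, 1, 0.25) > 0, 0, rpois(n, 2.4))  # simulated ZIP data

# Negative log-likelihood; par = c(mu, sigma), sigma = structural-zero probability
zip_nll <- function(par, y) {
  mu <- par[1]
  sigma <- par[2]
  ll <- ifelse(y == 0,
               log(sigma + (1 - sigma) * exp(-mu)),
               log(1 - sigma) + dpois(y, mu, log = TRUE))
  -sum(ll)
}

# Bounds keep the optimizer away from sigma = 0 and sigma = 1
fit <- optim(c(mu = 2, sigma = 0.1), zip_nll, y = y, method = "L-BFGS-B",
             lower = c(1e-3, 1e-3), upper = c(Inf, 1 - 1e-3))

fit$par     # estimates of mu and sigma
-fit$value  # maximized log-likelihood
```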
