Solved – How to fit a zero inflated poisson model with only offset (without coefficients)

Tags: poisson distribution, r, regression, zero inflation

I already have estimated Poisson rates lambda and the observed outcomes y, and I would like to check whether the model fits well.

To start with, I check whether the dispersion is reasonable:

glm(y ~ 0, offset=log(lambda), data=data, family=quasipoisson)

Then I would like to check for zero inflation using the pscl package:

zeroinfl(y ~ 1, offset=log(lambda), data=data, dist="poisson")

However, I am not able to do this:

zeroinfl(y ~ 0, offset=log(lambda), data=data, dist="poisson")

Introducing an intercept is likely to interfere with the zero-inflation estimate, so it becomes unclear whether there is any zero inflation in the original model, and how large it is.

Is it possible to fit the model with no estimated parameters other than the zero-inflation part?

Best Answer

Actually, your second line still works. Because of the intercept, it rescales your input $\lambda$ by an estimated constant factor, in effect replacing your input $\lambda$ with an estimated $\lambda$. Unless you have good reason to believe your input $\lambda$ is more accurate than the estimate produced by zeroinfl, I'd just leave it.

On the other hand, if you're really firm about using the input $\lambda$, the way to check for a zero inflation factor is to compare the observed number of zeros with the expected number of zeros given $\lambda$. You can do a standard significance test for this, or just use the difference divided by sample size as an estimate of the zero inflation factor:

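lambda <- 2      # the given Poisson mean
p_zero <- 0.2    # true zero-inflation probability used in the simulation
x <- rpois(1000, lambda) * rbinom(1000, 1, 1 - p_zero)

expected_zero <- exp(-lambda)    # P(Y = 0) under a plain Poisson(lambda)
observed_zero <- mean(x == 0)    # observed proportion of zeros
> expected_zero
[1] 0.1353353
> observed_zero
[1] 0.33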

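and the difference of 0.1946647 would be your estimate of the zero inflation factor given your estimate of $\lambda$. Strictly, under a zero-inflated Poisson with mixing probability $p$ we have $P(Y=0) = p + (1-p)e^{-\lambda}$, so this difference estimates $p(1-e^{-\lambda})$; dividing it by $1-e^{-\lambda}$ (here $0.1946647 / 0.8646647 \approx 0.225$) gives an estimate of $p$ itself.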
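In the question, lambda is a vector with one rate per observation, and the same idea carries over. The following is only a sketch under that assumption, not what pscl does internally: `zip_loglik` and `fit_zip_fixed_lambda` are hypothetical helper names, the data are simulated for illustration, and base R's `optimize` is used to maximise the zero-inflated Poisson likelihood over the single parameter p while lambda is held fixed.

```r
# Sketch: estimate the zero-inflation probability p with lambda held fixed.
# zip_loglik() and fit_zip_fixed_lambda() are illustrative names, not pscl functions.
zip_loglik <- function(p, y, lambda) {
  # A zero comes either from the point mass (prob p) or from the Poisson part.
  ll_zero <- log(p + (1 - p) * exp(-lambda))
  ll_pos  <- log(1 - p) + dpois(y, lambda, log = TRUE)
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

fit_zip_fixed_lambda <- function(y, lambda) {
  lambda <- rep_len(lambda, length(y))   # allow a scalar or per-observation lambda
  opt <- optimize(zip_loglik, interval = c(0, 1 - 1e-8),
                  y = y, lambda = lambda, maximum = TRUE)
  p0 <- mean(exp(-lambda))               # expected share of zeros without inflation
  list(p_hat    = opt$maximum,                        # maximum-likelihood estimate
       p_moment = (mean(y == 0) - p0) / (1 - p0))     # moment estimate from zeros only
}

set.seed(1)
n      <- 5000
lambda <- runif(n, 0.5, 4)                        # stand-in for the supplied rates
y      <- rpois(n, lambda) * rbinom(n, 1, 0.8)    # simulated with true p = 0.2
fit    <- fit_zip_fixed_lambda(y, lambda)
fit$p_hat
fit$p_moment
```

The moment estimate uses only the count of zeros, while the likelihood-based estimate also uses the positive counts, so the two will usually be close but not identical.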

The significance test in this case can be done using the usual Normal approximation to the distribution of proportions to compare the observed and expected frequencies of zeroes:

> zstat <- (observed_zero-expected_zero) / sqrt(expected_zero*(1-expected_zero)/length(x))
> zstat
[1] 17.99525
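> pnorm(zstat)
[1] 1

Note that pnorm(zstat) is the lower-tail probability. The one-sided p-value for excess zeros is the upper tail, pnorm(zstat, lower.tail = FALSE), which is effectively zero here, so the zero inflation is highly significant.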
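With a separate lambda for each observation, as in the question, the expected number of zeros is the sum of exp(-lambda_i) and its variance is the sum of p_i(1 - p_i) with p_i = exp(-lambda_i). A minimal sketch of that version of the test, assuming the lambdas are treated as known; `zero_excess_test` is a hypothetical helper and the data are simulated:

```r
# Sketch: z-test for excess zeros when each observation has its own known lambda.
# zero_excess_test() is an illustrative helper, not part of any package.
zero_excess_test <- function(y, lambda) {
  lambda   <- rep_len(lambda, length(y))   # allow a scalar lambda as well
  p0       <- exp(-lambda)                 # P(Y_i = 0) under Poisson(lambda_i)
  observed <- sum(y == 0)
  expected <- sum(p0)
  zstat    <- (observed - expected) / sqrt(sum(p0 * (1 - p0)))
  list(observed = observed, expected = expected, zstat = zstat,
       # upper tail: evidence of more zeros than the Poisson model predicts
       p_value = pnorm(zstat, lower.tail = FALSE))
}

set.seed(42)
n        <- 2000
lambda   <- runif(n, 0.5, 3)               # stand-in for the supplied rates
y_pois   <- rpois(n, lambda)               # no zero inflation
y_zip    <- y_pois * rbinom(n, 1, 0.8)     # 20% structural zeros
res_pois <- zero_excess_test(y_pois, lambda)
res_zip  <- zero_excess_test(y_zip, lambda)
res_pois$zstat
res_zip$zstat
```

Because the test ignores any uncertainty in the lambdas, it is best read as a check of the supplied rates rather than of the process that produced them.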

Related Solutions

Solved – Zero-inflated count models in R: what is the real advantage

I think this is a poorly chosen data set for exploring the advantages of zero inflated models, because, as you note, there isn't that much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.

Another way to compare the fit of the models is to compare the size of residuals:

boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of a magnitude of the residual that is really problematic, you can see what proportion of the residuals in each model are above that. For example, if being off by more than 1 were unacceptable:

sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better - 20 fewer large residuals.

Then the question is whether the added complexity of the models is worth it to you.

Zero-Inflated Poisson Regression – When to Use Zero-Inflated Poisson Regression and Negative Binomial Distribution

I suspect that your problem may be that the default behavior of predict.glm isn't what you think it is.

Specifically, predict used on a glm object will by default return predictions on the scale of the linear predictor (the link scale), not on the scale of the response.

This is quite clearly stated in the help (?predict.glm) but seems to trip people up very often (suggesting the default ought to be changed, perhaps; you might like to raise it on the relevant mailing list).

To get the values you want, try predict(model1, type="response").
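A small illustration of the difference, using made-up data (the name model1 is reused from above only for continuity; the original model is not shown here):

```r
# Sketch: link-scale versus response-scale predictions from a Poisson GLM.
# The data are simulated purely for illustration.
set.seed(123)
d   <- data.frame(x = runif(200))
d$y <- rpois(200, exp(0.5 + 1.2 * d$x))
model1 <- glm(y ~ x, family = poisson, data = d)

eta <- predict(model1)                      # default type = "link": log of the mean
mu  <- predict(model1, type = "response")   # expected counts

head(cbind(link = eta, response = mu))
all.equal(exp(eta), mu)                     # response = inverse link of the link-scale value
```

For a Poisson model with the default log link, exponentiating the default predictions gives the same values as asking for type = "response" directly.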
