Solved – Calculate interaction effect confidence intervals in zero-inflated poisson regression

I'm conducting a zero-inflated Poisson regression using the pscl package in R. I've included interaction terms but am having an issue with interpretation. I am assuming an additive effect and summing coefficients (x + y + xy) but am not sure what to do about the confidence intervals or p-values. From what I understand, these have to be re-estimated, probably with some bootstrapping method, but I can't figure out how to do this.

The main issue is that one of my effects reverses in interaction. I've provided a simplified version of the code below (sorry, I can't share the data). Here's a brief description of the scenario: when doctors discuss re-injury prevention with their patients, time off work increases, but in interaction with a low-stress setting, it reduces time off work.

So my questions are:

1. How do you calculate interaction effect confidence intervals and p-values using a zeroinfl object?
2. Is the process for calculating CIs and p-values different between the zero and count models?

Example code would be greatly appreciated!

library(pscl)

# PrevDisc * LowStress already expands to both main effects plus their interaction
model <- zeroinfl(TimeLoss ~ PrevDisc * LowStress, data = Doctors)

Best Answer

If PrevDisc and LowStress are binary variables (this is my impression from your description), then the interaction model simply corresponds to four different zero-inflated Poisson distributions: one for each combination of PrevDisc and LowStress.

When using the formula TimeLoss ~ PrevDisc * LowStress, treatment coding of the coefficients is used, i.e., the four parameters (in each model part) are coded as an intercept, two main effects, and an interaction effect. This coding facilitates judging whether or not the interaction effect is significant.

If you want to assess the PrevDisc effect separately for the two LowStress groups, then you can use a nested coding of the coefficients via the formula TimeLoss ~ LowStress/PrevDisc.
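A minimal sketch of the nested coding, assuming simulated data in place of the unavailable Doctors data (the sample size and all effect sizes below are made up for illustration). One detail worth noting: LowStress has to be a factor for the `/` operator to produce one PrevDisc coefficient per group; with two numeric 0/1 variables the formula would collapse to a single product term.

```r
library(pscl)

# Simulated stand-in for the Doctors data (all values are hypothetical).
set.seed(1)
n <- 2000
PrevDisc  <- rbinom(n, 1, 0.5)
LowStress <- rbinom(n, 1, 0.5)
mu <- exp(1 + 0.3 * PrevDisc - 0.1 * LowStress - 0.6 * PrevDisc * LowStress)
inflated <- rbinom(n, 1, plogis(-1 + 0.5 * PrevDisc))
Doctors <- data.frame(
  TimeLoss  = ifelse(inflated == 1, 0L, rpois(n, mu)),
  PrevDisc  = PrevDisc,
  LowStress = factor(LowStress)  # grouping variable must be a factor
)

# Nested coding: a separate PrevDisc effect within each LowStress group
nested <- zeroinfl(TimeLoss ~ LowStress / PrevDisc, data = Doctors)

# Pick out the group-specific PrevDisc coefficients in both model parts
nested_terms <- grep(":PrevDisc$", names(coef(nested)), value = TRUE)
nested_ci <- cbind(estimate = coef(nested)[nested_terms],
                   confint(nested)[nested_terms, ])
print(nested_ci)
```

Each row is the PrevDisc effect in one LowStress group, with its Wald interval, for the count part (prefix `count_`) or the zero part (prefix `zero_`).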

In either case, all inference can be done "in the usual way", i.e., with summary() and confint() for marginal Wald tests and Wald confidence intervals. You can also use lrtest() (from lmtest) for nested model comparisons with the likelihood ratio test, or AIC()/BIC() for information criteria. (Generalized) linear hypotheses can be tested with linearHypothesis() (from car) or glht() (from multcomp).
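If you prefer to stay with the treatment coding, the summed effect from the question (PrevDisc main effect plus interaction, i.e., the PrevDisc effect when LowStress = 1) can be given a Wald interval by hand from coef() and vcov(). The following is only a sketch: the data are simulated with made-up effect sizes, and `wald_combo()` is a hypothetical helper, not part of pscl. The same function works for both model parts because coef() prefixes the names with `count_` and `zero_`.

```r
library(pscl)

# Simulated stand-in for the Doctors data (all values are hypothetical).
set.seed(123)
n <- 2000
PrevDisc  <- rbinom(n, 1, 0.5)
LowStress <- rbinom(n, 1, 0.5)
mu <- exp(1 + 0.3 * PrevDisc - 0.1 * LowStress - 0.6 * PrevDisc * LowStress)
inflated <- rbinom(n, 1, plogis(-1 + 0.5 * PrevDisc))
Doctors <- data.frame(
  TimeLoss  = ifelse(inflated == 1, 0L, rpois(n, mu)),
  PrevDisc  = PrevDisc,
  LowStress = LowStress
)

model <- zeroinfl(TimeLoss ~ PrevDisc * LowStress, data = Doctors)

# Hypothetical helper: Wald estimate, 95% CI and p-value for a sum of coefficients
wald_combo <- function(fit, terms, level = 0.95) {
  b <- coef(fit)
  V <- vcov(fit)
  est <- sum(b[terms])
  se  <- sqrt(sum(V[terms, terms]))  # variance of a sum = sum of all (co)variances
  z   <- qnorm(1 - (1 - level) / 2)
  c(estimate = est,
    lower    = est - z * se,
    upper    = est + z * se,
    p.value  = 2 * pnorm(-abs(est / se)))
}

# PrevDisc effect in the LowStress = 1 group, on the linear predictor scale
count_eff <- wald_combo(model, c("count_PrevDisc", "count_PrevDisc:LowStress"))
zero_eff  <- wald_combo(model, c("zero_PrevDisc",  "zero_PrevDisc:LowStress"))
print(rbind(count = count_eff, zero = zero_eff))
```

No bootstrapping is needed for this; the covariance between the two coefficients is what makes the interval for the sum differ from anything you could read off the individual intervals.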

The difference in interpretation between the two model parts is that the count model is a log-linear model for the mean of the count component, whereas the zero model is (with the default logit link) linear in the log-odds of zero inflation, i.e., of an observation coming from the point mass component.
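To make that difference concrete, the coefficients and their Wald limits can be exponentiated: in the count part this gives ratios of expected counts, in the zero part odds ratios for zero inflation. A rough sketch on simulated data (the data and effect sizes are invented, and the default logit link for the zero part is assumed):

```r
library(pscl)

# Simulated stand-in for the Doctors data (all values are hypothetical).
set.seed(2024)
n <- 2000
PrevDisc  <- rbinom(n, 1, 0.5)
LowStress <- rbinom(n, 1, 0.5)
mu <- exp(1 + 0.3 * PrevDisc - 0.1 * LowStress - 0.6 * PrevDisc * LowStress)
inflated <- rbinom(n, 1, plogis(-1 + 0.5 * PrevDisc))
Doctors <- data.frame(
  TimeLoss  = ifelse(inflated == 1, 0L, rpois(n, mu)),
  PrevDisc  = PrevDisc,
  LowStress = LowStress
)

model <- zeroinfl(TimeLoss ~ PrevDisc * LowStress, data = Doctors)

# Exponentiate estimates and Wald limits together
ratios <- exp(cbind(estimate = coef(model), confint(model)))

# Count part: multiplicative effects on the mean of the count component
count_rr <- ratios[grep("^count_", rownames(ratios)), ]
# Zero part: odds ratios for belonging to the point mass at zero
zero_or  <- ratios[grep("^zero_", rownames(ratios)), ]

print(round(count_rr, 3))
print(round(zero_or, 3))
```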

Related Solutions

Solved – Zero-inflated count models in R: what is the real advantage

I think this is a poorly chosen data set for exploring the advantages of zero-inflated models because, as you note, there isn't that much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.

Another way to compare the fit of the models is to compare the size of residuals:

boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of a magnitude of the residual that is really problematic, you can see what proportion of the residuals in each model are above that. E.g. if being off by more than 1 was unacceptable

sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better - 20 fewer large residuals.

Then the question is whether the added complexity of the models is worth it to you.

Zero-Inflated Poisson Regression – When to Use Zero-Inflated Poisson Regression and Negative Binomial Distribution

I suspect that your problem may be that the default behavior of predict.glm isn't what you think it is.

Specifically, predict() used on a glm object will by default give predictions on the scale of the linear predictor, not the response.

This is quite clearly stated in the help (?predict.glm) but seems to trip people up very often (suggesting the default ought to be changed, perhaps; you might like to raise it on the relevant mailing list).

To get the values you want, try predict(model1,type="response")
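A small illustration of the two scales, using a Poisson glm fitted to simulated data (the data and the name `model1` here are only placeholders for your own model):

```r
# Simulated Poisson data (hypothetical values)
set.seed(42)
x <- runif(200)
y <- rpois(200, exp(0.5 + 1.2 * x))
model1 <- glm(y ~ x, family = poisson)

new <- data.frame(x = c(0.1, 0.9))

# Default: linear predictor, i.e., log of the expected count
eta <- predict(model1, newdata = new)
# Response scale: the expected count itself
mu <- predict(model1, newdata = new, type = "response")

# For the log link, the two are related by exp()
print(all.equal(exp(eta), mu))
```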
