Solved – Residuals in Zero-Inflated Negative Binomial Regression

negative-binomial-distribution, regression, residuals, zero inflation

What do residuals mean in the context of zero-inflated negative binomial regression?

I'm learning zero-inflated negative binomial (ZINB) regression. The data are from a state education system and include the number of migrant students identified by each school (which is zero-inflated) as well as variables reflecting a number of sociodemographic characteristics (e.g., poverty level, race).

My analyses have two goals:

  1. Can I predict zero-inflation and the number of migrant students identified at each school by sociodemographic characteristics?
  2. Can I use the residuals to identify individual schools that are likely under-identifying migrant students?

I still have a soft/incomplete understanding of the dual-component nature of ZINB regression, meaning the binomial / zero-inflation model combined with the negative binomial / count model. When I ask R for residuals, I get one residual for each school.

Is the residual for the binomial model? Or is the residual for the count model? Some combination? Am I thinking about this all wrong?

Best Answer

A residual generally measures the distance between your observed data and what you expect, given your fitted model. Distance can be measured in a number of ways, and thus there are different residual definitions.

For linear regression, the most common choice is the raw residual, i.e. the difference between the observation and the fitted mean. This makes sense because the adequacy of the residuals is easily checked visually: the assumptions of linear regression imply that the residuals are normal (and therefore symmetric) with constant variance.

For other (discrete) distributions (including the ZINB case), a visual check of raw residuals is less useful, as variance and shape of the distributions usually change with the mean, plus there is the problem of assessing the distribution of discrete values.
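To make this concrete for ZINB, here is a minimal Python sketch (standard library only; the asker works in R, so treat it as an illustration of the arithmetic rather than of any package's output). It assumes the usual parameterization: a structural zero with probability `pi`, otherwise a negative binomial count with mean `mu` and dispersion `theta`. The function name `zinb_mean_var` and all numbers are hypothetical. The fitted mean, and therefore the raw residual, mixes both model components, and the variance grows with the mean.

```python
def zinb_mean_var(mu, theta, pi):
    """Mean and variance of a ZINB response.

    mu:    mean of the negative binomial (count) component
    theta: NB dispersion, so the count part has variance mu + mu**2 / theta
    pi:    probability of a structural (excess) zero
    """
    mean = (1.0 - pi) * mu
    var = mean * (1.0 + mu * (pi + 1.0 / theta))
    return mean, var


# Hypothetical fitted values for three schools (illustrative numbers only).
schools = [
    {"y": 0, "mu": 2.0, "pi": 0.6},
    {"y": 3, "mu": 8.0, "pi": 0.3},
    {"y": 40, "mu": 30.0, "pi": 0.1},
]
theta = 1.5  # assumed dispersion estimate

for s in schools:
    mean, var = zinb_mean_var(s["mu"], theta, s["pi"])
    raw = s["y"] - mean  # raw residual: observed minus combined fitted mean
    print(f"y={s['y']:>2}  E[y]={mean:6.2f}  Var[y]={var:8.2f}  raw={raw:7.2f}")
```

Because the spread of `y` differs so much between schools with small and large fitted means, raw residuals of the same size do not indicate the same degree of misfit.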

For such models, Pearson and deviance residuals are more suitable than raw residuals, and they can also be used for zero-inflated models. However, with strong zero-inflation, they will not appear homogeneous either. The "gold standard" imo are (simulated) quantile residuals, which are also called Bayesian p-values in Bayesian statistics. See an example with a Bayesian zero-inflated Poisson here. I think this would be the best for your ZINB regression, although this would probably require a bit of coding by hand.
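As a rough idea of what "coding by hand" could look like, the following Python sketch (standard library only, not the asker's R) computes a Pearson residual and a randomized quantile residual for a ZINB fit. Note the swap: instead of simulating from the fitted model, it uses the analytic ZINB CDF, which is available in closed form here. The parameterization (`mu`, `theta`, `pi`), the function names, and the school values are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zinb_cdf(k, mu, theta, pi):
    """P(Y <= k) under ZINB; 0 for k < 0."""
    if k < 0:
        return 0.0
    nb_cdf = sum(nb_pmf(j, mu, theta) for j in range(k + 1))
    return min(1.0, pi + (1.0 - pi) * nb_cdf)


def pearson_residual(y, mu, theta, pi):
    """(observed - fitted mean) / fitted standard deviation."""
    mean = (1.0 - pi) * mu
    var = mean * (1.0 + mu * (pi + 1.0 / theta))
    return (y - mean) / math.sqrt(var)


def quantile_residual(y, mu, theta, pi, rng):
    """Randomized quantile residual: draw u uniformly between
    P(Y < y) and P(Y <= y), then map it to the standard normal scale."""
    lower = zinb_cdf(y - 1, mu, theta, pi)
    upper = zinb_cdf(y, mu, theta, pi)
    u = rng.uniform(lower, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(u)


# Hypothetical fitted values for three schools (illustrative numbers only).
rng = random.Random(1)
theta = 1.5  # assumed dispersion estimate
for y, mu, pi in [(0, 2.0, 0.6), (3, 8.0, 0.3), (40, 30.0, 0.1)]:
    p = pearson_residual(y, mu, theta, pi)
    q = quantile_residual(y, mu, theta, pi, rng)
    print(f"y={y:>2}  pearson={p:6.2f}  quantile={q:6.2f}")
```

If the model is adequate, the quantile residuals should look approximately standard normal regardless of how strong the zero-inflation is, which is what makes them easier to inspect than Pearson residuals. Both residuals are computed from the combined distribution, not from the zero or the count component alone.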

Regarding your specific questions:

  1. I guess what you are asking is if you can make zero-inflation dependent on a predictor? The answer is yes.

  2. Yes, you can look at residuals to check for that, but if you want to see an effect of school, it would be much more straightforward to simply include school as a fixed or random effect and look at the estimates.
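For the first point, a minimal Python sketch of what "zero-inflation dependent on a predictor" means: each component gets its own linear predictor, with a log link for the count mean and a logit link for the zero-inflation probability. These are the common default links, assumed here rather than taken from the question. The coefficient values and the covariate (a school's poverty rate) are hypothetical.

```python
import math


def zinb_components(x, beta, gamma):
    """Return (mu, pi) for one school with covariate value x.

    beta:  (intercept, slope) of the count model, log link
    gamma: (intercept, slope) of the zero-inflation model, logit link
    """
    mu = math.exp(beta[0] + beta[1] * x)
    pi = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * x)))
    return mu, pi


# Hypothetical coefficients and poverty rates (illustrative numbers only).
beta = (0.5, 2.0)
gamma = (1.0, -3.0)
for poverty in (0.1, 0.5, 0.9):
    mu, pi = zinb_components(poverty, beta, gamma)
    print(f"poverty={poverty:.1f}  mu={mu:6.2f}  pi={pi:5.2f}")
```

The same covariate can appear in both components with different coefficients, so a characteristic may affect the chance of an excess zero and the expected count differently.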