Solved – Enormous SEs in zero-inflated negative binomial regression

count-data, overdispersion, standard error, zero inflation

I have overdispersed count data where the outcome is the number of events (occurrences of a rare disease) and the covariate of interest is season. The unit of analysis is a country-season combination. We have 16 countries, each observed in all 4 seasons, giving 64 data points:

[Image: the data]

Since I was suspicious that there may also be an excess of zeroes, I ran several different regression models for comparison:

Negative binomial
[Image: negative binomial model output]

Zero-inflated Poisson (ZIP)
[Image: ZIP model output]

Negative binomial hurdle (NBH)
[Image: NBH model output]

Zero-inflated negative binomial (ZINB)
[Image: ZINB model output]
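A quick way to put a number on the "excess of zeroes" suspicion is to compare the observed number of zeros with the number a plain Poisson or negative binomial fit would predict. The sketch below is only an illustration: the counts, fitted means and dispersion `theta` are made-up values, not the data from this question, and the NB2 parameterization (variance = mu + mu^2/theta) is an assumption.

```python
import math

def poisson_zero_prob(mu):
    """P(Y = 0) under a Poisson with mean mu."""
    return math.exp(-mu)

def nb_zero_prob(mu, theta):
    """P(Y = 0) under an NB2 with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta

def expected_zeros(fitted_means, theta=None):
    """Expected number of zeros; Poisson if theta is None, else NB2."""
    if theta is None:
        return sum(poisson_zero_prob(m) for m in fitted_means)
    return sum(nb_zero_prob(m, theta) for m in fitted_means)

# Hypothetical observed counts and fitted means (NOT the data from the post).
counts = [0, 0, 3, 1, 0, 12, 0, 5]
fitted = [1.5, 0.8, 2.9, 1.2, 0.6, 9.0, 2.0, 4.0]
theta = 0.7  # hypothetical NB dispersion estimate

observed = sum(1 for y in counts if y == 0)
print("observed zeros:        ", observed)
print("expected under Poisson:", round(expected_zeros(fitted), 2))
print("expected under NB2:    ", round(expected_zeros(fitted, theta), 2))
```

If the negative binomial already predicts roughly as many zeros as were observed, there may be little left for a zero-inflation component to explain.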

The models yield similar results, with one exception: the SEs in the zero model of the ZINB are enormous, whereas the other three models have reasonable SEs. Apart from the offset term there is only one covariate (season), so collinearity should not be the cause. The residuals are asymmetric, judging by the five-number summary in the output, but that's true for several of the models and it makes sense intuitively.
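For reference, here is one standard way to write the ZINB model, which shows what the "zero model" is and where an offset enters. The notation is an assumption on my part (NB2 count part with mean $\mu_i$ and dispersion $\theta$, logit link for the zero-inflation probability $\pi_i$, and log population as the offset in the count part only); the software used above may parameterize things differently.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\left(\frac{\theta}{\theta + \mu_i}\right)^{\theta}, \\
P(Y_i = y) &= (1 - \pi_i)\,\frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!}
              \left(\frac{\theta}{\theta + \mu_i}\right)^{\theta}
              \left(\frac{\mu_i}{\theta + \mu_i}\right)^{y}, \qquad y = 1, 2, \dots \\
\log \mu_i &= \log(\mathrm{population}_i) + x_i^{\top}\beta, \\
\operatorname{logit} \pi_i &= z_i^{\top}\gamma .
\end{aligned}
```

The SEs in question are those of $\gamma$. One commonly cited situation in which they blow up is when the count part already accounts for nearly all of the zeros, so that the fitted $\pi_i$ is pushed toward 0 and the likelihood is almost flat in $\gamma$.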

What could be causing this?

EDIT #1
There doesn't seem to be perfect separation in the binomial part of the model.

[Image: check for perfect separation]
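A first-pass separation check like the one above can be sketched as a cross-tabulation of zero versus positive counts by season. The data and function names below are hypothetical, for illustration only. Note that in a zero-inflated model this is only a rough screen, since zeros can come from either the count component or the zero component.

```python
from collections import defaultdict

def zero_table(seasons, counts):
    """Cross-tabulate zero vs. positive outcomes within each season."""
    table = defaultdict(lambda: {"zero": 0, "positive": 0})
    for season, y in zip(seasons, counts):
        table[season]["zero" if y == 0 else "positive"] += 1
    return dict(table)

def separated_levels(table):
    """Seasons whose outcomes are all zero or all positive."""
    return sorted(
        season for season, cell in table.items()
        if cell["zero"] == 0 or cell["positive"] == 0
    )

# Hypothetical data (NOT the data from the post).
seasons = ["winter", "winter", "spring", "spring",
           "summer", "summer", "autumn", "autumn"]
counts = [0, 2, 0, 0, 1, 3, 0, 4]

table = zero_table(seasons, counts)
for season, cell in table.items():
    print(season, cell)
print("levels with all-zero or all-positive outcomes:", separated_levels(table))
```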

EDIT #2
Here are some Pearson residual plots. Definitely not normal, and perhaps heteroscedastic (but the latter, at least, is to be expected). However, I really have no idea what residuals from a ZINB model "should" look like if the model fits.

[Image: Pearson residual plots]
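On what ZINB Pearson residuals "should" look like: they are the raw residuals scaled by the model-implied standard deviation, so under a well-fitting model they should have mean near 0 and variance near 1, but with small counts they will generally not look normal. A minimal sketch of the calculation, assuming an NB2 count part with mean `mu` and dispersion `theta`, and zero-inflation probability `pi` (these names are mine, not from any particular package):

```python
import math

def zinb_mean(mu, pi):
    """E[Y] for a ZINB: the count mean shrunk by the structural-zero share."""
    return (1 - pi) * mu

def zinb_variance(mu, pi, theta):
    """Var[Y] for a ZINB with NB2 count part (variance mu + mu^2 / theta)."""
    return (1 - pi) * mu * (1 + mu * (pi + 1 / theta))

def pearson_residual(y, mu, pi, theta):
    """(observed - expected) / sqrt(model variance)."""
    return (y - zinb_mean(mu, pi)) / math.sqrt(zinb_variance(mu, pi, theta))

# Hypothetical observation and fitted values (NOT from the post).
print(pearson_residual(y=3, mu=2.0, pi=0.5, theta=1.0))
```

With `pi = 0` the variance reduces to the ordinary NB2 variance `mu + mu**2 / theta`, which is a handy sanity check on the formula.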

Best Answer

(Hope this is an appropriate way to answer my own question; maybe this should be a comment?)

I figured out the problem thanks in part to a fantastic answer on count regression diagnostics. The offset was the culprit causing the huge SEs, and the residuals and other goodness-of-fit diagnostics were accordingly strange. The offset term may have been a problem because one of the countries (China) had a huge population but comparatively few cases.
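To illustrate why a very populous country with few cases can sit badly with an offset: a log-population offset fixes the coefficient on log(population) at 1, so expected counts scale in direct proportion to population. The numbers below are entirely hypothetical (country names, populations and the common rate are made up), and the sketch only shows the arithmetic of the offset, not a fitted model.

```python
import math

def expected_count(population, log_rate):
    """Expected count with a log-population offset:
    log(mu) = log(population) + log_rate, i.e. mu = population * rate."""
    return math.exp(math.log(population) + log_rate)

# Hypothetical populations and a hypothetical common per-person rate.
populations = {"Small": 5e6, "Medium": 6e7, "Very large": 1.4e9}
log_rate = math.log(2e-7)

for name, pop in populations.items():
    print(f"{name:>10}: expected count = {expected_count(pop, log_rate):.1f}")
```

If the very large country actually reported only a handful of cases, its observed count would be far below what the offset implies, and that single observation could distort the fit. A common alternative, if proportional scaling with population is doubtful, is to enter log(population) as an ordinary covariate with a freely estimated coefficient.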

For comparison, here is the ZINB without the offset:

[Image: ZINB model output without the offset]

Hope this helps anyone else experiencing similar problems.
