I have overdispersed count data where the outcome is events (occurrence of a rare disease) and the covariate of interest is season. The unit of analysis is the number of events occurring in a country-season combination. We have 16 countries and 4 seasons repeated across each country, thus 64 data points:
Since I was suspicious that there may also be an excess of zeroes, I ran several different regression models for comparison:
Negative binomial
Zero-inflated Poisson (ZIP)
Negative binomial hurdle (NBH)
Zero-inflated negative binomial (ZINB)
The models yield similar results, except for one thing. The SEs of ZINB's zero model are enormous. The other three models have reasonable SEs. There is only one covariate (season) except for the offset term, so no collinearity. The residuals are asymmetric judging by the five-number summary in the output, but that's true for several of the models and it makes sense intuitively.
What could be causing this?
EDIT #1
There doesn't seem to be perfect separation in the binomial part of the model.
EDIT #2
Here are some Pearson residual plots. Definitely not normal, and perhaps heteroscedastic (but the latter, at least, is to be expected). However, I really have no idea what residuals from a ZINB model "should" look like if the model fits.
Best Answer
(Hope this is an appropriate way to answer my own question; maybe this should be a comment?)
I figured out the problem thanks in part to a fantastic answer on count regression diagnostics. The offset was the culprit causing the huge SEs, and the residuals and other goodness-of-fit diagnostics were accordingly strange. The offset term may have been a problem because one of the countries (China) had a huge population but comparatively few cases.
For comparison, here is the ZINB without the offset:
Hope this helps anyone else experiencing similar problems.