(Hope this is an appropriate way to answer my own question; maybe this should be a comment?)
I figured out the problem thanks in part to a fantastic answer on count regression diagnostics. The offset was the culprit causing the huge SEs, and the residuals and other goodness-of-fit diagnostics were accordingly strange. The offset term may have been a problem because one of the countries (China) had a huge population but comparatively few cases.
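For anyone wanting to check whether an offset is reasonable in their own data, here is a minimal base-R sketch. It uses simulated data, not the country data from my question, and the variable names (`pop`, `x`, `y`) are made up for illustration. An offset of log(population) forces the coefficient on log(population) to be exactly 1, so one rough check is to fit log(population) as an ordinary covariate and see whether its estimate is anywhere near 1.

```r
# Simulated data (hypothetical): counts proportional to population.
set.seed(1)
n   <- 200
pop <- exp(rnorm(n, mean = 10, sd = 1))       # made-up population sizes
x   <- rnorm(n)                                # made-up covariate
y   <- rpois(n, lambda = pop * exp(-9 + 0.5 * x))

# Model with the offset: coefficient on log(pop) is fixed at 1.
fit_offset <- glm(y ~ x + offset(log(pop)), family = poisson)

# Same model, but log(pop) enters as a free covariate.
fit_free <- glm(y ~ x + log(pop), family = poisson)

# If the offset assumption is sensible, this estimate should be close to 1.
b_logpop <- coef(fit_free)[["log(pop)"]]
print(b_logpop)
print(summary(fit_free)$coefficients)
```

If the free estimate is far from 1, or a single observation with an extreme population dominates the fit, the offset is probably distorting the model, which is roughly what seems to have happened in my case.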
For comparison, here is the ZINB without the offset:
Hope this helps anyone else experiencing similar problems.
This post is four years old, but I wanted to follow up on what fsociety said in a comment. Diagnosing residuals in GLMMs is not straightforward, since standard residual plots can show non-normality, heteroscedasticity, etc., even when the model is correctly specified. There is an R package, DHARMa, specifically suited to diagnosing this type of model.
The package is based on a simulation approach to generate scaled residuals from fitted generalized linear mixed models and generates different easily interpretable diagnostic plots. Here is a small example with the data from the original post and the first fitted model (m1):
library(DHARMa)
sim_residuals <- simulateResiduals(m1, 1000)
plotSimulatedResiduals(sim_residuals)
The plot on the left is a QQ plot of the scaled residuals, used to detect deviations from the expected (uniform) distribution. The plot on the right shows residuals against predicted values, with quantile regression lines to detect deviations from uniformity (the red lines should be horizontal at 0.25, 0.50 and 0.75).
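To make the idea of simulation-based scaled residuals more concrete, here is a rough base-R sketch. It is not DHARMa's actual code, and it uses a simulated Poisson GLM as a stand-in (the data and names `x`, `y`, `fit` are assumptions, not the model from the original post). For each observation, the scaled residual is the fraction of simulated responses below the observed value, with ties broken at random so that integer data still give continuous residuals.

```r
# Hypothetical example data and model (stand-in for a real fitted model).
set.seed(42)
n   <- 200
x   <- rnorm(n)
y   <- rpois(n, lambda = exp(0.5 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)

# Simulate new responses from the fitted model: one column per simulation.
n_sim <- 1000
sims  <- as.matrix(simulate(fit, nsim = n_sim))   # n rows, n_sim columns

# Scaled residual = P(sim < obs) + U * P(sim == obs), with U ~ Uniform(0, 1).
# The random term spreads ties so that residuals are continuous on [0, 1].
below      <- rowMeans(sims < y)
ties       <- rowMeans(sims == y)
scaled_res <- below + runif(n) * ties

# Under a correctly specified model these should look uniform on [0, 1].
plot(ppoints(n), sort(scaled_res),
     xlab = "Expected (uniform)", ylab = "Observed scaled residual")
abline(0, 1, col = "red")
```

Because the residuals are uniform under the fitted model regardless of the response distribution, the same plots can be read the same way for Poisson, negative binomial, or zero-inflated models.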
Additionally, the package also has specific functions for testing for over/underdispersion and zero inflation, among others:
testOverdispersionParametric(m1)
Chisq test for overdispersion in GLMMs
data: poisson
dispersion = 0.18926, pearSS = 11.35600, rdf = 60.00000, p-value = 1
alternative hypothesis: true dispersion greater 1
testZeroInflation(sim_residuals)
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: sim_residuals
ratioObsExp = 0.98894, p-value = 0.502
alternative hypothesis: more
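A note for readers on newer DHARMa versions: as far as I know, plotSimulatedResiduals() and testOverdispersionParametric() have since been deprecated in favour of plot() and testDispersion(). Here is a minimal sketch of the same workflow with the newer function names. The Poisson GLM below is a simulated stand-in (an assumption on my part), not the m1 from the original post.

```r
library(DHARMa)

# Hypothetical stand-in for the fitted model m1.
set.seed(123)
n   <- 200
x   <- rnorm(n)
y   <- rpois(n, lambda = exp(0.5 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)

# Simulate scaled residuals from the fitted model.
sim <- simulateResiduals(fit, n = 1000)

# QQ plot plus residuals vs. predicted (newer replacement for plotSimulatedResiduals).
plot(sim)

# Simulation-based dispersion and zero-inflation tests.
disp <- testDispersion(sim, plot = FALSE)
zi   <- testZeroInflation(sim, plot = FALSE)
print(disp)
print(zi)
```

Both tests return standard htest objects, so the p-values can be pulled out with disp$p.value and zi$p.value if you want to tabulate them.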
Best Answer
I just randomly came across this question. Sorry it's so late. It is fine to model the two parts of a hurdle model separately. This paper addresses hurdle and zero-inflated models: https://journal.r-project.org/archive/2017/RJ-2017-066/index.html
Using data from the glmmTMB package, here's an example. You could fit the data with either a zero-inflated model (zinb below, where zeros can be from either the negative binomial or the zero-inflation) or with a hurdle model (hnb below). Then we can see that hnb is statistically equivalent to the combination of a binomial model for the zeros and a zero-truncated model for the positive counts. You can also see the coefficients if you try it.
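Here is a minimal sketch of that comparison. The choice of the Salamanders dataset (which ships with glmmTMB) and the simple formula count ~ mined are assumptions made for illustration, so treat this as one way to set it up rather than the definitive version. The names bin, tnb and is_zero are also just illustrative.

```r
library(glmmTMB)
data(Salamanders, package = "glmmTMB")

# Zero-inflated NB: zeros can come from the NB part or the zero-inflation part.
zinb <- glmmTMB(count ~ mined, ziformula = ~ mined,
                family = nbinom2, data = Salamanders)

# Hurdle NB: all zeros come from the zero part; positives from a truncated NB.
hnb <- glmmTMB(count ~ mined, ziformula = ~ mined,
               family = truncated_nbinom2, data = Salamanders)

# The two hurdle parts fitted separately.
# The binary response is "is zero" so coefficients match the zi part of hnb.
Salamanders$is_zero <- as.numeric(Salamanders$count == 0)
bin <- glmmTMB(is_zero ~ mined, family = binomial, data = Salamanders)
tnb <- glmmTMB(count ~ mined, family = truncated_nbinom2,
               data = subset(Salamanders, count > 0))

# The hurdle log-likelihood should equal the sum of the two parts.
ll_hurdle <- as.numeric(logLik(hnb))
ll_parts  <- as.numeric(logLik(bin)) + as.numeric(logLik(tnb))
print(c(hurdle = ll_hurdle, parts = ll_parts))

# Coefficients: zero part vs. binomial, count part vs. truncated NB.
print(fixef(hnb)$zi);   print(fixef(bin)$cond)
print(fixef(hnb)$cond); print(fixef(tnb)$cond)
```

Note that zinb will generally not match the two-part fit, because in a zero-inflated model the count component also contributes zeros, so the likelihood does not factor into separate pieces.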