When is it appropriate to use a Type I versus Type II negative binomial distribution in a zero-inflated negative binomial model?
I've found a similar question, but it has no answer that I can follow, and I can't tell whether it applies to zero-inflated negative binomial distributions.
Working in R, I have determined from AIC values that the zero-inflated negative binomial (ZINB) distribution fits my dataset best compared to the other distributions I tried.
Using the R package glmmTMB, I have specified ZINB models with both Type I and Type II:
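library(glmmTMB)
library(MuMIn)  # provides AICc()
# Third predictor written as iv3: the original repeated iv2, but 9 df implies three predictors
m1 <- glmmTMB(dv ~ iv1 + iv2 + iv3, ziformula = ~., data = df, family = nbinom1)
m2 <- glmmTMB(dv ~ iv1 + iv2 + iv3, ziformula = ~., data = df, family = nbinom2)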
When comparing the models' AICc values, the Type II model fits marginally better:
AICc(m1,m2)
   df     AICc
m1  9 528.3359
m2  9 527.7481
When using the zeroinfl function from the pscl package:
library(pscl)
zm2 <- zeroinfl(dv ~ iv1 + iv2 + iv3, data = df, dist = "negbin")
AICc(m2, zm2)
    df     AICc
m2   9 527.7481
zm2  9 527.7481
It yields the same AICc value as the Type II NB model (and the estimates and p-values are nearly identical), so it seems that Type II is assumed in the zeroinfl function.
My data set models the use of drugs (as the dv) over the past 30 days.
Is it appropriate to use a Type II negative binomial distribution, and why?
Is it reasonable to justify this decision with AIC values?
Best Answer
The difference between these two model families is the relationship between mean and variance.
nbinom1 (the same linear mean-variance relationship as quasi-Poisson): variance = µ * phi
where µ is the mean and phi is the over-dispersion parameter.
nbinom2 (the default negative binomial in most packages): variance = µ(1 + µ/k), also written µ + (µ^2)/k
where µ is the mean and k is the over-dispersion parameter.
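To make the contrast concrete, here is a small worked example using the two variance functions above. The values phi = 3 and k = 2 are made up purely for illustration. Note also that glmmTMB's family documentation writes the nbinom1 variance as µ(1 + phi), which is the same linear form with the multiplier shifted by one.

```latex
% Illustrative values only (assumed, not from the question's data): \phi = 3, k = 2
\begin{align*}
\text{nbinom1:}\quad & \operatorname{Var}(Y) = \phi\,\mu
  && \mu = 2 \Rightarrow 3 \cdot 2 = 6,
  && \mu = 20 \Rightarrow 3 \cdot 20 = 60 \\
\text{nbinom2:}\quad & \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k}
  && \mu = 2 \Rightarrow 2 + \tfrac{4}{2} = 4,
  && \mu = 20 \Rightarrow 20 + \tfrac{400}{2} = 220
\end{align*}
```

Under nbinom1 the variance-to-mean ratio stays at 3 for both means, whereas under nbinom2 it grows from 2 to 11. The two families therefore differ most in how they treat observations with large means.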
When choosing between these, the paper by Ver Hoef and Boveng (2007) is very helpful, as are pages 16 and 17 of Bolker et al. (2012).
Ver Hoef and Boveng say that AIC doesn't necessarily apply to quasi-Poisson models (nbinom1), and they are skeptical about comparing AIC and qAIC (an information criterion developed for quasi models), although you do see it done.
Instead, they recommend plotting the squared residuals against the fitted means. This plot can be very noisy, so it helps to group observations with similar fitted means and make the equivalent plot of the group averages. If this plot follows a linear trend, it suggests quasi-Poisson (nbinom1) is best, whereas a quadratic trend argues for a negative binomial model (nbinom2).
If you have a decent number of samples and a finite number of possible combinations of explanatory variables you could form groups not based on response variables but on treatment combinations. This plot is demonstrated in Bolker et al 2012 (link in the references) along with code to generate the plot in R.
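The binned diagnostic described above could be sketched in base R roughly as follows. This is a minimal illustration, not the code from Bolker et al. or Ver Hoef and Boveng: `binned_mean_variance` is a hypothetical helper, and the data are simulated with an nbinom2-type variance so that the example runs on its own. For a zero-inflated fit the zero-inflation component adds variance too, so treat the plot as a rough guide only.

```r
# Hypothetical helper (not part of glmmTMB or pscl): bin observations by
# fitted mean, then average the fitted means and squared residuals per bin.
binned_mean_variance <- function(observed, fitted, n_bins = 10) {
  breaks <- unique(quantile(fitted, probs = seq(0, 1, length.out = n_bins + 1)))
  bin <- cut(fitted, breaks = breaks, include.lowest = TRUE)
  data.frame(
    mean_fitted   = as.numeric(tapply(fitted, bin, mean)),
    mean_sq_resid = as.numeric(tapply((observed - fitted)^2, bin, mean))
  )
}

# Simulated stand-in data (assumption): variance = mu + mu^2 / k with k = 2
set.seed(1)
n  <- 5000
x  <- runif(n)
mu <- exp(0.5 + 2 * x)
y  <- rnbinom(n, mu = mu, size = 2)  # `size` plays the role of k

# With real data, pass the model's fitted means instead of `mu`,
# e.g. predict(m2, type = "response")
bins <- binned_mean_variance(y, mu, n_bins = 20)

# Linear (nbinom1-like) and quadratic (nbinom2-like) trends through the origin
lin_fit  <- lm(mean_sq_resid ~ 0 + mean_fitted, data = bins)
quad_fit <- lm(mean_sq_resid ~ 0 + mean_fitted + I(mean_fitted^2), data = bins)

if (interactive()) {
  plot(bins$mean_fitted, bins$mean_sq_resid,
       xlab = "Mean fitted value (per bin)",
       ylab = "Mean squared residual (per bin)")
  grid_mu <- data.frame(mean_fitted = seq(min(bins$mean_fitted),
                                          max(bins$mean_fitted),
                                          length.out = 100))
  lines(grid_mu$mean_fitted, predict(lin_fit, grid_mu), lty = 2)   # linear
  lines(grid_mu$mean_fitted, predict(quad_fit, grid_mu), lty = 1)  # quadratic
  legend("topleft", legend = c("linear", "quadratic"), lty = c(2, 1))
}
```

Whichever curve tracks the binned points more closely indicates the more plausible mean-variance relationship. The number of bins is a judgment call: fewer bins give smoother averages but fewer points to judge the trend by.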
Bolker, B., Brooks, M., Gardner, B., Lennert, C. & Minami, M., 2012. Owls example: a zero-inflated, generalized linear mixed model for count data. October 23, 2012. https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
Ver Hoef, J.M. & Boveng, P.L., 2007. Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology, 88(11), pp.2766–2772.