Negative-binomial GLM – How to Choose Between a Negative-binomial GLM and Log-transformation for Count Data to Reduce the Type I Error Rate

generalized-linear-model, negative-binomial-distribution, r, simulation, type-i-and-ii-errors

Some of you might have read this nice paper:

O’Hara RB, Kotze DJ (2010) Do not log-transform count data. Methods in Ecology and Evolution 1:118–122.

In my field of research (ecotoxicology), we deal with poorly replicated experiments, and GLMs are not widely used. So I ran a simulation similar to that of O’Hara & Kotze (2010), but mimicking ecotoxicological data.

Power simulations:

I simulated data from a one-factor design with one control group ($\mu_c$) and 5 treatment groups ($\mu_{1-5}$). Abundance in treatment 1 was identical to the control ($\mu_1 = \mu_c$); abundances in treatments 2-5 were half the abundance in the control ($\mu_{2-5} = 0.5 \mu_c$).
For the simulations I varied the sample size (3, 6, 9, 12) and the abundance in the control group (2, 4, 8, …, 1024).
Abundances were drawn from a negative binomial distribution with a fixed dispersion parameter ($\theta = 3.91$).
100 datasets were generated per setting and analysed using a negative binomial GLM and a Gaussian GLM on log-transformed data.
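For illustration, one run of such a simulation might look like this (a minimal sketch, not the linked code; the function name sim_one and its defaults are mine):

```r
library(MASS)  # for glm.nb()

sim_one <- function(n = 3, mu_c = 8, theta = 3.91, null = FALSE) {
  # Group means: treatment 1 equals the control, treatments 2-5 are half
  # the control; under the null (Type I simulations below) all are equal.
  mus   <- if (null) rep(mu_c, 6) else c(mu_c, mu_c, rep(0.5 * mu_c, 4))
  group <- factor(rep(paste0("g", 1:6), each = n))
  y     <- rnbinom(6 * n, mu = rep(mus, each = n), size = theta)

  # Negative binomial GLM, tested via drop1() as in my simulations.
  d    <- drop1(glm.nb(y ~ group), test = "Chisq")
  p_nb <- d[2, ncol(d)]  # p-value for dropping 'group'

  # Gaussian model on log-transformed counts; log(y + 1) handles zeros.
  p_lm <- anova(lm(log(y + 1) ~ group))[["Pr(>F)"]][1]

  c(nb = p_nb, lm = p_lm)
}
```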

The results are as expected: the GLM has greater power, especially when not many animals were sampled.
[Figure: power of the negative binomial GLM and the Gaussian GLM on log-transformed data, by sample size and control abundance]
Code is here.

Type I Error:

Next I looked at the Type I error.
Simulations were done as above, but all groups had the same abundance ($\mu_c = \mu_{1-5}$).
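The Type I error rate is then just the fraction of null datasets rejected at $\alpha = 0.05$; reusing the sim_one() sketch from above:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
p <- replicate(100, sim_one(n = 3, mu_c = 8, null = TRUE))
rowMeans(p < 0.05)  # empirical Type I error rates for nb and lm
```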

However, the results are not as expected:
[Figure: Type I error rates of the negative binomial GLM and the Gaussian GLM on log-transformed data, by sample size and control abundance]
The negative binomial GLM showed a greater Type I error rate than the LM + transformation. As expected, the difference vanished with increasing sample size.
Code is here.

Question:

Why is there an increased Type I error compared to lm + transformation?

If we have poor data (small sample size, low abundance, hence many zeros), should we then use lm + transformation? Small sample sizes (2-4 per treatment) are typical for such experiments and cannot be increased easily.

Although the negative binomial GLM can be justified as appropriate for these data, lm + transformation may protect us from Type I errors.

Best Answer

This is an extremely interesting problem. I reviewed your code and can find no immediately obvious typo.

I would like to see you redo this simulation, but use a likelihood ratio test to make inference about the heterogeneity between groups. This would involve refitting a null model so that you can get estimates of the $\theta$s under the null hypothesis of homogeneity in rates between groups. I think this is necessary because the negative binomial model is not a linear model (the rate is parameterized linearly, but the $\theta$s are not). Therefore I am not convinced that drop1 provides correct inference.

Most tests for linear models do not require you to recompute the model under the null hypothesis, because you can calculate the slope of the likelihood (score test) or approximate its curvature (Wald test) using the parameter estimates and estimated covariance under the alternative hypothesis alone.
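For instance (a hypothetical illustration of this point, not code from the original post), in a Gaussian lm the Wald statistic for the group effect can be computed from the alternative fit's coefficients and covariance alone:

```r
# Simulated example data; only the alternative model is ever fitted.
set.seed(1)
group <- factor(rep(paste0("g", 1:6), each = 3))
y     <- rnbinom(18, mu = 8, size = 3.91)
fit   <- lm(log(y + 1) ~ group)

b <- coef(fit)[-1]                  # the five group contrasts
V <- vcov(fit)[-1, -1]              # their estimated covariance
W <- drop(t(b) %*% solve(V) %*% b)  # Wald chi-square statistic
pchisq(W, df = length(b), lower.tail = FALSE)
W / length(b)  # equals the F statistic reported by anova(fit)
```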

Since the negative binomial model is not linear, I think you will need to fit a null model.
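A minimal sketch of such a test (my illustration, not the edited code linked below), in which glm.nb() re-estimates $\theta$ under both hypotheses:

```r
library(MASS)  # for glm.nb()

lrt_nb <- function(y, group) {
  fit1 <- glm.nb(y ~ group)  # alternative: group-specific means
  fit0 <- glm.nb(y ~ 1)      # null: common mean, theta re-estimated
  lr   <- 2 * as.numeric(logLik(fit1) - logLik(fit0))
  pchisq(lr, df = nlevels(group) - 1, lower.tail = FALSE)
}
```

Because $\theta$ is re-estimated under the null rather than carried over from the alternative fit, this differs from what drop1() computes.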

EDIT:

I edited the code and got the following: [Figure: simulation results from the edited code]

Edited code here: https://github.com/aomidpanah/simulations/blob/master/negativeBinomialML.r
