This answer comes in two parts. The first addresses the standard errors and why their difference implies the two models are not identical, as @ndoogan observed in the comments; the second addresses, partially, the question of when the coefficient estimates themselves might be substantially different.
Consider, for example, a hypothesis test on the coefficient of log(Vmaj)
where the null is that the coefficient equals 0.5. The two sets of model estimates lead to different conclusions: the Poisson-based standard error would result in rejection of the null (unless testing at a very high confidence level), while the NB-based standard error would result in failure to reject.
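For concreteness, here is a minimal sketch of such a Wald test, assuming hypothetical fitted objects fit_pois and fit_nb for the model in the question (the term name and the 0.5 null value are placeholders):

# Wald test of H0: coefficient of log(Vmaj) equals 0.5, for a hypothetical
# Poisson fit `fit_pois` and NB fit `fit_nb` of the same model
wald_test <- function(fit, term, null_value = 0.5) {
  est <- coef(summary(fit))[term, "Estimate"]
  se  <- coef(summary(fit))[term, "Std. Error"]
  z   <- (est - null_value) / se
  c(estimate = est, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}
wald_test(fit_pois, "log(Vmaj)")  # smaller SE, so smaller p-value: reject H0
wald_test(fit_nb,   "log(Vmaj)")  # larger SE, so larger p-value: fail to reject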
More generally, there is more to a collection of estimates than just the point estimates. In the presence of (Negative binomial-like) overdispersion, the standard errors produced by the Negative Binomial-based model will be more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.
For example, a simple model with $\mu_i = \exp\{1 + x_i\}$, and $y_i \sim NB$ with mean $\mu_i$ and variance $5\mu_i$:
library(MASS)  # for glm.nb

# Simulate NB data with mean mu = exp(1 + x) and variance mu / p = 5 * mu
N <- 1000
mu <- exp(1 + (x <- rnorm(N)))
p <- 0.2
r <- mu * p / (1 - p)   # size parameter chosen so the mean equals mu
y <- rnbinom(N, r, p)

poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))
# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
  mu <- exp(1 + (x <- rnorm(N)))
  r <- mu * p / (1 - p)
  y <- rnbinom(N, r, p)
  coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),
"Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
"N.B. SE" = nb_summary$coefficients["x", "Std. Error"])
Sim..SE Poisson.SE N.B..SE
1 0.03492467 0.01273354 0.03984743
where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based standard error, and is much better approximated by the NB-based standard error.
The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.
Additionally, although this is often a minor effect when overdispersion is low, weighting the observations with more accurate weights (i.e., better estimates of the observation-specific variances) makes the parameter estimates themselves more accurate, at least asymptotically. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the weights differ by a factor of roughly 5 or more.
Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates with the Poisson model if there isn't much overdispersion, because you avoid the extra variability induced by estimating the dispersion parameter. Of course, this gain is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
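A rough sketch of that trade-off, with a small sample and mild overdispersion (the sample size, overdispersion level, and number of replications are purely illustrative):

# Rough sketch: MSE of the slope estimate under Poisson vs. NB fits with a
# small sample and mild overdispersion (all settings are illustrative)
set.seed(2)
n_small <- 50
p_small <- 0.8                      # variance = mu / 0.8 = 1.25 * mu
b_pois <- b_nb <- rep(0, 500)
for (i in seq_along(b_pois)) {
  x_s  <- rnorm(n_small)
  mu_s <- exp(1 + x_s)
  y_s  <- rnbinom(n_small, mu_s * p_small / (1 - p_small), p_small)
  b_pois[i] <- coef(glm(y_s ~ x_s, family = "poisson"))["x_s"]
  b_nb[i]   <- coef(suppressWarnings(glm.nb(y_s ~ x_s)))["x_s"]
}
c(MSE.Poisson = mean((b_pois - 1)^2), MSE.NB = mean((b_nb - 1)^2))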
This is almost a duplicate; the linked question explains that you shouldn't expect the coefficient estimates, residual deviance, or degrees of freedom to change. The only thing that changes when moving from Poisson to quasi-Poisson is that a scale parameter that was previously fixed at 1 is computed from an estimate of residual variability/badness of fit (usually the sum of squared Pearson residuals ($\chi^2$) divided by the residual df, although asymptotically the residual deviance gives the same result). The result is that the standard errors are scaled by the square root of this scale parameter, with concomitant changes in the confidence intervals and $p$-values.
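A small self-contained sketch of that relationship (the simulated data here are illustrative; any overdispersed counts would do):

# Sketch: quasi-Poisson keeps the Poisson coefficients and rescales the
# standard errors by sqrt(dispersion); data here are illustrative
set.seed(3)
x_q <- rnorm(200)
y_q <- rnbinom(200, mu = exp(1 + x_q), size = 1)   # overdispersed counts
fit_p  <- glm(y_q ~ x_q, family = "poisson")
fit_qp <- glm(y_q ~ x_q, family = "quasipoisson")
phi <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
all.equal(coef(fit_p), coef(fit_qp))               # TRUE: same point estimates
all.equal(summary(fit_qp)$coefficients[, "Std. Error"],
          summary(fit_p)$coefficients[, "Std. Error"] * sqrt(phi))   # TRUE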
The benefit of quasi-likelihood is that it fixes the basic fallacy of assuming that the data are Poisson (= homogeneous, independent counts); however, fixing the problem in this way potentially masks other issues with the data. (See below.) Quasi-likelihood is one way of handling overdispersion; if you don't address overdispersion in some way, your coefficients will be reasonable but your inference (CIs, $p$-values, etc.) will be garbage.
- As you comment above, there are lots of different approaches to overdispersion (Tweedie, different negative binomial parameterizations, quasi-likelihood, zero-inflation/alteration).
- With an overdispersion factor of >5 (8.4), I would worry a bit about whether it is being driven by some kind of model mis-fit (outliers, zero-inflation [which I see you've already tried], nonlinearity) rather than representing across-the-board heterogeneity. My general approach to this is graphical exploration of the raw data and regression diagnostics, as in the sketch below.
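A minimal sketch of the sort of diagnostics meant here, where fit and dat are hypothetical (a fitted count model and its data frame with response y):

# Sketch of quick graphical checks for a hypothetical Poisson/NB fit `fit`
# on a data frame `dat` with response `dat$y`
plot(fitted(fit), residuals(fit, type = "pearson"),
     xlab = "Fitted values", ylab = "Pearson residuals")   # outliers, nonlinearity
abline(h = 0, lty = 2)
# Compare observed zeros with what the fitted Poisson means would imply
c(observed.zeros = sum(dat$y == 0),
  expected.zeros = sum(dpois(0, fitted(fit))))             # zero inflation?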
Best Answer
In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.
(1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, assuming a fixed mean-variance relation may do more to stabilize the standard errors.
(2) Model checking. I've worked with physicists who believe, on theoretical grounds, that various measurements can be described by Poisson distributions. If we reject the hypothesis that the mean equals the variance, we have evidence against the Poisson hypothesis. As @GordonSmyth pointed out in a comment, if you have reason to believe that a given measurement should follow a Poisson distribution and you find evidence of overdispersion, you have evidence that you are missing important factors.
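A simple sketch of such a check, assuming a hypothetical Poisson fit fit_pois: compare the Pearson chi-square statistic to its residual degrees of freedom.

# Rough check of the Poisson (mean = variance) hypothesis: under the Poisson
# model, the Pearson statistic is approximately chi-square on the residual df
pearson_stat <- sum(residuals(fit_pois, type = "pearson")^2)
pearson_stat / df.residual(fit_pois)                            # ~1 if Poisson holds
pchisq(pearson_stat, df.residual(fit_pois), lower.tail = FALSE) # small p suggests overdispersion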
(2.5) Proper distribution. While negative binomial regression is based on a valid statistical distribution, it's my understanding that the quasi-Poisson model is not. That means you can't really simulate count data under the belief that $Var[y] = \alpha E[y]$ for $\alpha \neq 1$, which might be annoying for some use cases. Likewise, you can't use probabilities to check for outliers, etc.
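As an illustration of the simulation point, assuming a hypothetical glm.nb fit fit_nb:

# From an NB fit you can draw new responses, because it is a full distribution
# (`fit_nb` is a hypothetical glm.nb fit); there is no analogous r-function for
# a quasi-Poisson fit, which specifies only a mean and a variance
y_sim <- rnbinom(length(fitted(fit_nb)), mu = fitted(fit_nb), size = fit_nb$theta)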