Solved – Understanding over-dispersion as it relates to the Poisson and the Neg. Binomial

negative-binomial-distribution, overdispersion, poisson-distribution, regression-coefficients

I am developing a Poisson-family GLM in R for a dataset that I have. The dataset has 650 entries with two measures of exposure. The model, though not especially relevant to the question, is:
$$\ln(E(y_i)) = \ln(\beta_1) + \beta_2 \ln(\text{exp}_1) + \beta_3 \ln(\text{exp}_2)$$
After fitting the model in R with a Poisson distribution, the summary() function yields:

Call:
glm(formula = Y ~ log(Vmaj) + log(Vmin), family = poisson(link = "log"),
    data = data.raw)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-8.3003  -2.1364  -0.2094   1.8477   5.9722  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.66172    0.24970  -26.68   <2e-16 ***
log(Vmaj)    0.55473    0.02132   26.02   <2e-16 ***
log(Vmin)    0.46638    0.01297   35.96   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7063.6  on 649  degrees of freedom
Residual deviance: 4757.6  on 647  degrees of freedom

Running the same model but using a negative binomial distribution yields:

Call:    
glm.nb(formula = Y ~ log(Vmaj) + log(Vmin), data = data.raw, 
       init.theta = 5.642678818, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.3076  -0.8128  -0.0806   0.6368   2.1555  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.63440    0.64881 -10.225   <2e-16 ***
log(Vmaj)    0.55507    0.05599   9.914   <2e-16 ***
log(Vmin)    0.46321    0.03258  14.217   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(5.6427) family taken to be 1)

    Null deviance: 983.84  on 649  degrees of freedom
Residual deviance: 678.02  on 647  degrees of freedom
AIC: 5473.7

Number of Fisher Scoring iterations: 1

          Theta:  5.643 
      Std. Err.:  0.362 

 2 x log-likelihood:  -5465.748 

One problem that I keep being told about with using a Poisson-based distribution is over-dispersion. I understand that a measure of over-dispersion can be found by dividing the residual (scaled) deviance by its degrees of freedom; by this measure the negative binomial looks substantially better than the Poisson at dealing with over-dispersion.
I understand that over-dispersion, from the point of view of the Poisson, is fundamentally related to its assumption that the mean equals the variance. Consequently, I understand that the negative binomial should allow a larger variance, so the standard error terms should be higher and the z-statistics lower.
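
For concreteness, this is the dispersion check in R, assuming the two fitted model objects are named fit_pois and fit_nb (hypothetical names; the deviances are from the summaries above):

# Deviance-based dispersion statistic: residual deviance / residual df.
# Values well above 1 suggest over-dispersion relative to the Poisson.
fit_pois$deviance / fit_pois$df.residual   # 4757.6 / 647 ≈ 7.35
fit_nb$deviance / fit_nb$df.residual       #  678.02 / 647 ≈ 1.05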

What I do not understand is why the models have essentially identical coefficients and are, for all intents and purposes, identical. My understanding is that the negative binomial is a better model, which to me means more accurate. However, when I look at these results I do not understand how using the negative binomial has improved the situation, even though a cursory examination of the dispersion measure shows that the negative binomial has better addressed the problem of over-dispersion.

Best Answer

This answer comes in two parts. The first addresses the issue of the standard errors and why that implies the models are not identical, as @ndoogan observed in comments, and the second addresses, partially, the issue of when the coefficient estimates might be substantially different.

Consider, for example, a hypothesis test on the coefficient of log(Vmaj) where the null is that the coefficient equals 0.5. The two sets of model estimates would result in rejection of the null in the Poisson case (unless testing at a very high confidence level) and failure to reject the null in the NB case.
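
To make that concrete, here is the Wald arithmetic implied by the two summaries above (the computation, not the numbers, is my addition):

# Wald z-statistics for H0: coefficient of log(Vmaj) = 0.5
(0.55473 - 0.5) / 0.02132   # Poisson: z ≈ 2.57, two-sided p ≈ 0.010 -> reject at the 5% level
(0.55507 - 0.5) / 0.05599   # NB:      z ≈ 0.98, two-sided p ≈ 0.33  -> fail to reject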

More generally, there is more to a collection of estimates than just the point estimates. In the presence of (negative-binomial-like) overdispersion, the standard errors produced by the negative binomial model will be more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.

For example, consider a simple model with $\mu_i = \exp\{1 + x_i\}$ and $y_i \sim \text{NB}$ with mean $\mu_i$ and variance $5\mu_i$:

library(MASS)  # for glm.nb

N <- 1000
mu <- exp(1 + (x <- rnorm(N)))

# rnbinom(n, size, prob) has mean size*(1-p)/p and variance size*(1-p)/p^2,
# so size = mu*p/(1-p) gives mean mu and variance mu/p = 5*mu when p = 0.2.
p <- 0.2
r <- mu * p / (1 - p)
y <- rnbinom(N, r, p)

poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))

# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
   mu <- exp(1 + (x <- rnorm(N)))
   r <- mu * p / (1 - p)
   y <- rnbinom(N, r, p)
   coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),  # subtracting the true value 1 leaves the s.d. unchanged
           "Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
           "N.B. SE" = nb_summary$coefficients["x", "Std. Error"])

     Sim..SE Poisson.SE    N.B..SE
1 0.03492467 0.01273354 0.03984743

where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based estimate of it, and is much better approximated by the NB-based estimate.
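
As an aside not in the original comparison: a quasi-Poisson fit on the same simulated data is another standard correction; it keeps the Poisson point estimates but rescales their standard errors by the square root of the estimated dispersion:

# Quasi-Poisson: same coefficients, standard errors inflated by sqrt(dispersion)
summary(glm(y ~ x, family = quasipoisson))$coefficients["x", "Std. Error"]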

The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.

Additionally, although this is often a minor effect when overdispersion is low, by weighting the observations with more accurate weights (i.e., better estimates of observation-specific variances), the parameter estimates themselves will be more accurate, well, asymptotically at any rate. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the differences between weights are on the order of 5x or more.
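
To illustrate the weighting point (this small comparison is my addition): with a log link, the IRLS working weight of an observation is $\mu$ under the Poisson and $\mu/(1 + \mu/\theta)$ under the NB, so the relative weights pull apart as the fitted means grow:

# Working weights implied by each family at a few fitted means,
# using theta = 5.64 as in the NB fit above.
mu <- c(1, 5, 25)
theta <- 5.64
data.frame(mu = mu, poisson_wt = mu, nb_wt = mu / (1 + mu / theta))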

Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates from the Poisson model if there isn't much overdispersion, because you avoid the extra variability induced by estimating the dispersion parameter. Of course, this is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
