Solved – Understanding over-dispersion as it relates to the Poisson and the Neg. Binomial

negative-binomial-distribution, overdispersion, poisson-distribution, regression-coefficients

I am developing a Poisson-family GLM in R for a dataset that I have. The dataset has 650 entries with two measures of exposure. The model, though not especially relevant to the question, is:
$$\ln(E(y_i)) = \ln(\beta_1) + \beta_2 \ln(\text{exp}_1) + \beta_3 \ln(\text{exp}_2)$$
After fitting the model in R with a Poisson distribution, the summary() function yields:

Call:
glm(formula = Y ~ log(Vmaj) + log(Vmin), family = poisson(link = "log"),
    data = data.raw)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-8.3003  -2.1364  -0.2094   1.8477   5.9722  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.66172    0.24970  -26.68   <2e-16 ***
log(Vmaj)    0.55473    0.02132   26.02   <2e-16 ***
log(Vmin)    0.46638    0.01297   35.96   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7063.6  on 649  degrees of freedom
Residual deviance: 4757.6  on 647  degrees of freedom

Running the same model but using a negative binomial distribution yields:

Call:    
glm.nb(formula = Y ~ log(Vmaj) + log(Vmin), data = data.raw, 
       init.theta = 5.642678818, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.3076  -0.8128  -0.0806   0.6368   2.1555  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.63440    0.64881 -10.225   <2e-16 ***
log(Vmaj)    0.55507    0.05599   9.914   <2e-16 ***
log(Vmin)    0.46321    0.03258  14.217   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(5.6427) family taken to be 1)

    Null deviance: 983.84  on 649  degrees of freedom
Residual deviance: 678.02  on 647  degrees of freedom
AIC: 5473.7

Number of Fisher Scoring iterations: 1

          Theta:  5.643 
      Std. Err.:  0.362 

 2 x log-likelihood:  -5465.748 

One problem that I keep being told about with using a Poisson-based distribution is over-dispersion. I understand that a measure of over-dispersion can be found by dividing the residual (scaled) deviance by its degrees of freedom; by this measure the negative binomial looks substantially better than the Poisson at dealing with over-dispersion.
I understand that over-dispersion, from the point of view of the Poisson, is fundamentally related to its assumption that the mean equals the variance. Consequently, I understand that the negative binomial should allow a larger variance, so the standard error terms should be higher and the z-statistics lower.
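
For concreteness, this is the dispersion check in R, assuming the two fitted model objects are named fit_pois and fit_nb (hypothetical names; the deviances are from the summaries above):

# Deviance-based dispersion statistic: residual deviance / residual df.
# Values well above 1 suggest over-dispersion relative to the Poisson.
fit_pois$deviance / fit_pois$df.residual   # 4757.6 / 647 ≈ 7.35
fit_nb$deviance / fit_nb$df.residual       #  678.02 / 647 ≈ 1.05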

What I do not understand is why the models have essentially identical coefficients and are, for all intents and purposes, identical. My understanding is that the negative binomial is a better model, which to me means more accurate. However, when I look at these results I do not understand how using the negative binomial has improved the situation, even though a cursory examination of the dispersion measure shows that the negative binomial has better addressed the problem of over-dispersion.

Best Answer

This answer comes in two parts. The first addresses the issue of the standard errors and why that implies the models are not identical, as @ndoogan observed in comments, and the second addresses, partially, the issue of when the coefficient estimates might be substantially different.

Consider, for example, a hypothesis test on the coefficient of log(Vmaj) where the null is that the coefficient equals 0.5. The two sets of model estimates would result in rejection of the null in the Poisson case (unless testing at a very high confidence level) and failure to reject the null in the NB case.
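
To make that concrete, here is the Wald arithmetic implied by the two summaries above (the computation, not the numbers, is my addition):

# Wald z-statistics for H0: coefficient of log(Vmaj) = 0.5
(0.55473 - 0.5) / 0.02132   # Poisson: z ≈ 2.57, two-sided p ≈ 0.010 -> reject at the 5% level
(0.55507 - 0.5) / 0.05599   # NB:      z ≈ 0.98, two-sided p ≈ 0.33  -> fail to reject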

More generally, there is more to a collection of estimates than just the point estimates. In the presence of (negative-binomial-like) overdispersion, the standard errors produced by the negative binomial model will be more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.

For example, consider a simple model with $\mu_i = \exp\{1 + x_i\}$ and $y_i \sim \text{NB}$ with mean $\mu_i$ and variance $5\mu_i$:

library(MASS)  # for glm.nb

N <- 1000
mu <- exp(1 + (x <- rnorm(N)))

# rnbinom(n, size, prob) has mean size*(1-p)/p and variance size*(1-p)/p^2,
# so size = mu*p/(1-p) gives mean mu and variance mu/p = 5*mu when p = 0.2.
p <- 0.2
r <- mu * p / (1 - p)
y <- rnbinom(N, r, p)

poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))

# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
   mu <- exp(1 + (x <- rnorm(N)))
   r <- mu * p / (1 - p)
   y <- rnbinom(N, r, p)
   coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),  # subtracting the true value 1 leaves the s.d. unchanged
           "Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
           "N.B. SE" = nb_summary$coefficients["x", "Std. Error"])

     Sim..SE Poisson.SE    N.B..SE
1 0.03492467 0.01273354 0.03984743

where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based estimate of it, and is much better approximated by the NB-based estimate.
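
As an aside not in the original comparison: a quasi-Poisson fit on the same simulated data is another standard correction; it keeps the Poisson point estimates but rescales their standard errors by the square root of the estimated dispersion:

# Quasi-Poisson: same coefficients, standard errors inflated by sqrt(dispersion)
summary(glm(y ~ x, family = quasipoisson))$coefficients["x", "Std. Error"]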

The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.

Additionally, although this is often a minor effect when overdispersion is low, by weighting the observations with more accurate weights (i.e., better estimates of observation-specific variances), the parameter estimates themselves will be more accurate, well, asymptotically at any rate. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the differences between weights are on the order of 5x or more.
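
To illustrate the weighting point (this small comparison is my addition): with a log link, the IRLS working weight of an observation is $\mu$ under the Poisson and $\mu/(1 + \mu/\theta)$ under the NB, so the relative weights pull apart as the fitted means grow:

# Working weights implied by each family at a few fitted means,
# using theta = 5.64 as in the NB fit above.
mu <- c(1, 5, 25)
theta <- 5.64
data.frame(mu = mu, poisson_wt = mu, nb_wt = mu / (1 + mu / theta))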

Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates from the Poisson model if there isn't much overdispersion, because you avoid the extra variability induced by estimating the dispersion parameter. Of course, this is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
