Solved – Choosing the optimal theta / dispersion parameter for negative binomial regression (glm / glm.nb) in R

dispersionnegative-binomial-distributionoverdispersionrregression

I am applying a negative binomial regression to my data in R. For this, I use the package MASS and have two different ways to calculate it:

library(MASS)

glm1 <- glm.nb(y ~ x , data=dataset)
summary(glm1)

glm2 <- glm(y ~ x , data=dataset, family=negative.binomial(1.1685))
summary(glm2)

glm.nb is choosing some sort of optimal theta to get a dispersion parameter of 1.
In contrast, glm needs a given theta.

glm1:

Call:
glm.nb(formula = y ~ x, data = dataset, init.theta = 1.168548383, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7782  -0.7782  -0.7629   0.7530   8.5151  

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -1.062372   0.003245 -327.434   <2e-16 ***
xY          -0.044856   0.004725   -9.493   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(1.1685) family taken to be 1)

    Null deviance: 510730  on 685096  degrees of freedom
Residual deviance: 510640  on 685095  degrees of freedom
AIC: 1036465

Number of Fisher Scoring iterations: 1


              Theta:  1.1685 
          Std. Err.:  0.0110 

 2 x log-likelihood:  -1036459.4450

glm2:

Call:
glm(formula = y ~ x, family = negative.binomial(1.1685), data = dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7782  -0.7782  -0.7629   0.7529   8.5150  

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -1.062372   0.003405 -311.976   <2e-16 ***
xY          -0.044856   0.004959   -9.045   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(1.1685) family taken to be 1.101544)

    Null deviance: 510725  on 685096  degrees of freedom
Residual deviance: 510635  on 685095  degrees of freedom
AIC: 1036463

Number of Fisher Scoring iterations: 4

I took the optimal theta from glm1 into glm2. As you can see, I get different values for the dispersion parameter (1 and 1.101544).
As I learned from this post, the difference is because glm.nb is showing the assumed dispersion parameter of 1 and glm is calculating the exact parameter.

At first, I thought the optimization process is not getting closer to 1. But, I tried some values manually and I could find a value for theta that will give me a real dispersion parameter very close to 1.

glm3 <- glm(y ~ x , data=dataset, family=negative.binomial(0.8043))
summary(glm3)

glm3:

Call:
glm(formula = y ~ x, family = negative.binomial(0.8043), data = dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7583  -0.7583  -0.7441   0.7065   7.7387  

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -1.062372   0.003408 -311.710   <2e-16 ***
xY          -0.044856   0.004959   -9.045   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.8043) family taken to be 1.00005)

    Null deviance: 467800  on 685096  degrees of freedom
Residual deviance: 467718  on 685095  degrees of freedom
AIC: 1038364

Number of Fisher Scoring iterations: 4

So what value of theta is optimal? 0.8043 or 1.1685? Whats is my misunderstanding?
In case 1.1685 is my optimal theta, what does it mean that the dispersion parameter is 1.101544 and not 1?

Best Answer

The model-fitting of a negative binomial regression is achieved by maximum likelihood. That is why theta=1.1685 is the best choice in your case.

If the data fits perfectly your negative binomial distribution than you would also achieve a dispersion parameter of 1 while fitting by maximum likelihood.