This answer comes in two parts. The first addresses the issue of the standard errors and why that implies the models are not identical, as @ndoogan observed in comments, and the second addresses, partially, the issue of when the coefficient estimates might be substantially different.
Consider, for example, a hypothesis test on the coefficient of log(Vmaj) where the null is that the coefficient equals 0.5. The two sets of model estimates would lead to rejection of the null in the Poisson case (unless testing at a very high confidence level) and failure to reject the null in the NB case.
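As a sketch of such a test (the question's fitted models and the log(Vmaj) variable are not available here, so this simulates a stand-in Poisson fit where `x` plays the role of log(Vmaj)):

```r
# Wald test of H0: beta = 0.5 on a simulated Poisson fit
# (hypothetical stand-in for the question's models)
set.seed(1)
x <- rnorm(200)
y <- rpois(200, exp(1 + 0.6 * x))
fit <- glm(y ~ x, family = poisson)

est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
z   <- (est - 0.5) / se          # Wald statistic against beta0 = 0.5
p   <- 2 * pnorm(-abs(z))        # two-sided p-value
c(z = z, p = p)
```

Whether the null is rejected then hinges entirely on the standard error in the denominator, which is the point of the comparison below.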
More generally, there is more to a set of estimates than just the point estimates. In the presence of (negative-binomial-like) overdispersion, the standard errors produced by the negative binomial model are more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.
For example, a simple model with $\mu_i = \exp\{1 + x_i\}$, and $y_i \sim NB$ with mean $\mu_i$ and variance $5\mu_i$:
library(MASS)  # for glm.nb

N <- 1000
mu <- exp(1 + (x <- rnorm(N)))
# Choose size r and prob p so the NB mean is mu and the variance is mu/p = 5*mu
p <- 0.2
r <- mu * p / (1 - p)
y <- rnbinom(N, r, p)
poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))

# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
  mu <- exp(1 + (x <- rnorm(N)))
  r <- mu * p / (1 - p)
  y <- rnbinom(N, r, p)
  coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),
           "Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
           "N.B. SE" = nb_summary$coefficients["x", "Std. Error"])
Sim..SE Poisson.SE N.B..SE
1 0.03492467 0.01273354 0.03984743
where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based standard error, and is estimated much better by the NB-based standard error.
The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.
Additionally, although this is often a minor effect when overdispersion is low, by weighting the observations with more accurate weights (i.e., better estimates of observation-specific variances), the parameter estimates themselves will be more accurate, well, asymptotically at any rate. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the differences between weights are on the order of 5x or more.
Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates with the Poisson model if there isn't much overdispersion, because you are reducing the variability induced by estimating the dispersion parameter. Of course, this is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
If you only have two categorical variables and you include the interaction between the two, then nbreg and anova are equivalent in the sense that they produce exactly the same predicted values. As a consequence, both models are "equally right" or "equally wrong".
sysuse nlsw88, clear
gen byte black = race == 2 if race < 3
anova wage black##union
predict yhat_anova
nbreg wage black##union
predict yhat_nbreg
tab yhat_anova yhat_nbreg
Fitted | Predicted number of events
values | 5.968046 7.5821 8.614504 8.735929 | Total
-----------+--------------------------------------------+----------
5.968046 | 350 0 0 0 | 350
7.5821 | 0 1,051 0 0 | 1,051
8.614504 | 0 0 151 0 | 151
8.735929 | 0 0 0 302 | 302
-----------+--------------------------------------------+----------
Total | 350 1,051 151 302 | 1,854
This is not just true for a model with two categorical variables and their interaction; it is true for any fully saturated model, i.e. any model that includes only categorical variables and all interactions, including all higher-order interactions.
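The same equivalence can be verified in R; here is a minimal sketch on simulated data (not the nlsw88 data) comparing a linear model and a negative binomial model, both fully saturated:

```r
# With two factors and their interaction (a saturated model), the fitted
# values of a linear model and an NB model both equal the cell means.
library(MASS)  # for glm.nb

set.seed(1)
d <- expand.grid(black = factor(0:1), union = factor(0:1))
d <- d[rep(1:4, each = 200), ]
mu <- with(d, exp(2 + 0.1 * (black == "1") + 0.2 * (union == "1")))
d$wage <- rnbinom(nrow(d), size = 2, mu = mu)  # simulated count outcome

m_lin <- lm(wage ~ black * union, data = d)
m_nb  <- glm.nb(wage ~ black * union, data = d)

# Both sets of fitted values coincide (both equal the observed cell means)
all.equal(unname(fitted(m_lin)), unname(fitted(m_nb)), tolerance = 1e-6)
```

The two models agree on the predictions while parameterizing them differently, additively versus multiplicatively, which is the point developed below.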
Edit:
In order to see why these models are just different ways of saying the same thing, it is easiest to go back to basics. Consider the example above: both models model the mean outcome, in this case the mean wage. There are only 4 means to model: black non-union, black union, white non-union, and white union. Both models capture these 4 means with 4 parameters (constant, main effect for union, main effect for black, and the interaction term). As a consequence, neither model imposes any constraint on these means.
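To spell out the mapping for the linear (anova/regression) parameterization, the four coefficients and the four cell means determine each other one-to-one:

$$\begin{aligned}
E[\text{wage} \mid \text{white, nonunion}] &= \beta_0\\
E[\text{wage} \mid \text{black, nonunion}] &= \beta_0 + \beta_{black}\\
E[\text{wage} \mid \text{white, union}] &= \beta_0 + \beta_{union}\\
E[\text{wage} \mid \text{black, union}] &= \beta_0 + \beta_{black} + \beta_{union} + \beta_{black \times union}
\end{aligned}$$

The multiplicative model does exactly the same with sums replaced by products of exponentiated coefficients, so both reproduce the four cell means without constraint.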
Let's start by just looking at these means:
table union black, c(mean wage)
------------------------------
union | black
worker | 0 1
----------+-------------------
nonunion | 7.5821 5.968046
union | 8.735929 8.614504
------------------------------
So, among non-union members there is a noticeable difference in mean wage between black and white persons, while among union members this difference is a lot smaller.
Let's look at the regression output:
. reg wage i.union##i.black
Source | SS df MS Number of obs = 1854
-------------+------------------------------ F( 3, 1850) = 29.83
Model | 1472.83336 3 490.944452 Prob > F = 0.0000
Residual | 30445.6086 1850 16.4570858 R-squared = 0.0461
-------------+------------------------------ Adj R-squared = 0.0446
Total | 31918.442 1853 17.225279 Root MSE = 4.0567
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.153829 .2648625 4.36 0.000 .6343677 1.673289
1.black | -1.614053 .2503572 -6.45 0.000 -2.105066 -1.123041
|
union#black |
union#1 | 1.492629 .4755625 3.14 0.002 .5599329 2.425324
|
_cons | 7.5821 .1251339 60.59 0.000 7.336681 7.827518
------------------------------------------------------------------------------
You can see that the mean wage for non-union white persons is the constant. This mean is 1.61 dollars/hour less for black persons. So the wage for black persons is:
di _b[_cons]+_b[1.black]
5.9680463
This corresponds with the mean reported in the table above. The negative effect of being black is reduced by 1.49 dollars/hour for union members, i.e. black union members earn
di _b[1.black] + _b[1.union#1.black]
-.12142489
dollars/hour less than white union members, a number you can also recover from the table of means above.
Now consider a multiplicative model, in this case a Poisson model with robust standard errors (for a justification of that choice, see here).
. poisson wage i.union##i.black, irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable
Iteration 0: log pseudolikelihood = -5239.2159
Iteration 1: log pseudolikelihood = -5239.2159
Poisson regression Number of obs = 1854
Wald chi2(3) = 106.56
Prob > chi2 = 0.0000
Log pseudolikelihood = -5239.2159 Pseudo R2 = 0.0188
------------------------------------------------------------------------------
| Robust
wage | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.152178 .0360531 4.53 0.000 1.083638 1.225053
1.black | .7871232 .0265286 -7.10 0.000 .7368083 .840874
|
union#black |
union#1 | 1.252791 .0759936 3.72 0.000 1.112359 1.410951
|
_cons | 7.5821 .1308473 117.39 0.000 7.329932 7.842942
------------------------------------------------------------------------------
Again we can see that the mean wage of white non-union members is 7.5821. Black non-union persons earn $(0.787 - 1) \times 100\% = -21\%$, i.e. 21% less than white non-union persons, a number you can recover from the table of means: $(5.968046 - 7.5821) / 7.5821 \times 100\% = -21\%$. This negative effect of being black increases, that is, becomes less negative, by 25% if someone is a union member. So the effect of being black for union members is $1.25 \times .787 = .98$, i.e. black union members earn 2% less than white union members, a number you could also recover from the table of means or the regression output, both of which I will leave as an exercise to the reader.
Best Answer
The Poisson and negative binomial (NB) models are nested: the Poisson is the special case with theta = infinity. So a likelihood ratio test comparing the two models tests the null hypothesis "theta = infinity" against the alternative "theta < infinity". Here the fitted log-likelihoods are -1328.642 for the Poisson model and -865.6289 for the NB model.
Thus, the fitted log-likelihood in the NB model is much larger/better using just one additional parameter (4 regression coefficients plus 1 theta). The value of m1$theta is 1.032713, i.e., essentially corresponding to a geometric distribution (theta = 1). So clearly there is overdispersion compared to the Poisson distribution, with a theta much lower than infinity. (My personal rule of thumb is that everything beyond 10 is already rather close to infinity.)

The likelihood ratio test statistic is then twice the absolute difference of log-likelihoods (2 * (1328.642 - 865.6289) = 926.0262) and has to be compared with a chi-squared distribution with degrees of freedom (df) equal to the difference in df's between the two models (5 - 4 = 1). This is what the code above does. However, it may be somewhat confusing that the difference of logLik() objects retains the "logLik" class and hence displays with a label and df. Applying as.numeric() to drop the class clarifies this, and then you can compute the p-value by hand.
There are also functions that carry out such likelihood ratio tests in a generic way, e.g., lrtest() in lmtest (among others).

As a final detail: the likelihood ratio test of Poisson vs. NB is non-standard because the parameter theta = infinity is on the boundary of the parameter space. Therefore, the p-value from the chi-squared distribution has to be halved (similar to a one-sided test). Of course, if you already have a significant result without halving, it's still significant afterwards...