Solved – Theta going towards infinity in negative binomial model

generalized-linear-model, maximum-likelihood, negative-binomial-distribution, r

Background: I am fitting a negative binomial model repeatedly after multiple imputation. As a result, the data for each model fit differ slightly.

Finding: On some of the imputed data sets I analyze, the $\theta$ parameter (= 1/dispersion = $1/\kappa$) used in the parameterization of the glm.nb function in the R package MASS seems to "diverge" to infinity. glm.nb does not appear to have an entirely satisfactory way of coping with this. If I leave the maximum number of iterations at its default of 25, it warns that it has reached its alternation limit ("In glm.nb(AVAL ~ factor(trtp, levels = c(0, 1, 2)) + factor(stratnum) + … : alternation limit reached"), i.e. the limit on how often it alternates between updating $\hat{\theta}$ and fitting the GLM for a fixed $\theta$. If I instead increase the maximum number of alternations, it stops with an error ("Error in while ((it <- it + 1) < limit && abs(del) > eps) { : missing value where TRUE/FALSE needed"), apparently because of numerical problems caused by $\theta$ not staying finite (if I interpret the error correctly).

What other software I know does: By comparison, PROC GENMOD in SAS parameterizes the model in terms of $\kappa$, so the Poisson limit of the negative binomial model is reached as $\kappa \to 0$. When the estimate of $\kappa$ heads in that direction, it at least stays finite and can actually "hit" the boundary of the parameter space. When that happens and $\hat{\kappa}$ stays at the boundary in the next GLM fit, SAS reports the results as if $\kappa$ were known to be zero (i.e. the results of a Poisson model). We can debate whether that fully reflects the uncertainty about $\kappa$, but the output is at least reasonably sensible (and it would be essentially the same if $\kappa$ had been estimated to be very, very small).
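For reference, both parameterizations use the usual quadratic (NB2) variance function,

$$\operatorname{Var}(Y) = \mu + \frac{\mu^2}{\theta} = \mu + \kappa\,\mu^2,$$

so the Poisson limit $\operatorname{Var}(Y) = \mu$ sits at $\theta \to \infty$ in the glm.nb parameterization, but at the finite boundary $\kappa = 0$ in the GENMOD one.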

What I am wondering: Is there a way to get similar behaviour from glm.nb? Checking whether nbfit$theta > 10000 seems a little crude to me, although it appears to work on my particular data. However, I assume someone else must have run into this before and identified a good way of handling it. I also noticed that, for the sample code below, the coefficients, standard errors etc. from the Poisson model and from the negative binomial model (whose $\theta$ is in the process of running off to infinity) are identical, at least to the numerical precision R reports. So perhaps I should just stick with the default maxit=25 and the results are sensible?!

Example code: If you want to play with this issue yourself, here is some R code that illustrates it (at least with R version 3.3.3 (2017-03-06) and ‘MASS’ version 7.3-47):

require(MASS)

set.seed(12345)
y <- rpois(n=500, lambda=0.5)              # data generated from a Poisson, i.e. no overdispersion
group <- factor(rep(1:2, each=250))

nbfit0 <- glm.nb(y~group)                  # default maxit = 25: alternation limit warning
nbfit0$theta

nbfit1 <- glm.nb(y~group, control=glm.control(maxit=30))
nbfit1$theta                               # theta is very large

nbfit2 <- glm.nb(y~group, control=glm.control(maxit=2500))   # stops with the error quoted above

pfit <- glm(y ~ group, family=poisson())   # Poisson fit for comparison

summary(pfit)$coefficients
summary(nbfit1)$coefficients               # essentially identical to the Poisson results
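To verify numerically that the two coefficient tables agree, a quick check along these lines works for me (a small sketch of my own, using a loose tolerance since the negative binomial fit's theta is merely very large rather than infinite):

# Compare the Poisson and the (diverging-theta) negative binomial fits;
# a loose tolerance is used because theta is large but finite.
all.equal(summary(pfit)$coefficients,
          summary(nbfit1)$coefficients,
          tolerance = 1e-6)
nbfit1$theta          # the offending, very large theta estimate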

Best Answer

Your problem is caused by the fact that several estimators of $\theta$, including the method of moments and maximum likelihood estimators, don't exist if the data aren't "dispersed" enough relative to the mean. glm.nb is a maximum likelihood estimator (though it starts out with a method-of-moments initial guess if you don't specify a value). With underdispersed data, the fitting routine will try to get as close to a Poisson distribution as possible, which means driving theta to infinity, as you are aware.
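To see why no estimate exists, consider a simple moment-based calculation (an illustrative helper of my own, not a function from MASS): solving $\operatorname{Var}(Y) = \mu + \mu^2/\theta$ for $\theta$ gives $\theta = \mu^2/(\operatorname{Var}(Y) - \mu)$, which is negative or blows up whenever the sample variance does not exceed the mean.

# Illustrative moment-based estimate of theta (hypothetical helper, not part of MASS):
# theta = mu^2 / (Var(Y) - mu); negative or enormous when the data are not overdispersed
mom_theta <- function(x) mean(x)^2 / (var(x) - mean(x))

set.seed(1)
mom_theta(rpois(500, 0.5))                        # Poisson data: often negative or enormous
mom_theta(rnbinom(500, size = 1, prob = 0.667))   # genuinely overdispersed data: near 1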

In your example, you generate the sample data from a Poisson distribution, which is underdispersed relative to the negative binomial distribution. Sometimes the randomly-generated data will be variable enough for glm.nb to find a solution, other times it won't be. See what happens with the following code, which switches the random number generator to a negative binomial one with the same mean $(0.5)$ but a larger variance (about $0.75$):

> y <- rnbinom(n=500, size=1, prob=0.667)
> group <- factor(rep(1:2,each=250))
> 
> nbfit <- glm.nb(y~group)
> nbfit$theta
[1] 1.105539
>
> pfit <- glm(y ~ group, family=poisson())
> 
> summary(pfit)$coefficients
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.8301130 0.09578262 -8.666635 4.450788e-18
group2       0.2922587 0.12658446  2.308804 2.095445e-02
> summary(nbfit)$coefficients
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.8301130  0.1131037 -7.339395 2.145620e-13
group2       0.2922587  0.1525120  1.916300 5.532697e-02

The true value of theta is equal to size in the rnbinom call, or $1$. The coefficient estimates are, as expected, the same, and the standard errors of the NB estimates are larger than those of the Poisson estimates, also as expected.
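As a side note (a back-of-the-envelope check of my own, not something glm.nb reports): for the intercept, which here is estimated from group 1 alone, the ratio of the negative binomial to the Poisson standard error should be roughly $\sqrt{1 + \hat{\mu}_1/\hat{\theta}}$, the square root of the NB variance inflation factor.

# Back-of-the-envelope check of the SE inflation for the intercept (group 1):
mu1 <- exp(coef(nbfit)[1])                      # fitted mean in group 1
sqrt(1 + mu1 / nbfit$theta)                     # ~ 1.18
summary(nbfit)$coefficients[1, 2] /
  summary(pfit)$coefficients[1, 2]              # ~ 1.18 as well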

If your data really are quite often underdispersed relative to a negative binomial distribution, you might want to just stick with the Poisson. Otherwise, you can construct a (better) version of the following function, which checks the return value of glm.nb and, if theta failed to converge, falls back to a Poisson GLM:

foo <- function() {
  m0 <- glm.nb(y ~ group)
  # th.warn is only present when theta estimation warned, so guard against NULL
  if (!is.null(m0$th.warn) && m0$th.warn == "iteration limit reached") {
    m0 <- glm(y ~ group, family = poisson)
  }
  m0
}

which works as follows:

> y <- rpois(500, 0.5)
> vfit <- foo()
Warning messages:
1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace >  :
  iteration limit reached
2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace >  :
  iteration limit reached
> summary(vfit)$coefficients
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.8675006 0.09758441 -8.889746 6.124890e-19
group2       0.2586945 0.12990510  1.991412 4.643564e-02
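If you want something a bit more general than foo, here is a sketch of my own along the same lines; theta_cap is an arbitrary cutoff, echoing the theta > 10000 check from the question, and the function falls back to Poisson whenever the theta estimation warned or the estimate ran off to an implausibly large value:

# Fall back to a Poisson GLM if theta estimation warned or theta is implausibly large.
# theta_cap is an arbitrary cutoff, in the spirit of the theta > 10000 check in the question.
fit_nb_or_pois <- function(formula, data, theta_cap = 1e4) {
  m <- suppressWarnings(glm.nb(formula, data = data))
  if (!is.null(m$th.warn) || m$theta > theta_cap) {
    m <- glm(formula, data = data, family = poisson())
  }
  m
}

d <- data.frame(y = rpois(500, 0.5), group = factor(rep(1:2, each = 250)))
summary(fit_nb_or_pois(y ~ group, d))$coefficients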