Negative Binomial Distribution – Is Negative Binomial Regression a Poor Model?

Tags: modeling, negative-binomial-distribution, regression

I am reading a very interesting article by Sellers and Shmueli on regression models for count data. Near the beginning (p. 944) they cite McCullagh and Nelder (1989) as saying that negative binomial regression is unpopular and has a problematic canonical link. I found the passage they refer to, and it says (p. 374 of McCullagh and Nelder):

"Little use seems to have been made of the negative binomial distribution in applications; in particular, the use of the canonical link is problematical because it makes the linear predictor a function of a parameter of the variance function".

On the previous page they give that link function as

$$\eta = \log\left(\frac{\alpha}{1 + \alpha} \right) = \log\left( \frac{\mu}{\mu + k}\right)$$

and variance function

$$V = \mu + \frac{\mu^2}{k}.$$

The distribution is given as

$$\Pr(Y = y;\, \alpha, k) = \frac{(y+k-1)!}{y!\,(k-1)!}\,\frac{\alpha^y}{(1+\alpha)^{y+k}}$$

I have found NB regression to be quite widely used (and recommended in several books). Are all these uses and recommendations in error?

What are the consequences of this problematic link?

Best Answer

I dispute the assertions from several points of view:

i) While the canonical link may well be 'problematic', it's not obvious that anyone need be particularly interested in that link in the first place - whereas, for example, the log link in the Poisson case is often both convenient and natural, so people are often interested in it. Even so, in the Poisson case people do look at other link functions.

So we needn't restrict our consideration to the canonical link.

A 'problematic link' is not of itself an especially telling argument against negative binomial regression.

The log link, for example, seems quite a reasonable choice in some negative binomial applications - say, where the data might be conditionally Poisson but there's heterogeneity in the Poisson rate - and there it can be almost as interpretable as it is in the Poisson case.
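
To make that concrete, here's a minimal sketch in R (the simulated data and variable names are mine, purely for illustration) of exactly that situation: counts that are conditionally Poisson with Gamma heterogeneity in the rate, fitted by negative binomial regression with the log link via MASS's glm.nb:

```r
## A sketch, not a definitive analysis: simulate counts that are
## conditionally Poisson with Gamma heterogeneity in the rate, then fit a
## negative binomial regression with the (default) log link in MASS::glm.nb.
library(MASS)

set.seed(1)
n     <- 500
x     <- runif(n)
mu    <- exp(0.5 + 1.2 * x)          # true mean, log-linear in x
theta <- 2                           # heterogeneity (NB size) parameter
lam   <- rgamma(n, shape = theta, rate = theta / mu)  # Gamma-mixed Poisson rates, mean mu
y     <- rpois(n, lam)               # marginally negative binomial counts

fit <- glm.nb(y ~ x)                 # log link is the default
summary(fit)
exp(coef(fit))                       # rate ratios, read just as in a Poisson regression
```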

By comparison, I use Gamma GLMs reasonably often, but I don't recall (textbook examples aside) ever having used their canonical link - I use the log link almost always, since it's a more natural link for the kinds of problems I tend to work with.

ii) "Little seems to have been made ... in applications" may have been just about true in 1989, but I don't think it stands now. [Even if it did stand now, that's not an argument that it's a poor model, only that it's not been widely used - which might happen for all manner of reasons.]

Negative binomial regression has become more widely used as it has become more widely available, and I see it used in applications much more often now. In R, for example, I make use of the functions in MASS that support it (and the corresponding book, Venables and Ripley's Modern Applied Statistics with S, uses negative binomial regression in some interesting applications) - and I had used it via functionality in a few other packages even before I used it in R.

I would have used negative binomial regression more, and earlier, if it had been readily available to me; I expect the same is true of many people - so the argument that it was little used seems to be more a matter of opportunity than of merit.

While it's possible to avoid negative binomial regression (say, by using overdispersed Poisson models instead), and while there are a number of situations where it really doesn't matter much what you do, there are various reasons why that's not entirely satisfactory.

For example, when my interest is more in prediction intervals than in estimates of coefficients, the fact that the coefficient estimates hardly change may not be an adequate reason to avoid the negative binomial.
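
A small sketch of that point (again with made-up data; the plug-in interval below is purely illustrative and ignores parameter uncertainty): a quasi-Poisson fit and a negative binomial fit give very similar coefficient estimates, but only the negative binomial supplies a full count distribution from which a prediction interval can be read off:

```r
## Sketch only: similar coefficient estimates, but the NB fit gives a full
## count distribution for new observations while quasi-Poisson does not.
library(MASS)

set.seed(2)
n  <- 500
x  <- runif(n)
y  <- rnbinom(n, size = 2, mu = exp(0.5 + 1.2 * x))   # overdispersed counts

qp <- glm(y ~ x, family = quasipoisson)
nb <- glm.nb(y ~ x)
cbind(quasipoisson = coef(qp), negbin = coef(nb))      # nearly identical estimates

## Approximate 95% prediction interval for a new count at x = 0.5,
## plugging in the fitted mean and theta (ignores estimation uncertainty).
mu.hat <- predict(nb, newdata = data.frame(x = 0.5), type = "response")
qnbinom(c(0.025, 0.975), size = nb$theta, mu = mu.hat)
```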

Of course there are still other choices that model the dispersion (such as the Conway-Maxwell-Poisson that is the subject of the paper you mention); while those are certainly options, there are situations where I'm quite happy that the negative binomial is a reasonably good 'fit' as a model for my problem.

Are all these uses and recommendations in error?

I really don't think so! If they were, it should have become reasonably clear by now. Indeed, if McCullagh and Nelder had continued to feel the same way, they had no lack of opportunity, nor any lack of forums in which to clarify the remaining issues. Nelder has passed away (2010), but McCullagh is apparently still around.

If that short passage in McCullagh and Nelder is all they have, I'd say that's a pretty weak argument.

What are the consequences of this problematic link?

I think the issue is mainly that the link function and the variance function are related rather than unrelated - the same parameter $k$ appears in both - which is not the case for pretty much any of the other GLM families in popular use, and which makes interpretation on the scale of the linear predictor less straightforward. (That's not to say it's the only issue, but I do think it's the main one for a practitioner.) It's not a big deal.
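
Spelling that out using the formulas quoted in the question: in this parameterisation the mean is $\mu = k\alpha$, so the same $k$ sits in both the canonical link and the variance function,

$$\eta = \log\left(\frac{\mu}{\mu + k}\right), \qquad V = \mu + \frac{\mu^2}{k},$$

which is exactly what McCullagh and Nelder mean by the linear predictor being 'a function of a parameter of the variance function'.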


By way of comparison, I see Tweedie models being used much more widely in recent times, and I don't see people concerning themselves with the fact that the power parameter $p$ appears in both the variance function and the canonical link (nor, in most cases, even worrying much about the canonical link at all).
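
(For comparison - a standard fact about Tweedie families rather than anything from the Sellers and Shmueli paper - with variance function $V(\mu) = \mu^p$ the canonical link is, for $p \ne 1$,

$$\eta = \frac{\mu^{1-p}}{1-p},$$

so the index $p$ enters the link and the variance function in just the same structural way as $k$ does for the negative binomial.)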

None of this is to take anything away from Conway-Maxwell-Poisson models (the subject of the Sellers and Shmueli paper), which are also becoming more widely used -- I certainly don't wish to take part in a negative binomial vs COM-Poisson shooting match.

I simply don't see it as one-or-the-other, any more than (speaking more broadly now) I take a purely Bayesian or a purely frequentist stance on statistical problems. I'll use whatever strikes me as the best choice in the particular circumstances I'm in, and each choice tends to have its advantages and disadvantages.
