Negative Binomial “Process”

negative-binomial-distribution, poisson-distribution, r, regression

I wish to model the number of bugs caused by software development. Intuitively this is something like a Poisson process, but it is overdispersed. One option in this case is to use a negative binomial distribution, either because the negative binomial approaches the Poisson as $r$ gets larger, or because we might think the parameter $\lambda$ of the Poisson is itself gamma-distributed.
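The gamma-mixture argument can be checked with a quick simulation. The sketch below (Python, standard library only; the parameter values and the helper names `sample_poisson` and `gamma_poisson_sample` are hypothetical) draws $\lambda$ from a gamma distribution with shape $r$ and scale $\theta$, then a Poisson count given $\lambda$. The resulting counts should have mean $r\theta$ and variance $r\theta(1+\theta)$, i.e. a negative binomial with size $r$ and success probability $1/(1+\theta)$. The Poisson sampler uses Knuth's multiplication method, chosen only to avoid third-party dependencies.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def gamma_poisson_sample(r, theta, n, seed=0):
    """Draw lambda ~ Gamma(shape=r, scale=theta), then a Poisson(lambda) count, n times."""
    rng = random.Random(seed)
    return [sample_poisson(rng.gammavariate(r, theta), rng) for _ in range(n)]

r, theta = 2.0, 1.5
draws = gamma_poisson_sample(r, theta, 50_000)
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
print(f"mean {mean:.3f} (theory {r * theta:.3f}), "
      f"variance {var:.3f} (theory {r * theta * (1 + theta):.3f})")
```

The sample variance exceeds the sample mean by roughly the factor $1+\theta$, which is the overdispersion a plain Poisson model cannot capture.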

I'm not sure how to do this though. For example, we have that
$$\lim_{r\to\infty}NB\left(r,\frac{\lambda}{\lambda+r}\right)=\text{Poisson}(\lambda)$$
Given that a Poisson process observed over a duration $t$ can be modeled as $\text{Poisson}(\lambda t)$, I guess we could look at $NB\left(r,\frac{\lambda t}{\lambda t+r}\right)$. Is that correct? Given that I know $t$, it seems like I should be setting $r=t$.
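The limit can be checked numerically. This Python sketch (not from the original post; `nb_pmf`, `poisson_pmf`, `nb_mean_var` and `tv_distance` are hypothetical helpers, and $\lambda = 0.8$, $t = 5$ are arbitrary values) uses the question's convention, in which $NB(r, p)$ counts successes with probability $p$ before the $r$-th failure, so that $p = \mu/(\mu+r)$ gives mean $\mu$. It reports the total variation distance to $\text{Poisson}(\mu)$ with $\mu = \lambda t$ as $r$ grows.

```python
import math

def nb_pmf(k, r, p):
    """P(K = k) for k successes (prob p each) before the r-th failure; mean p*r/(1-p)."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + k * math.log(p) + r * math.log1p(-p))

def poisson_pmf(k, mu):
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def nb_mean_var(r, mu):
    """Mean and variance of NB(r, mu/(mu+r)) in this convention."""
    p = mu / (mu + r)
    return p * r / (1 - p), p * r / (1 - p) ** 2

def tv_distance(r, mu, kmax=200):
    """Total variation distance between NB(r, mu/(mu+r)) and Poisson(mu), truncated at kmax."""
    p = mu / (mu + r)
    return 0.5 * sum(abs(nb_pmf(k, r, p) - poisson_pmf(k, mu)) for k in range(kmax + 1))

lam, t = 0.8, 5.0
mu = lam * t  # Poisson(lambda * t) mean for an interval of length t
for r in (1, 10, 100, 1000):
    print(f"r={r:5d}  TV distance to Poisson: {tv_distance(r, mu):.4f}")
```

In the same convention the variance is $\mu + \mu^2/r$, so with $r = t$ it equals $\lambda t + \lambda^2 t = \lambda t(1+\lambda)$, and the variance-to-mean ratio $1+\lambda$ does not depend on $t$.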

At a more technical level, glm.nb from the MASS package seems to fit $r$ rather than the dispersion parameter, and I don't see an obvious parameter to change this.
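To make the relation between the two parameterisations explicit, here is a Python sketch (it does not call R; the helper names are hypothetical) of the variance function that MASS documents for its negative binomial family, $\textrm{Var}(Y) = \mu + \mu^2/\theta$. Under that form, the $\theta$ estimated by glm.nb is the size $r$, with success probability $\theta/(\theta+\mu)$. The NB2 "dispersion", often written $\alpha$, is $1/\theta$, and with $\theta$ held fixed the variance-to-mean ratio $1 + \mu/\theta$ grows with $\mu$.

```python
def nb2_moments(mu, theta):
    """Mean and variance under the NB2 form documented for MASS's negative binomial family."""
    return mu, mu + mu ** 2 / theta

def theta_to_size_prob(mu, theta):
    """Same distribution in (size, prob) form: size r = theta, prob p = theta / (theta + mu)."""
    return theta, theta / (theta + mu)

def dispersion_summaries(mu, theta):
    """alpha = 1/theta is the NB2 'dispersion'; var/mean = 1 + mu/theta depends on mu."""
    return {"alpha": 1.0 / theta, "var_over_mean": 1.0 + mu / theta}

theta = 2.0
for mu in (1.0, 4.0, 16.0):
    mean, var = nb2_moments(mu, theta)
    r, p = theta_to_size_prob(mu, theta)
    # Cross-check against the (size, prob) variance r * (1 - p) / p**2
    assert abs(var - r * (1 - p) / p ** 2) < 1e-9 * var
    print(mu, var, dispersion_summaries(mu, theta))
```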

Any insight at the theoretical or technical level would be appreciated.

Best Answer

Several stochastic processes lead to marginal counts having a Negative Binomial (NB) distribution and can therefore be called NB processes. Among them, the NB Lévy Process is of special interest because its increments (counts) over non-overlapping time intervals are independent, a property shared with the Poisson Process, the Gamma Process and the Wiener Process. The count $N_t$ on an interval of length $t$ has the NB distribution
$$ N_t \sim \textrm{NB}(r,\,p), \quad r = \gamma t, $$
so the process depends on two parameters: $\gamma > 0$ (with the dimension of an inverse time) and the probability $p$ ($0 < p < 1$). The parameterisation is the one where, for integer $r$, $N_t$ counts the failures before the $r$-th success with success probability $p$ (as in R's dnbinom(size = r, prob = p)). The expectation is proportional to the interval length, and so is the variance:
$$ \mathbb{E}(N_t) = \gamma t \, (1-p)/p, \qquad \textrm{Var}(N_t) = \gamma t \, (1-p)/p^2. $$

The variance is greater than the mean (overdispersion), and the index of dispersion $\textrm{Var}(N_t)/\mathbb{E}(N_t) = 1/p$ does not depend on $t$. When $p$ is close to $1$ and $\gamma (1-p)$ is close to $\lambda > 0$, the process behaves like a Poisson Process with rate $\lambda$. One explanation for the overdispersion is that several events can happen at the same time, so even a very short interval can contain more than one event.
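The "several events at the same time" explanation corresponds to a compound Poisson representation. In the convention above, $\textrm{NB}(r, p)$ is the law of a sum of Logarithmic$(1-p)$ jump sizes over a Poisson$(-r\log p)$ number of jumps, which can be verified by comparing probability generating functions. The Python sketch below (standard library only; an illustration, not the construction used in the cited article) simulates counts on intervals of length $t$ with jump rate $\gamma(-\log p)$ per unit time. It then checks the mean $\gamma t(1-p)/p$ and the constant index of dispersion $1/p$. The values $\gamma = 2$, $p = 0.4$ and all helper names are arbitrary choices.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_logarithmic(q, rng):
    """Log-series draw by inversion: P(J = j) = -q**j / (j * log(1 - q)), j >= 1."""
    u = rng.random()
    j = 1
    prob = -q / math.log1p(-q)
    cum = prob
    while u > cum and prob > 0.0:  # prob > 0 guards against rounding at the far tail
        prob *= q * j / (j + 1)
        j += 1
        cum += prob
    return j

def nb_levy_count(t, gamma, p, rng):
    """Events in an interval of length t: Poisson(gamma * t * (-log p)) jumps,
    each of Logarithmic(1 - p) size, so the total is NB(gamma * t, p)."""
    n_jumps = sample_poisson(-gamma * t * math.log(p), rng)
    return sum(sample_logarithmic(1.0 - p, rng) for _ in range(n_jumps))

def mean_and_var(t, gamma, p, n=20_000, seed=1):
    rng = random.Random(seed)
    xs = [nb_levy_count(t, gamma, p, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

gamma, p = 2.0, 0.4
for t in (1.0, 4.0):
    mean, var = mean_and_var(t, gamma, p)
    print(f"t={t}: mean {mean:.2f} (theory {gamma * t * (1 - p) / p:.2f}), "
          f"var/mean {var / mean:.2f} (theory {1 / p:.2f})")
```

Each jump of size greater than one is a batch of simultaneous events. That is why the variance-to-mean ratio stays near $1/p$ for both interval lengths instead of shrinking towards $1$.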

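It is easy to fit such a process by Maximum Likelihood when the intervals have different lengths $t_i$. The counts are then $N_i \sim \textrm{NB}(\gamma t_i,\, p)$, so the size parameter varies with $t_i$ while $p$ is shared by all intervals. In this case we face an NB regression with a link function differing from the default link in NB GLMs, so a dedicated likelihood maximisation is useful.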
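Here is a minimal sketch of such a likelihood maximisation, assuming the model $N_i \sim \textrm{NB}(\gamma t_i,\, p)$ with a common $p$. It is written in Python with the standard library only; the data are simulated, and all helper names and parameter values are hypothetical. Setting the derivative of the log-likelihood with respect to $p$ to zero gives $\hat p(\gamma) = \gamma T/(\gamma T + S)$ with $T = \sum_i t_i$ and $S = \sum_i N_i$. So $p$ can be profiled out, leaving a one-dimensional search over $\log\gamma$: a coarse grid followed by golden-section refinement.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(gamma, p, lengths, seed=2):
    """NB(gamma * t, p) counts via the gamma-Poisson mixture with scale (1 - p) / p."""
    rng = random.Random(seed)
    scale = (1.0 - p) / p
    return [sample_poisson(rng.gammavariate(gamma * t, scale), rng) for t in lengths]

def profile_loglik(gamma, counts, lengths):
    """Log-likelihood of N_i ~ NB(gamma * t_i, p) with p set to its closed-form MLE."""
    total_r = gamma * sum(lengths)
    p = total_r / (total_r + sum(counts))  # assumes at least one nonzero count
    ll = 0.0
    for n_i, t_i in zip(counts, lengths):
        r_i = gamma * t_i
        ll += (math.lgamma(n_i + r_i) - math.lgamma(r_i) - math.lgamma(n_i + 1)
               + r_i * math.log(p) + n_i * math.log1p(-p))
    return ll, p

def fit_nb_levy(counts, lengths):
    """Maximise the profile log-likelihood over u = log(gamma); return (gamma_hat, p_hat)."""
    def f(u):
        return profile_loglik(math.exp(u), counts, lengths)[0]

    best = max((-4.0 + 0.1 * i for i in range(81)), key=f)  # coarse grid on log(gamma)
    lo, hi = best - 0.1, best + 0.1
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(40):  # golden-section refinement of the bracket around the grid maximum
        a, b = hi - invphi * (hi - lo), lo + invphi * (hi - lo)
        if f(a) > f(b):
            hi = b
        else:
            lo = a
    gamma_hat = math.exp(0.5 * (lo + hi))
    return gamma_hat, profile_loglik(gamma_hat, counts, lengths)[1]

rng = random.Random(1)
lengths = [rng.uniform(0.5, 5.0) for _ in range(2000)]  # unequal interval lengths
counts = simulate_counts(2.0, 0.4, lengths)
gamma_hat, p_hat = fit_nb_levy(counts, lengths)
print(f"gamma_hat = {gamma_hat:.3f} (true 2.0), p_hat = {p_hat:.3f} (true 0.4)")
```

Profiling out $p$ keeps the search one-dimensional; a general-purpose optimiser over $(\log\gamma, \textrm{logit}\, p)$ would also work. The sketch assumes at least one nonzero count, since otherwise $\hat p = 1$ and the likelihood has no interior maximum.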

The article by T.J. Kozubowski and K. Podgorski provides theoretical results as well as an illustration.

Curiously enough, this process does not seem to be frequently used as such by statisticians.