Negative Binomial Distribution – Distribution of the Difference Between Two Negative-Binomial-Distributed Variables

distributions, modeling, negative-binomial-distribution, poisson-distribution, skellam-distribution

The Skellam distribution describes the difference between two independent variables that have Poisson distributions. Is there a similar distribution that describes the difference between variables that follow negative binomial distributions?

My data is produced by a Poisson process, but it includes a fair amount of noise, leading to overdispersion. Modeling the data with a negative binomial (NB) distribution therefore works well. If I want to model the difference between two of these NB data sets, what are my options? If it helps, assume similar means and variances for the two sets.
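For concreteness, here is roughly how data like mine could be mimicked (a sketch with purely illustrative parameter values; a gamma-mixed Poisson is one standard way to get overdispersed counts, and a gamma-Poisson mixture is exactly negative binomial):

  ## Illustrative sketch only: overdispersed counts via a gamma-mixed
  ## Poisson, then a negative binomial fit (requires the MASS package)
  library(MASS)
  set.seed(1)
  lambda = rgamma(1e4, shape = 5, rate = 1)   # noisy underlying rates
  x = rpois(1e4, lambda)                      # overdispersed counts
  var(x) / mean(x)                            # > 1, i.e. overdispersed
  fitdistr(x, "negative binomial")            # NB fit (exact for this mixture)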

Best Answer

I don't know of a name for this distribution, but you can derive it from the law of total probability. Suppose $X$ and $Y$ are independent and have negative binomial distributions with parameters $(r_{1}, p_{1})$ and $(r_{2}, p_{2})$, respectively. I'm using the parameterization where $X$ and $Y$ represent the number of successes before the $r_{1}$th and $r_{2}$th failures, respectively. Then,

$$ P(X - Y = k) = E_{Y} \Big( P(X-Y = k \mid Y) \Big) = E_{Y} \Big( P(X = k+Y \mid Y) \Big) = \sum_{y=0}^{\infty} P(Y=y)P(X = k+y), $$

where the last equality uses the independence of $X$ and $Y$.

We know

$$ P(X = k + y) = {k+y+r_{1}-1 \choose k+y} (1-p_{1})^{r_{1}} p_{1}^{k+y} $$

and

$$ P(Y = y) = {y+r_{2}-1 \choose y} (1-p_{2})^{r_{2}} p_{2}^{y} $$

so

$$ P(X-Y=k) = \sum_{y=0}^{\infty} {y+r_{2}-1 \choose y} (1-p_{2})^{r_{2}} p_{2}^{y} \cdot {k+y+r_{1}-1 \choose k+y} (1-p_{1})^{r_{1}} p_{1}^{k+y} $$

That's not pretty (yikes!). (Note that for negative $k$ the terms with $k+y<0$ vanish, so the sum effectively starts at $y=-k$.) The only simplification I see right off is

$$ p_{1}^{k} (1-p_{1})^{r_{1}} (1-p_{2})^{r_{2}} \sum_{y=0}^{\infty} (p_{1}p_{2})^{y} {y+r_{2}-1 \choose y} {k+y+r_{1}-1 \choose k+y} $$

which is still pretty ugly. I'm not sure if this is helpful but this can also be re-written as

$$ \frac{ p_{1}^{k} (1-p_{1})^{r_{1}} (1-p_{2})^{r_{2}} }{ (r_{1}-1)! (r_{2}-1)! } \sum_{y=0}^{\infty} (p_{1}p_{2})^{y} \frac{ (y+r_{2}-1)! (k+y+r_{1}-1)! }{y! (k+y)! } $$

I'm not sure whether this sum has a simpler closed form (the ratio of successive terms is rational in $y$, so it is a Gauss hypergeometric series), but it can be approximated numerically if you only need it to calculate $p$-values.
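For instance, the convolution form above can be evaluated directly with R's built-in dnbinom. A quick sketch (the name pdiff is just for illustration; dnbinom(x, size, prob) gives the probability of x failures before the size-th success, so $P(X = x)$ in my parameterization is dnbinom(x, r1, 1 - p1)):

  ## Direct numerical evaluation of the convolution sum via dnbinom
  pdiff = function(k, r1, r2, p1, p2, UB = 50)
  {
    y = max(0, -k):UB   # terms with k + y < 0 are zero (assumes UB >= -k)
    sum( dnbinom(k + y, r1, 1 - p1) * dnbinom(y, r2, 1 - p2) )
  }

  pdiff(2, 6, 4, .3, .47)   # same parameters as the simulations below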

I verified with simulation that the above calculation is correct. Here is a crude R function to calculate this mass function, followed by a few simulations:

  f = function(k, r1, r2, p1, p2, UB)
  {
    ## Constant factor pulled out of the sum
    const = (p1^k) * ((1-p1)^r1) * ((1-p2)^r2)
    const = const/( factorial(r1-1) * factorial(r2-1) )

    ## Terms with k + y < 0 are zero, so start the sum there for
    ## negative k (assumes UB >= -k)
    S = 0
    for(y in max(0, -k):UB)
    {
       iy = ((p1*p2)^y) * factorial(y+r2-1)*factorial(k+y+r1-1)
       iy = iy/( factorial(y)*factorial(y+k) )
       S = S + iy
    }

    return(S*const)
  }
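One caveat: the factorials above overflow for even moderately large arguments (factorial(171) is already Inf in double precision). Here is a sketch of the same calculation done on the log scale (the name f_log is just for illustration); since lgamma(n + 1) = log(n!), it also extends to non-integer $r_{1}, r_{2}$:

  ## Same calculation on the log scale; lgamma avoids overflow in
  ## factorial() and also allows non-integer r1, r2
  f_log = function(k, r1, r2, p1, p2, UB)
  {
    y  = max(0, -k):UB   # terms with k + y < 0 are zero (assumes UB >= -k)
    lt = k*log(p1) + r1*log(1-p1) + r2*log(1-p2) -
         lgamma(r1) - lgamma(r2) +
         y*log(p1*p2) +
         lgamma(y+r2) + lgamma(k+y+r1) -
         lgamma(y+1) - lgamma(k+y+1)
    return(sum(exp(lt)))
  }

  f_log(2, 6, 4, .3, .47, 20)   # should match f(2, 6, 4, .3, .47, 20)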

 ### Sims
 r1 = 6; r2 = 4; 
 p1 = .7; p2 = .53; 
 X = rnbinom(1e5,r1,p1)
 Y = rnbinom(1e5,r2,p2)
 mean( (X-Y) == 2 ) 
 [1] 0.08508
 f(2,r1,r2,1-p1,1-p2,20)
 [1] 0.08509068
 mean( (X-Y) == 1 ) 
 [1] 0.11581
 f(1,r1,r2,1-p1,1-p2,20)
 [1] 0.1162279
 mean( (X-Y) == 0 ) 
 [1] 0.13888
 f(0,r1,r2,1-p1,1-p2,20)
 [1] 0.1363209

I've found the sum converges very quickly for all of the values I tried (the terms eventually decay geometrically, like $(p_{1}p_{2})^{y}$), so setting UB higher than 10 or so is not necessary. Note that R's built-in rnbinom function parameterizes the negative binomial in terms of the number of failures before the $r$th success, so you need to replace all of the $p_{1}, p_{2}$'s in the formulas above with $1-p_{1}, 1-p_{2}$ for compatibility, as in the calls to f in the simulations.
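And if you do want a $p$-value, you can just sum the mass function over the relevant tail. A sketch (the wrapper dnbdiff is hypothetical; it takes R-style prob arguments so the $1-p_{1}, 1-p_{2}$ substitution happens internally):

  ## Hypothetical wrapper taking R-style 'prob' arguments, so the 1 - p
  ## substitution described above happens internally
  dnbdiff = function(k, r1, r2, prob1, prob2, UB = 50)
    f(k, r1, r2, 1 - prob1, 1 - prob2, UB)

  ## Upper-tail p-value P(X - Y >= 5), truncated once the mass is negligible
  sum( sapply(5:60, dnbdiff, r1 = 6, r2 = 4, prob1 = .7, prob2 = .53) )
  ## compare with the simulated tail: mean( (X - Y) >= 5 )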