I don't know the name of this distribution, but you can derive it from the law of total probability. Suppose $X, Y$ have negative binomial distributions with parameters $(r_{1}, p_{1})$ and $(r_{2}, p_{2})$, respectively. I'm using the parameterization where $X$ and $Y$ represent the number of successes before the $r_{1}$-th and $r_{2}$-th failures, respectively. Then,
$$ P(X - Y = k) = E_{Y} \Big( P(X-Y = k \mid Y) \Big) = E_{Y} \Big( P(X = k+Y \mid Y) \Big) =
\sum_{y=0}^{\infty} P(Y=y)P(X = k+y) $$
We know
$$ P(X = k + y) = {k+y+r_{1}-1 \choose k+y} (1-p_{1})^{r_{1}} p_{1}^{k+y} $$
and
$$ P(Y = y) = {y+r_{2}-1 \choose y} (1-p_{2})^{r_{2}} p_{2}^{y} $$
so
$$ P(X-Y=k) = \sum_{y=0}^{\infty} {y+r_{2}-1 \choose y} (1-p_{2})^{r_{2}} p_{2}^{y} \cdot
{k+y+r_{1}-1 \choose k+y} (1-p_{1})^{r_{1}} p_{1}^{k+y} $$
That's not pretty (yikes!). The only simplification I see right off is
$$ p_{1}^{k} (1-p_{1})^{r_{1}} (1-p_{2})^{r_{2}}
\sum_{y=0}^{\infty} (p_{1}p_{2})^{y} {y+r_{2}-1 \choose y}
{k+y+r_{1}-1 \choose k+y} $$
which is still pretty ugly. I'm not sure if this is helpful but this can also be re-written as
$$ \frac{ p_{1}^{k} (1-p_{1})^{r_{1}} (1-p_{2})^{r_{2}} }{ (r_{1}-1)! (r_{2}-1)! }
\sum_{y=0}^{\infty}
(p_{1}p_{2})^{y}
\frac{ (y+r_{2}-1)! (k+y+r_{1}-1)! }{y! (k+y)! } $$
I'm not sure if there is a simplified expression for this sum, but it can be approximated numerically if you only need it to calculate $p$-values.
I verified with simulation that the above calculation is correct. Here is a crude R function to calculate this mass function, followed by a few simulations:
f = function(k, r1, r2, p1, p2, UB)
{
  # constant factor pulled out of the sum
  const = (p1^k) * ((1 - p1)^r1) * ((1 - p2)^r2)
  const = const / (factorial(r1 - 1) * factorial(r2 - 1))

  # truncated series; UB is the truncation point
  S = 0
  for (y in 0:UB)
  {
    iy = ((p1 * p2)^y) * factorial(y + r2 - 1) * factorial(k + y + r1 - 1)
    iy = iy / (factorial(y) * factorial(y + k))
    S = S + iy
  }
  return(S * const)
}
### Sims
r1 = 6; r2 = 4;
p1 = .7; p2 = .53;
X = rnbinom(1e5,r1,p1)
Y = rnbinom(1e5,r2,p2)
mean( (X-Y) == 2 )
[1] 0.08508
f(2,r1,r2,1-p1,1-p2,20)
[1] 0.08509068
mean( (X-Y) == 1 )
[1] 0.11581
f(1,r1,r2,1-p1,1-p2,20)
[1] 0.1162279
mean( (X-Y) == 0 )
[1] 0.13888
f(0,r1,r2,1-p1,1-p2,20)
[1] 0.1363209
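As an extra check (my addition, a sketch rather than part of the original verification), the same truncated sum can be computed with R's built-in dnbinom; since R parameterizes by failures before the size-th success, prob must be 1 minus the success probability used in the formulas above.

```r
# Equivalent check (sketch): the truncated convolution written with dnbinom.
# Here p1, p2 are the success probabilities from the formulas above, so
# dnbinom gets prob = 1 - p (R counts failures before the size-th success).
g = function(k, r1, r2, p1, p2, UB = 50)
{
  y = 0:UB
  sum(dnbinom(y, size = r2, prob = 1 - p2) *
      dnbinom(k + y, size = r1, prob = 1 - p1))
}
g(2, 6, 4, 1 - 0.7, 1 - 0.53)  # agrees with f(2, r1, r2, 1-p1, 1-p2, 20) above
```

Tail probabilities such as $P(X - Y \geq k)$ for a $p$-value can then be obtained by summing this over $k$.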
I've found the sum converges very quickly for all of the values I tried, so setting UB higher than 10 or so
is not necessary. Note that R's built-in rnbinom function parameterizes the negative binomial in terms of
the number of failures before the $r$'th success, in which case you'd need to replace all of the $p_{1}, p_{2}$'s
in the above formulas with $1-p_{1}, 1-p_{2}$ for compatibility.
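To make that parameterization point concrete, here is a quick check (my addition) that the pmf formula for $X$ above matches dnbinom once prob is set to $1 - p_{1}$:

```r
# The formula P(X = x) = choose(x + r1 - 1, x) (1 - p1)^r1 p1^x (successes
# before the r1-th failure, success prob p1) matches R's dnbinom with
# prob = 1 - p1 (failures before the r1-th success).
r1 = 6; p1 = 0.3
x = 0:10
manual  = choose(x + r1 - 1, x) * (1 - p1)^r1 * p1^x
builtin = dnbinom(x, size = r1, prob = 1 - p1)
all.equal(manual, builtin)  # TRUE
```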
Firstly, goodness-of-fit tests, or tests for particular distributions, will typically reject the null hypothesis given a sufficiently large sample size, because we are hardly ever in a situation where the data arise exactly from a particular distribution and where we have also taken into account all relevant (possibly unmeasured) covariates that explain further differences between subjects/units. However, in practice such deviations can be fairly irrelevant, and it is well known that many models remain usable even when there are some deviations from their distributional assumptions (most famously regarding the normality of residuals in regression models with normal error terms).
Secondly, a negative binomial model is a relatively logical default choice for count data (which can only be $\geq 0$). We do not have many details, though, and there might be obvious features of the data (e.g. regarding how it arises) that would suggest something more sophisticated. For example, accounting for key covariates using negative binomial regression could be considered.
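If covariates are available, such a regression is straightforward to fit with MASS::glm.nb; the sketch below uses simulated data, and all variable names are made up for illustration (nothing here comes from the question's data).

```r
# Hypothetical sketch: negative binomial regression via MASS::glm.nb.
# The data frame and variable names are illustrative only.
library(MASS)
set.seed(1)
dat = data.frame(x = rnorm(200))
# simulate counts whose log-mean depends on the covariate (true slope 0.8)
dat$y = rnbinom(200, size = 2, mu = exp(0.5 + 0.8 * dat$x))
fit = glm.nb(y ~ x, data = dat)
coef(fit)  # estimated intercept and slope
```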
Best Answer
The difference is what we are interested in. Both distributions are built from independent Bernoulli trials with a fixed probability of success, p.
With the Binomial distribution, the random variable X is the number of successes observed in n trials. Because the number of trials is fixed, the possible values of X are 0, 1, ..., n.
With the Negative Binomial distribution, the random variable Y is the number of trials until the r th success is observed. In this case, we keep increasing the number of trials until we reach r successes. The possible values of Y are r, r+1, r+2, ... with no upper bound. The Negative Binomial can also be defined in terms of the number of failures before the r th success, instead of the number of trials until the r th success; Wikipedia defines the Negative Binomial distribution in this manner.
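The two definitions differ only by a shift of r: if W counts failures before the r th success (Wikipedia's convention, and R's dnbinom), then Y = W + r counts trials. A quick numerical check of this shift (my addition):

```r
# Trials-until-r-th-success pmf is the failures-before-r-th-success pmf
# shifted by r:
# P(Y = y) = choose(y - 1, r - 1) p^r (1 - p)^(y - r) = dnbinom(y - r, r, p)
r = 3; p = 0.4
y = r:15                                   # support starts at r
trials_pmf  = choose(y - 1, r - 1) * p^r * (1 - p)^(y - r)
shifted_pmf = dnbinom(y - r, size = r, prob = p)
all.equal(trials_pmf, shifted_pmf)  # TRUE
```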
So to summarize, writing p for the success probability:
Binomial: P(X = x) = C(n, x) p^x (1-p)^(n-x), with support x = 0, 1, ..., n.
Negative Binomial (trials until the r th success): P(Y = y) = C(y-1, r-1) p^r (1-p)^(y-r), with support y = r, r+1, r+2, ...
Thanks to Ben Bolker for reminding me to mention the support of the two distributions. He answered a related question here.