[Math] Proof for the calculation of mean in negative binomial distribution

probabilitystatistics

I am trying to figure out the mean for negative binomial distribution but have run into mistakes. I know there are other posts on deriving the mean bu I am attempting to derive it in my own way. I wonder if any of you can point out where my mistake is:

In negative binomial distribution, the probability is:
$$
p(X=x) = \frac{(x-1)!}{(r-1)!(x-r)!}p^r(1-p)^{x-r},
$$
where $X$ is a random variable for the number of trials required, $x$ is the number of trials, p is the probability of success, and r is the number of success until $x$th trial. Therefore, to calculate expectation:

$$
E(x) = \sum_{x=r}^{\infty}xp(x)=x\sum_{x=r}^{\infty}\frac{(x-1)!}{(r-1)!(x-r)!}p^r(1-p)^{x-r}=\sum_{x=r}^{\infty}\frac{x!}{(r-1)!(x-r)!}p^r(1-p)^{x-r}
$$

Let $k=x-r$, then the formula becomes:
$$
E(x)=\sum_{k=0}^{\infty}\frac{(k+r)!}{(r-1)!k!}p^r(1-p)^k=
\sum_{k=0}^{\infty}\frac{(k+r)!}{(r-1)!k!}p^r(1-p)^k=
r\sum_{k=0}^{\infty}\frac{(k+r)!}{r!k!}p^r(1-p)^k
$$

By binomial theorem, $\sum_{k=0}^{\infty}\frac{(k+r)!}{r!k!}p^r(1-p)^k$ becomes $[p+(1-p)]^{k+r} = 1$, and thus $E(x) = r$, which is obviously wrong.

I cannot figure out what is wrong with my proof, and thus any help will be appreciated. For reference, someone else has done a similar proof here, but I still have trouble understanding the mistake(s) in my proof:

Deriving Mean for Negative Binomial Distribution.

Best Answer

Here is a purely algebraic approach. We begin by first showing that the PMF for a negative binomial distribution does in fact sum to $1$ over its support. Suppose $X \sim \operatorname{NegBinomial}(r,p)$, with PMF $$\Pr[X = x] = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, r+2, \ldots.$$ This is the parametrization you chose. Consider the function $$f_m(z) = \sum_{k=0}^\infty \binom{k+m}{m} z^k.$$ We recall the identity $$\binom{k+m}{m} = \binom{k+m-1}{m-1} + \binom{k+m-1}{m},$$ from which it follows that $$\begin{align*} f_m(z) &= \sum_{k=0}^\infty \binom{k+m-1}{m-1}z^k + \binom{k-1+m}{m} z^k \\ &= f_{m-1}(z) + z \sum_{k=1}^\infty \binom{k-1+m}{m} z^{k-1} \\ &= f_{m-1}(z) + z f_m(z). \end{align*}$$ Consequently, $$f_m(z) = \frac{f_{m-1}(z)}{1-z}.$$ But because $$f_0(z) = \sum_{k=0}^\infty \binom{k}{0} z^k = \frac{1}{1-z},$$ it immediately follows that $$f_m(z) = (1-z)^{-(m+1)}.$$ Now letting $m = r-1$, $z = 1-p$, and $k = x-r$, we obtain $$\sum_{x=r}^\infty \Pr[X = x] = p^r (1 - (1-p))^{-(r-1+1)} = p^r p^{-r} = 1, \quad 0 < p < 1.$$ This proves that $\Pr[X = x]$ does define a valid PMF.


Next, we use this property to calculate $\operatorname{E}[X]$. By definition, $$\operatorname{E}[X] = \sum_{x=r}^\infty x \Pr[X = x].$$ But since $$x \binom{x-1}{r-1} = \frac{x!}{(r-1)!(x-r)!} = r \frac{x!}{r! (x-r)!} = r \binom{x}{r},$$ we find $$\operatorname{E}[X] = \sum_{x=r}^\infty r \binom{x}{r} p^r (1-p)^{x-r} = \frac{r}{p} \sum_{x=r+1}^\infty \binom{x-1}{(r+1)-1} p^{r+1} (1-p)^{x-(r+1)},$$ where we obtained this last expression by incrementing the lower index of summation by $1$, and decrementing the index in the summand by $1$. But you will notice that we have also rewritten the summand so that it is now apparent that it is the sum of the PMF of a negative binomial distribution with parameters $r+1$ and $p$. Thus this sum equals $1$, and we conclude $\operatorname{E}[X] = r/p$.


It is worth noting that for this purely algebraic approach, we have spent most of our effort to show that this parametrization is a valid PMF. The calculation of the expectation is quite straightforward by contrast. Also, if the variance is desired, it is best to consider $\operatorname{E}[X(X-1)],$ rather than $\operatorname{E}[X^2]$, since the former expression more readily yields to the same type of binomial coefficient manipulation that we used for $\operatorname{E}[X]$. I leave this computation as an exercise for the reader.

A final word: perhaps the most elegant computation is to exploit the fact that the negative binomial distribution is a generalization (i.e., a sum of IID) geometric random variables. But the purpose of this answer is to show how the computation can be done purely as an algebraic manipulation with very few prerequisites.

Related Question