Showing that a random sum of logarithmic mass functions has a negative binomial distribution

generating-functions, probability, probability distributions, probability theory

Specific questions are bolded below. I've been unsuccessful in solving the following problem, which is Exercise 5.2.3 from Probability and Random Processes by Grimmett and Stirzaker.

Let $X_1, X_2,\ldots$ be independent and identically
distributed random variables with the logarithmic mass
function $$
f(k) = \frac{(1-p)^k}{k \log(1/p)}, \quad k \geq 1, $$
where $0 < p < 1$. If $N$ is independent of the $X_i$ and has the
Poisson distribution with parameter $\mu$, show that
$Y=\sum_{i=1}^N X_i$ has a negative binomial distribution.

My strategy is to compute the probability generating function (pgf) of $Y$ and the pgf of a negative binomial distribution (nbd) and show that they are the same, which would imply that $Y$ is indeed negative binomially distributed. I'm using the fact that two random variables have the same distribution iff they have the same pgf. **Is it true that two random variables have the same distribution iff they have the same pgf?** If so, please continue reading my attempted solution.

Since $N$ is independent of each $X_i$, Theorem 5.1.25 of the textbook says that $$G_Y= G_{N} \circ G_{X}.$$
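For reference, that identity comes from conditioning on $N$ and using the independence assumptions:
$$
G_{Y}(s) = \mathbb{E}\big[s^{Y}\big]
= \mathbb{E}\Big[\mathbb{E}\big[s^{X_{1}+\cdots+X_{N}}\mid N\big]\Big]
= \mathbb{E}\big[G_{X}(s)^{N}\big]
= G_{N}\big(G_{X}(s)\big).
$$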

It's easy to compute (and it's done in the textbook) $$G_{N}(s) = \exp(\mu(s-1)).$$
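For completeness, this follows in one line from the Poisson pmf:
$$
G_{N}(s) = \sum_{n\geq 0} s^{n}\, e^{-\mu}\frac{\mu^{n}}{n!} = e^{-\mu}e^{\mu s} = \exp(\mu(s-1)).
$$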
Further
$$
\begin{align}
G_{X}(s) &= \sum_{k\geq 1}s^{k} \frac{(1-p)^k}{k \log(1/p)} \\
&= \frac{1}{\log(1/p)} \sum_{k\geq 1} \frac{(s(1-p))^k}{k} \\
&= \frac{\log(1/q)}{\log(1/p)} \sum_{k\geq 1} \frac{(1-q)^k}{k\log(1/q)} \qquad (q := 1-s(1-p))\\
&= \frac{\log(1/q)}{\log(1/p)}\qquad \bigg(\text{Since } \frac{(1-q)^k}{k\log(1/q)} \text{ is a pmf.}\bigg) \\
&= \frac{\log q}{\log p} \\
&= \frac{\log(1-s(1-p))}{\log p}.
\end{align}
$$
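As a sanity check that this is a valid pgf, setting $s=1$ gives
$$
G_{X}(1) = \frac{\log(1-(1-p))}{\log p} = \frac{\log p}{\log p} = 1.
$$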
Therefore
$$
\begin{align}
G_{Y}(s) &= G_{N}(G_{X}(s)) \\
&= \exp\bigg(\mu \big(\frac{\log(1-s(1-p))}{\log p} - 1\big)\bigg).
\end{align}
$$

Now, since an nbd with parameters $r$ and $\alpha$ is a sum of $r$ independent geometric random variables (each with parameter $\alpha$), its pgf is the pgf of a single geometric random variable raised to the power $r$:
$$
\left(\frac{\alpha s}{1-s(1-\alpha)}\right)^{r}.
$$
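Here the pgf of a single geometric random variable $Z$ on $\{1,2,\ldots\}$ with parameter $\alpha$ is just a geometric series:
$$
G_{Z}(s) = \sum_{k\geq 1} s^{k}\,\alpha(1-\alpha)^{k-1}
= \alpha s \sum_{j\geq 0}\big(s(1-\alpha)\big)^{j}
= \frac{\alpha s}{1-s(1-\alpha)}.
$$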
However, setting
$$
\exp\bigg(\mu \big(\frac{\log(1-s(1-p))}{\log p} - 1\big)\bigg) = \left(\frac{\alpha s}{1-s(1-\alpha)}\right)^{r}
$$
allows me to solve for $\alpha$, but the result looks strange, and I doubt it's correct. In fact, I suspect that $Y$ should be nbd with parameters $\alpha = p$ and $r = \mu$.

**Where am I erring?**

**Is there a better (i.e. simpler) approach to this problem?**

As a side question, I noticed that Wikipedia gives the pgf of a negative binomial as
$$
\left(\frac{1-\alpha}{1-\alpha s}\right)^{r},
$$
which is apparently different from the one I calculated above (though still not obviously equal to $G_{Y}(s)$ for any values of $\alpha$ and $r$). **Why the discrepancy between my calculation of the pgf of an nbd and Wikipedia's entry?**

EDIT
In the textbook I cited above, a random variable $W_{r}$ is defined to be nbd if it has pmf
$$
\mathbb{P}(W_{r} = k) = \binom{k-1}{r-1} \alpha^{r} (1-\alpha)^{k-r}, \qquad k=r,r+1,\ldots.
$$
It is then pointed out that $W_{r}$ is the sum of $r$ independent geometric random variables; i.e.
$$
W_{r} = Z_{1} + Z_{2} + \cdots + Z_{r},
$$
where each $Z_{i}$ has pmf
$$
f_{Z}(k) = \alpha(1-\alpha)^{k-1}, \qquad k=1,2,\ldots.
$$

Best Answer

The discrepancy results from the fact that you used the geometric distribution supported on the set $\{1,2,3,\ldots\}$ (the number of trials needed to get one success, which is $1$ if there's a success on the first trial), whereas the article uses the geometric distribution supported on the set $\{0,1,2,3,\ldots\}$ (the number of trials before the first success, which is $0$ if there's a success on the first trial).

Also, which "negative binomial distribution" are you using: the distribution of the number of trials needed to get $r$ successes (counting both the successes and the failures among the trials), or the distribution of the number of failures before the $r$th success (or of successes before the $r$th failure, which is how the article phrases it)? The latter distribution is supported on $\{0,1,2,3,\ldots\}$ and the former on $\{r,r+1,r+2,\ldots\}$. The latter is more interesting in some ways because it allows $r$ to be a non-integer and gives a genuinely infinitely divisible family of probability distributions.
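To see the two conventions side by side: dropping the initial trial shifts a geometric variable down by one, so its pgf loses a factor of $s$, and the two negative binomial pgfs correspondingly differ by a factor of $s^{r}$:
$$
\left(\frac{\alpha s}{1-s(1-\alpha)}\right)^{r} = s^{r}\left(\frac{\alpha}{1-s(1-\alpha)}\right)^{r},
$$
where the second factor is the article's form with its parameter taken to be $1-\alpha$. Under that convention your $G_{Y}$ does match a negative binomial: writing $r = \mu/\log(1/p)$, so that $e^{-\mu} = p^{r}$ and $\mu/\log p = -r$,
$$
G_{Y}(s) = \exp\!\bigg(\mu\,\frac{\log(1-s(1-p))}{\log p}\bigg)\,e^{-\mu}
= \big(1-s(1-p)\big)^{-r}\,p^{r}
= \left(\frac{p}{1-s(1-p)}\right)^{r},
$$
i.e. the $\{0,1,2,\ldots\}$-supported negative binomial with $r = \mu/\log(1/p)$ (in general not an integer), matching the article's pgf with its parameter equal to $1-p$, rather than $r = \mu$.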