Probability generating function of the sum of two random variables

generating-functions, probability-theory

Let two $\mathbf{N}$-valued random variables $X$ and $Y$ be given, and let $\phi_X(s) = \sum_k \mathbf{P}(X = k) s^k$ and $\phi_Y(s) = \sum_k \mathbf{P}(Y = k) s^k$ be their respective probability generating functions.
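
As a quick illustration of what these definitions compute, here is a minimal numerical sketch in Python; the `pgf` helper and the dict representation of a pmf are illustrative choices of mine, not from the book:

```python
# Minimal sketch: evaluate a pgf  phi(s) = sum_k P(X = k) s^k  from a finite pmf,
# where the pmf is represented as a dict {k: P(X = k)}.

def pgf(pmf, s):
    """Evaluate the probability generating function at s."""
    return sum(p * s**k for k, p in pmf.items())

# Example: X ~ Bernoulli(0.3), so phi_X(s) = 0.7 + 0.3*s.
bernoulli = {0: 0.7, 1: 0.3}
print(pgf(bernoulli, 2.0))  # 0.7 + 0.3 * 2 = 1.3
```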

It is stated in *Probability and Statistics by Example* (Suhov and Kelbert, p. 59) that the following two conditions are equivalent.

(i) $X$ and $Y$ are independent.

(ii) $\phi_{X + Y}(s) = \phi_X(s) \phi_Y(s)$.

I don't have a problem with the fact that (i) implies (ii). However, I don't understand why the converse is true.

The only justification offered in the book is that it follows from the uniqueness of the coefficients of a power series. But I don't understand why the fact that $X + Y$ has the same distribution as if $X$ and $Y$ were independent ought to imply that they are in fact independent.
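
To spell out the book's hint: matching the coefficient of $s^k$ on both sides of (ii) gives, for every $k$,

$$\mathsf P(X+Y=k)\;=\;\sum_{x+y=k}\mathsf P(X=x,\,Y=y)\;=\;\sum_{x+y=k}\mathsf P(X=x)\,\mathsf P(Y=y),$$

i.e. the distribution of $X+Y$ is the convolution of the marginals, which is exactly what I mean by "as if $X$ and $Y$ were independent".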

Can anybody fill in the details of the argument, or else refer me to an easily accessible source? Thanks.

Best Answer

$$\begin{align}
\text{Given that:} \\
\phi_{X+Y}(s) & = \sum_{k\in\mathcal D(X+Y)} \mathsf P(X+Y=k)\; s^k \\
& = \sum_{x\in\mathcal D(X)} \;\sum_{y\in \mathcal D(Y)} \mathsf P(X=x, Y=y)\; s^{x+y} \\[2ex]
\text{and that:} \\
\phi_X(s)\,\phi_Y(s) & = \Bigl(\sum_{x\in\mathcal D(X)} \mathsf P(X=x)\; s^x\Bigr)\cdot\Bigl(\sum_{y\in\mathcal D(Y)} \mathsf P(Y=y)\; s^y\Bigr) \\
& = \sum_{x\in\mathcal D(X)}\sum_{y\in\mathcal D(Y)} \mathsf P(X=x)\,\mathsf P(Y=y)\; s^{x+y} \\[2ex]
\text{and that independence means:} \\
X\perp Y & \iff \mathsf P(X=x, Y=y) = \mathsf P(X=x)\,\mathsf P(Y=y) \quad\text{for all } x, y, \\[2ex]
\text{it follows that} & \text{ independence is a necessary and sufficient condition for the pgf} \\
& \text{ of the sum $X+Y$ to equal the product of the pgfs of $X$ and $Y$:} \\
X\perp Y & \iff \phi_{X+Y}(s) = \phi_X(s)\,\phi_Y(s)
\end{align}$$
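
As a sanity check on the forward direction, here is a small numerical sketch in Python (the marginal distributions and the `pgf` helper are illustrative, not from the answer): when the joint pmf factors, the pgf of the sum agrees with the product of the pgfs.

```python
# Sketch: check phi_{X+Y}(s) = phi_X(s) * phi_Y(s) when the joint pmf is built
# from independence, P(X=x, Y=y) = P(X=x) P(Y=y).

from itertools import product

def pgf(pmf, s):
    return sum(p * s**k for k, p in pmf.items())

px = {0: 0.5, 1: 0.3, 2: 0.2}   # marginal pmf of X (illustrative numbers)
py = {0: 0.6, 1: 0.4}           # marginal pmf of Y

# pmf of X + Y under independence: accumulate P(X=x) P(Y=y) over x + y = k
p_sum = {}
for (x, p), (y, q) in product(px.items(), py.items()):
    p_sum[x + y] = p_sum.get(x + y, 0.0) + p * q

for s in (0.5, 1.0, 2.0):
    assert abs(pgf(p_sum, s) - pgf(px, s) * pgf(py, s)) < 1e-12
print("pgf of the sum matches the product of the pgfs")
```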


Edit Summary

Updated to be clear about the domain of the sum, and why we can switch from summing over all $k$ in the domain of $X+Y$ to a double sum over all $x$ in the domain of $X$ and all $y$ in the domain of $Y$. It's basically about expectations.

$$\begin{align}
\sum_{z\in \mathcal D(X+Y)} s^z\;\mathsf P(X+Y=z) & = \sum_{z\in \mathcal D(X+Y)} s^z \sum_{x\in\mathcal D(X)} \mathsf P(X=x\cap X+Y=z) & \text{law of total probability} \\
& = \sum_{z\in \mathcal D(X+Y)} s^z \sum_{x\in\mathcal D(X)} \mathsf P(X=x\cap Y=z-x) & \text{equivalence of events} \\
& = \sum_{x\in\mathcal D(X)} \sum_{z\in \mathcal D(X+Y\mid X=x)} s^{z}\; \mathsf P(X=x\cap Y=z-x) & \text{by reordering the summations} \\
& = \sum_{x\in\mathcal D(X)} \sum_{y\in \mathcal D(Y)} s^{x+y}\; \mathsf P(X=x \cap Y=y) & \text{by the change of index } y=z-x \\[3ex]
\text{Alternatively:} \\
\mathsf E[s^{X+Y}] & = \mathsf E_X[\mathsf E[s^{X+Y}\mid X]] & \text{tower property of conditional expectation} \\
& = \mathsf E_X[s^X\, \mathsf E[s^{Y}\mid X]] & \text{taking out what is known: $s^X$ is fixed given $X$} \\[3ex]
\therefore \mathsf E[s^{X+Y}] & = \mathsf E_X[s^X\, \mathsf E[s^Y]] & \text{by independence, since } \mathsf E[s^Y\mid X]=\mathsf E[s^Y] \\
& = \mathsf E[s^X]\times \mathsf E[s^Y] & \text{by linearity, as $\mathsf E[s^Y]$ is a constant}
\end{align}$$
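
The expectation identity can also be checked by simulation. The sketch below is illustrative (the uniform distributions, sample size, and value of $s$ are arbitrary choices of mine): for independent draws, the empirical $\mathsf E[s^{X+Y}]$ matches $\mathsf E[s^X]\,\mathsf E[s^Y]$ up to Monte Carlo error.

```python
# Sketch: Monte Carlo check of E[s^(X+Y)] = E[s^X] E[s^Y] for independent X, Y.

import random

random.seed(0)
n = 200_000
s = 0.8

xs = [random.randint(0, 3) for _ in range(n)]  # X uniform on {0, 1, 2, 3}
ys = [random.randint(0, 2) for _ in range(n)]  # Y uniform on {0, 1, 2}, drawn independently

lhs = sum(s ** (x + y) for x, y in zip(xs, ys)) / n                   # estimates E[s^(X+Y)]
rhs = (sum(s ** x for x in xs) / n) * (sum(s ** y for y in ys) / n)   # estimates E[s^X] E[s^Y]
print(lhs, rhs)  # the two agree up to sampling error
```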