The proof of Waring’s theorem in Probability and Random process by Grimmett and Stirzaker

probabilityprobability theoryproof-explanationself-learning

Let $A_1, A_2, … , A_n$ be events and let $N_k$ be the event that exactly $k$ of the $A_i$ occur. Prove the result sometimes referred to as Waring's theorem:

$$P(N_k) = \sum_{i=0}^{n-k} (-1)^i {k+i \choose k}S_{k+i}, \text{where $S_j = \sum_{i_1 < i_2 < … <i_j} P(A_{i_1} \cap A_{i_2}\cap … \cap A_{i_j}).$}$$

Solution: Clearly,
$$P(N_k) = \sum_{S \subseteq \{1,2, … , n\};|S| = k} P\left(\bigcap_{i \in S} A_i \bigcap_{j \not\in S} A_j^c \right).$$
For any such given $S$, we write $A_S = \bigcap_{i\in S}A_i$. Then,
\begin{equation}
P\left(\bigcap_{i \in S} A_i \bigcap_{j \not\in S} A_j^c \right) = P(A_S) – \sum_{j \not\in S} P(A_{S \cup \{j\}})+ \sum_{j<k;j,k \not\in S} P(A_{S \cup \{j,k\}}) – …
\end{equation}

Hence
$$P(N_k) = \sum_{|S| = k} P(A_S) -\sum_{|S| = k+1}{k+1 \choose k}P(A_S)+ … + (-1)^{n-k}{n \choose k} P(A_1 \cap … \cap A_n)$$
where a typical summation is over all subsets $S$ of $\{1, 2, …, n\}$ having the required cardinality.

I am not sure how the author derives the last and the second last display in the solution. I guess he uses the previous results that

$$P\left(\bigcap_{1}^n A_i \right) = \sum_i P(A_i) – \sum_{i < j} P(A_i \cup A_j) + \sum_{i < j <k} P(A_i \cup A_j \cup A_k) – … – (-1)^n P(A_1\cup A_2 \cup … \cup A_n).$$
or similarly,
$$P\left(\bigcup_{1}^n A_i \right) = \sum_i P(A_i) – \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j <k} P(A_i \cap A_j \cap A_k) – … – (-1)^n P(A_1\cap A_2 \cap … \cap A_n)$$
in some way, but I can't figure it out by myself. I appreciate if you give some help.

Best Answer

For the second last display, write $$\begin{align} P\left(A_S\bigcap A_j^c\right) = P\left(A_S\cap\left(\bigcup A_j\right)^c\right) &\stackrel{(*)}=P(A_S)-P\left(A_S\cap \left(\bigcup A_j\right)\right)\\ &=P(A_S)-P\left(\bigcup (A_S\cap A_j)\right) \end{align} $$ and then use inclusion-exclusion on the rightmost union (which is a union over $j\not\in S$). The equality (*) is the identity $P(A\cap B^c)=P(A)-P(A\cap B)$.

For the last display, we sum every term of the previous display over all subsets $S$ with $|S|=k$. The first term that we get (the one with $|S|=k$) is obvious. For the second term, the assertion is that $$ \sum_{|S|=k}\sum_{j \not\in S} P(A_{S \cup \{j\}})=\sum_{|S|=k+1}{k+1\choose k}P(A_S). $$ The reason this is true: the LHS is selecting a subset $S$ of size $k$ and then, given $S$, selecting an element $j$ not in $S$. The RHS is selecting a subset of size $k+1$ and then choosing an element from that subset to be $j$. There are clearly $k+1\choose k$ ways to pick $j$, so the RHS achieves the same sum as the LHS. The remaining terms are argued similarly.

Related Question