[Math] Probability concerning summation of independent trials

probabilitysummation

I am have a very hard time solving this particular question. If anyone can help my understanding by providing a helpful solution, it would be much appreciated. Thank you.

Consider n independent trials, each of which results in one of the outcomes $1, \dots , k$ with respective probabilities $p_1,\dots,p_k$, $$\sum_{i=1}^{k}p_i = 1$$
Show that if all the $p_i$ are small, then the probability that no trial outcome occurs more than once is approximately equal to

$$\exp\left(\frac{-n(n-1)}{2}\left(\sum_{i}^{}p_i^2\right)\right) $$

Best Answer

Let $|j|$ denote the number of times event $j$ occurs in the $n$ trials.

In the proof below, I essentially use the inclusion exclusion principle.


Suppose event $i$ occurs at least twice. The probability of all such sequences is

$$P_n\{|i|\ge 2\} = \binom{n}{2} p_i^2. $$

This can be thought of as the following: Pick $2$ locations out of $n$ where event $i$ occurs. The rest are chosen arbitrarily, and summing over those choices leaves us with the above.

The probability that at least one event occurs at least twice is then upper bounded by

$$P_n\{\exists i : |i| \ge 2\}\le\binom{n}{2} \sum_i p_i^2.$$


Now, the probability that events $i$ and $j \neq i$ occur at least twice is precisely

$$P_n\{|i|\ge 2, |j| \ge 2\} = \frac{1}{2} \binom{n}{2}\binom{n-2}{2} p_i^2 p_j^2$$

This is the same argument as before. The factor of half is introduced because the events where $i$ is placed first and then $j$ and the events where $j$ is placed first and then $i$ are the same.

Thus, the probability that at least two events occur at least twice is upper bounded by:

$$P_n\{\exists i,j : |i|, |j| \ge 2\}\le \frac{1}{2} \binom{n}{2}\binom{n-2}{2} \sum_{j\neq i}\sum_i p_i^2 p_j^2 \le \frac{1}{2}\left(\binom{n}{2}\sum_i p_i^2 \right)^2$$

Thus, the probability that precisely one event occurs more than once is lower bounded by \begin{align} P_n\{ \exists i : |i| \ge 2\} &\ge \binom{n}{2} \sum_i p_i^2 - \frac{1}{2}\left(\binom{n}{2}\sum_i p_i^2 \right)^2 \end{align}

This is using the inclusion-exclusion principle up to the second term, and the fact that terms are decreasing with order in the expansion therein (this may need more sophisticated justification).


Define $x:= \binom{n}{2} \sum_i p_i^2$. Merging the results above,

$$ 1 - x + \frac{x^2}{2} \ge P_n\{ |i|=1~\forall i\} \ge 1 - x$$

Which uses the fact that $P_n\{ |i| \le 1~\forall i\} = 1 - P_n\{ \exists i : |i| \ge 2\}$

As $x \to 0$, i.e., the regime where $\binom{n}{2}\sum p_i^2 \to 0$, both the left and right hand side tend to $\exp{-x}$. Consequently, we establish the very crude approximation:

$$P_n\{ |i| \le 1~\forall i\} \approx \exp \left( -\binom{n}{2} \left(\sum_{i \in [1:k]} p_i^2\right)\right)$$

Related Question