Show $\mathbb{P}(S_n \geq a+b) \leq \mathbb{P}(S_n \geq a) \mathbb{P}(S_n \geq b)$ for sum of independent Bernoulli random variables

binomial distributioninequalityprobabilityprobability distributionsprobability theory

Let $X_1,…,X_n$ be random variables independent and identically distributed, where $X_i \sim \textrm{Bernoulli}(p)$. Define $$\displaystyle S_n = \sum_{i=1}^n X_i$$

Show that $P(S_{n} \geq a+b) \leq P(S_n \geq a)P(S_n \geq b)$, where $a + b \leq n.$

Since $X_1,\cdots,X_n$ be random variables i.i.d. and $X_i \sim \textrm{Bernoulli}(p)$, we have that $S_n \sim \textrm{Binomial}(n,p)$. Thus,

$$ P(S_n \geq a)P(S_n \geq b) = \sum_{i=a}^n \dbinom{n}{i}p^i(1-p)^{n-i}\sum_{j=b}^n \binom{n}{j}p^j(1-p)^{n-j}$$

$$\begin{align*} P(S_n \geq a)P(S_n \geq b) &= \sum_{i=a}^n \sum_{j=b}^n \binom{n}{i}p^i(1-p)^{n-i} \binom{n}{j}p^j(1-p)^{n-j}\\ &= \sum_{i=a}^n \sum_{j=b}^n \binom{n}{i}\binom{n}{j}p^{i+j}(1-p)^{2n-i-j} \end{align*}$$

Any hint to continue?

Best Answer

Let $a,b \in \mathbb{N}_0$ such that $a+b \leq n$. Since the random variables $X_i$ take only the values $0$ and $1$, we have

$$\begin{align*} \{S_n \geq a+b\} &= \bigcup_{k=0}^n \left( \{S_0 < a, \ldots, S_{k-1} < a, S_k = a\} \cap \{S_n \geq a+b\} \right) \\ &= \bigcup_{k=0}^n \left( \{S_0 < a, \ldots, S_{k-1} < a, S_k = a\} \cap \{S_n-S_k \geq b\} \right) \end{align*}$$

Thus,

$$\mathbb{P}(S_n \geq a+b) \leq \sum_{k=0}^n \mathbb{P}(S_0<a,\ldots,S_{k-1}<a, S_k = a, S_n-S_k \geq b).$$

As the random variables $S_n-S_k$ and $S_0,\ldots,S_k$ are independent, this implies

$$\mathbb{P}(S_n \geq a+b) \leq \sum_{k=0}^n \mathbb{P}(S_0 < a,\ldots,S_{k-1}<a, S_k = a)\mathbb{P}(S_n-S_k \geq b).$$

Noting that

$$\mathbb{P}(S_n-S_k \geq b) = \mathbb{P}(S_{n-k} \geq b) \leq \mathbb{P}(S_n \geq b)$$

for any $k \leq n$, we get

$$\mathbb{P}(S_n \geq a+b) \leq \mathbb{P}(S_n \geq b) \underbrace{\sum_{k=0}^n \mathbb{P}(S_0 < a,\ldots,S_{k-1}<a, S_k = a)}_{\mathbb{P}(S_n \geq a)}.$$

Remark: Abstractly speaking, the proof uses the strong Markov property of the random walk $(S_n)_{n \in \mathbb{N}}$, i.e. the fact that for any stopping time $\tau$ the process $T_n := S_{n+\tau}-S_{\tau}$, $n \in \mathbb{N}$, is independent from $(S_k)_{k \leq \tau}$ and equals in distribution $(S_n)_{n \in \mathbb{N}}$.