Probability Theory – Proof of Kolmogorov Inequality for Bernoulli Random Variables

Tags: inequality, probability theory

The question is about an inequality that appears in Kolmogorov's paper (inequality (3.1)) but is not proved there. The inequality says that, if $Y_1,Y_2,\ldots$ are i.i.d. Bernoulli random variables with expectation $p$, then
$$\mathbb{P} \left( \sup_{k\ge n} |\overline{Y}_k - p|\ge \varepsilon \right) \le 2 e^{-2 n \varepsilon^2 (1 - \varepsilon)} \qquad \text{for any $\varepsilon>0$,}$$
where $\overline{Y}_k = \sum_{j=1}^k Y_j / k = S_k / k$.
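Since the claimed bound is explicit, a quick Monte Carlo sanity check is possible. The sketch below is only an illustration: the supremum over all $k\ge n$ has to be truncated at a finite horizon `N`, and the parameters `p`, `eps`, `n_start`, `N`, `n_trials` are arbitrary choices of mine, not anything from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

p, eps = 0.5, 0.1          # Bernoulli mean and deviation threshold
n_start, N = 200, 2000     # check sup over n_start <= k <= N (finite truncation)
n_trials = 10000

hits = 0
for _ in range(n_trials):
    y = rng.binomial(1, p, size=N)
    means = np.cumsum(y) / np.arange(1, N + 1)     # \bar Y_k for k = 1..N
    if np.any(np.abs(means[n_start - 1:] - p) >= eps):
        hits += 1

empirical = hits / n_trials
bound = 2 * np.exp(-2 * n_start * eps**2 * (1 - eps))
print(f"empirical tail (truncated sup) approx {empirical:.4f}")
print(f"claimed bound                        {bound:.4f}")
```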
Several people have contributed proofs and improvements of this inequality. However, I got lost at several steps when reading the proofs in this literature.

  1. Paper 1 discusses a more general case where the Bernoulli random variables are independent but not identically distributed. However, in the 9th line of the proof of Lemma 1 (the key lemma), the inequality seems to contradict the assumption ($\phi_i^\varepsilon (t) \ge 1$). Without Lemma 1, we cannot obtain the inequality we want.
  2. Paper 2 improves the inequality proposed by Kolmogorov. However, its Lemma 1 also looks odd. First, it cites Lemma 3 of paper 3, but the definitions of $\phi_\varepsilon (\lambda)$ (paper 2) and $\varphi_\varepsilon (\lambda)$ (paper 3) differ by the sign in front of $\lambda$. Apart from this, both lemmas seem strange. They say: let $\varepsilon\ge 0$; if there exists a $\lambda>0$ such that $\phi_\varepsilon (\lambda) = \mathbb{E} (e^{\lambda (Y_i - \varepsilon)}) \le 1$, then
    $$ \mathbb{P} \left( \sup_{k\ge n} \frac{\sum_{i=1}^k Y_i}{k} \ge \varepsilon \right) \le \left[ \phi_\varepsilon(\lambda) \right]^n. $$
    What is strange is that the inequality is trivially true when $\phi_\varepsilon (\lambda) > 1$ as well, so the hypothesis seems unnecessary. We only need to find the minimum of $\phi_\varepsilon(\lambda)$ over $\lambda$ for a fixed $\varepsilon$ (see the sketch after this list).
  3. Then I checked the proof in paper 3, especially the proof of its Lemma 2. I did not find any mistakes apart from one citation in the first line of the proof: it refers to a Russian book, and we need the theorem on page 183. The proof defines the random variable $\eta (x) = \inf \{ i: 0\le i\le n, S_i \ge x \}$. However, I do not understand the following inequality.
    $$ \mathbb{E} (e^{t S_n} \mid \eta(x) = k) \ge e^{tx} \mathbb{E}(e^{t S_{n-k}}). $$
    I guess it uses the independence of the $Y_i$, but I think we would need conditional independence here, and that does not obviously hold, since $\eta(x)$ is defined in terms of $Y_1,\ldots,Y_n$.
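To illustrate the point in item 2 that only the minimum over $\lambda$ matters, here is a minimal numerical sketch for the i.i.d. Bernoulli($p$) case with a threshold $\varepsilon>p$ (the values of `p` and `eps` below are arbitrary illustrations, not taken from the papers). It minimizes $\phi_\varepsilon(\lambda)=p\,e^{\lambda(1-\varepsilon)}+(1-p)\,e^{-\lambda\varepsilon}$ over a grid and compares the result with the standard Chernoff value $e^{-D(\varepsilon\|p)}$, where $D$ is the relative entropy between Bernoulli($\varepsilon$) and Bernoulli($p$):

```python
import numpy as np

p, eps = 0.5, 0.6          # illustrative values with eps > p (upper-tail case)

def phi(lam):
    # phi_eps(lambda) = E exp(lambda * (Y_1 - eps)) for Y_1 ~ Bernoulli(p)
    return p * np.exp(lam * (1 - eps)) + (1 - p) * np.exp(-lam * eps)

# numerical minimum over a grid of lambda > 0
lams = np.linspace(1e-6, 10, 200001)
vals = phi(lams)
lam_star = lams[np.argmin(vals)]

# closed-form Chernoff value: exp(-D(eps || p)), D = Bernoulli relative entropy
D = eps * np.log(eps / p) + (1 - eps) * np.log((1 - eps) / (1 - p))
print(f"grid minimum : phi({lam_star:.4f}) = {vals.min():.6f}")
print(f"closed form  : exp(-D(eps||p))   = {np.exp(-D):.6f}")
print(f"resulting bound on the sup over k >= n, with n = 100: {vals.min()**100:.3e}")
```

For $\varepsilon\le p$, as far as I can tell, no $\lambda>0$ satisfies $\phi_\varepsilon(\lambda)\le 1$, so the hypothesis of the lemma only excludes the case where the bound would be vacuous anyway.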

I would truly appreciate any hints, whether about whether this inequality is true, where I can find a correct proof, or whether the "mistakes" I found are real. Thank you; any comments help.

Best Answer

For the inequality in your third part, here is my explanation: conditional on $\eta(x)=k$, we have $S_k\geqslant x$ and $S_{k-1}<x$, and the conditional distribution of $Y_{k+1},\dots,Y_n$ is the same as their unconditional distribution, because the event $\{\eta(x)=k\}$ is determined by $Y_1,\dots,Y_k$ alone, which are independent of $Y_{k+1},\dots,Y_n$. Hence, for $t>0$,

$$ \begin{aligned} \mathbb{E} (\mathrm{e}^{t S_n} \mid \eta(x) = k)&=\mathbb E(\mathrm e^{t(S_k+S_n-S_k)}\mid\eta(x)=k) \geqslant \mathrm e^{tx} \mathbb{E}(\mathrm e^{t (S_{n}-S_k)}\mid\eta(x)=k)\\ &=\mathrm e^{tx}\mathbb E(\mathrm e^{t(S_{n}-S_k)})=\mathrm e^{tx}\mathbb E(\mathrm e^{tS_{n-k}}) . \end{aligned} $$
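As a quick numerical sanity check of this conditional step (a sketch only; the parameters `p`, `n`, `t`, `x` below are arbitrary choices, and $t>0$ is assumed), one can estimate both sides of $\mathbb{E}(\mathrm e^{tS_n}\mid\eta(x)=k)\geqslant \mathrm e^{tx}\,\mathbb E(\mathrm e^{tS_{n-k}})$ by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

p, n, t, x = 0.5, 30, 0.2, 5.5     # illustrative parameters, with t > 0
n_trials = 200_000

Y = rng.binomial(1, p, size=(n_trials, n))
S = np.cumsum(Y, axis=1)                         # S_1, ..., S_n for each trial

# eta(x) = inf{ i : 0 <= i <= n, S_i >= x }; since S_0 = 0 < x, only i >= 1 matters
# (np.inf marks trials where the level x is never reached)
reached = S >= x
eta = np.where(reached.any(axis=1), reached.argmax(axis=1) + 1, np.inf)

def mgf_Sm(m):
    """Monte Carlo estimate of E exp(t * S_m)."""
    return np.exp(t * S[:, m - 1]).mean() if m > 0 else 1.0

for k in range(1, n + 1):
    mask = eta == k
    if mask.sum() < 1000:                        # skip values of k with few samples
        continue
    lhs = np.exp(t * S[mask, -1]).mean()         # E[ e^{t S_n} | eta(x) = k ]
    rhs = np.exp(t * x) * mgf_Sm(n - k)          # e^{t x} * E[ e^{t S_{n-k}} ]
    print(f"k = {k:2d}:  lhs = {lhs:7.2f}   rhs = {rhs:7.2f}   lhs >= rhs: {lhs >= rhs}")
```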