The question is about an inequality which appears in Kolmogorov's paper (inequality (3.1)) but is not proved there. The inequality says that if $Y_1,Y_2,\ldots$ are i.i.d. Bernoulli random variables with expectation $p$, then
$$\mathbb{P} \left( \sup_{k\ge n} |\overline{Y}_k - p|\ge \varepsilon \right) \le 2 e^{-2 n \varepsilon^2 (1 - \varepsilon)} \qquad \text{for any $\varepsilon>0$,}$$
where $\overline{Y}_k = \sum_{j=1}^k Y_j / k = S_k / k$.
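As a quick sanity check of the statement itself, here is a minimal Monte Carlo sketch in Python (the parameters `p`, `eps`, `n` are chosen arbitrarily for illustration; the supremum has to be truncated at a finite horizon `K`, so the empirical frequency only underestimates the true probability):

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps, n = 0.5, 0.1, 200   # arbitrary illustration parameters
K = 20 * n                  # finite horizon approximating the supremum
trials = 20_000

hits = 0
for _ in range(trials):
    Y = rng.binomial(1, p, size=K)
    Ybar = np.cumsum(Y) / np.arange(1, K + 1)
    # sup is taken over n <= k <= K only, so this undercounts
    if np.max(np.abs(Ybar[n - 1:] - p)) >= eps:
        hits += 1

print("empirical frequency:", hits / trials)
print("Kolmogorov bound   :", 2 * np.exp(-2 * n * eps**2 * (1 - eps)))
```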
Several people have contributed proofs and improvements of this inequality. However, I got lost at several steps when reading the proofs in this literature.
- Paper 1 discusses a general case where the Bernoulli random variables are independent but not identically distributed. However, in the 9th line of the proof of Lemma 1 (the key lemma), the inequality seems to contradict the assumption ($\phi_i^\varepsilon (t) \ge 1$). Without Lemma 1, we cannot obtain the inequality we want.
- Paper 2 improves the inequality proposed by Kolmogorov. However, its Lemma 1 is also puzzling. First, it cites Lemma 3 of paper 3, but the definitions of $\phi_\varepsilon (\lambda)$ (paper 2) and $\varphi_\varepsilon (\lambda)$ (paper 3) differ, with opposite signs in front of $\lambda$. Apart from this, both lemmas are strange. They say: let $\varepsilon\ge 0$; if there exists a $\lambda>0$ such that $\phi_\varepsilon (\lambda) = \mathbb{E} (e^{\lambda (Y_i - \varepsilon)}) \le 1$, then
$$ \mathbb{P} \left( \sup_{k\ge n} \frac{\sum_{i=1}^k Y_i}{k} \ge \varepsilon \right) \le \left[ \phi_\varepsilon(\lambda) \right]^n. $$
  It is strange because the inequality is trivially true if $\phi_\varepsilon (\lambda) > 1$. We only need to find the minimum of $\phi_\varepsilon(\lambda)$ once we fix an $\varepsilon$ (see the sketch after this list).
- Then I checked the proof in paper 3, especially the proof of its Lemma 2. I did not find mistakes except for one citation in the first line of the proof: it refers to a Russian book, and we need the theorem on page 183. The proof defines a random variable $\eta (x) = \inf \{ i: 0\le i\le n, S_i \ge x \}$. However, I did not understand the following inequality.
$$ \mathbb{E} (e^{t S_n} \mid \eta(x) = k) \ge e^{tx} \mathbb{E}(e^{t S_{n-k}}). $$
I guess it uses the independence of the $Y_i$, but I think we would need conditional independence here, and that does not obviously hold, since $\eta(x)$ is defined in terms of $Y_1,\ldots,Y_n$.
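For what it's worth, in the i.i.d. Bernoulli($p$) setting of the original question, the minimization of $\phi_\varepsilon(\lambda)$ mentioned above can be done in closed form. This is the standard Chernoff-type computation (not necessarily the route the papers take), assuming $0<p<\varepsilon<1$ so that the stationary point $\lambda^*$ is positive; the minimum equals $e^{-D(\varepsilon\Vert p)}$, where $D$ is the Kullback–Leibler divergence between Bernoulli($\varepsilon$) and Bernoulli($p$), hence it is $\le 1$:

```python
import numpy as np

def phi(lam, p, eps):
    """phi_eps(lambda) = E[exp(lambda * (Y - eps))] for Y ~ Bernoulli(p)."""
    return p * np.exp(lam * (1 - eps)) + (1 - p) * np.exp(-lam * eps)

def phi_min(p, eps):
    """Minimize phi over lambda > 0; valid when 0 < p < eps < 1."""
    lam_star = np.log(eps * (1 - p) / (p * (1 - eps)))  # stationary point
    return lam_star, phi(lam_star, p, eps)              # = exp(-KL(eps || p))

p, eps = 0.5, 0.6
lam_star, val = phi_min(p, eps)
print(lam_star, val)   # val < 1, so the lemma gives the nontrivial bound val**n
```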
I would truly appreciate some hints, whether on whether this inequality is true, on where I can find a correct proof, or on whether the "mistakes" I found are genuine. Thank you; any comments help.
Best Answer
For the inequality in your third part, here is my explanation. Conditional on $\eta(x)=k$, we have $S_k\geqslant x$ and $S_{k-1}<x$; moreover, the event $\{\eta(x)=k\}$ is determined by $Y_1,\dots,Y_k$ alone, so it is independent of $Y_{k+1},\dots,Y_n$, and the conditional distribution of $Y_{k+1},\dots,Y_n$ given this event is the same as their unconditional distribution. Hence, for $t>0$,
$$ \begin{aligned} \mathbb{E} (\mathrm{e}^{t S_n} \mid \eta(x) = k)&=\mathbb E(\mathrm e^{t(S_k+S_n-S_k)}\mid\eta(x)=k) \geqslant \mathrm e^{tx} \mathbb{E}(\mathrm e^{t (S_{n}-S_k)}\mid\eta(x)=k)\\ &=\mathrm e^{tx}\mathbb E(\mathrm e^{t(S_{n}-S_k)})=\mathrm e^{tx}\mathbb E(\mathrm e^{tS_{n-k}}) . \end{aligned} $$
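If it helps, here is a quick Monte Carlo sanity check of this step (a Python sketch with arbitrarily chosen parameters $p$, $n$, $t$, $x$; since the partial sums are integer-valued, taking $x$ non-integer makes the overshoot $S_{\eta(x)}>x$ strict, so the inequality should hold with a visible margin):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, t, x = 0.5, 20, 0.3, 5.5   # arbitrary illustration parameters, t > 0
trials = 200_000

Y = rng.binomial(1, p, size=(trials, n))
S = np.cumsum(Y, axis=1)

# eta(x) = first index i (1-based) with S_i >= x; n + 1 encodes "never"
reached = S >= x
eta = np.where(reached.any(axis=1), reached.argmax(axis=1) + 1, n + 1)

def mgf(m):
    """E[e^{t S_m}] in closed form for a sum of m i.i.d. Bernoulli(p)."""
    return (p * np.exp(t) + 1 - p) ** m

for k in range(1, n + 1):
    mask = eta == k
    if mask.sum() > 1000:                       # enough samples for this k
        lhs = np.exp(t * S[mask, -1]).mean()    # E[e^{t S_n} | eta(x) = k]
        rhs = np.exp(t * x) * mgf(n - k)        # e^{tx} E[e^{t S_{n-k}}]
        print(k, lhs >= rhs, round(lhs, 3), round(rhs, 3))
```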