Hoeffding for bounded random variables, extension of Rademacher case

asymptotics, large-deviation-theory, statistics

In his book High-Dimensional Probability, Vershynin first proves the Hoeffding bound (page 17)

$$\mathbb{P}\left\{\sum_{i=1}^N a_iX_i \geq t\right\} \leq \exp \left(
-\frac{1}{2} \frac{t^2}{\|a\|^2_2} \right)$$

for $X_i$ being Rademacher random variables and $(a_1, \dotsc, a_N) \in \mathbb{R}^N$.
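(For concreteness, here is a quick simulation check of this bound; the vector $a$, the seed, and the values of $t$ below are my own arbitrary choices, not anything from the book.)

```python
import numpy as np

# Empirical check of P( sum_i a_i X_i >= t ) <= exp(-t^2 / (2 ||a||_2^2))
# for Rademacher X_i. The coefficient vector a and the t values are
# arbitrary illustrative choices.

rng = np.random.default_rng(1)

a = rng.standard_normal(40)          # any fixed coefficient vector works
norm_sq = np.sum(a ** 2)             # ||a||_2^2

trials = 200_000
X = rng.choice([-1, 1], size=(trials, a.size))   # Rademacher signs
sums = X @ a                                      # sum_i a_i X_i per trial

for t in [2.0, 5.0, 10.0]:
    empirical = (sums >= t).mean()
    bound = np.exp(-t**2 / (2 * norm_sq))
    print(f"t={t:4.1f}  empirical={empirical:.5f}  bound={bound:.5f}")
```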

Then he gives an exercise to extend it to bounded random variables.

Exercise 2.2.7: Prove that for $X_i$ independent and bounded, with $m_i \leq X_i \leq M_i$ almost surely, we have for any $t \geq 0$,

$$\mathbb{P}\left\{ \sum_{i=1}^N (X_i - \mathbb{E}X_i) \geq t \right\}
\leq \exp\left( -\frac{2t^2}{\sum_{i=1}^N (M_i - m_i)^2} \right)$$

perhaps with some absolute constant other than 2 in the tail.

I'm not seeing what he wants here. Is he perhaps suggesting a comparison with a linearly transformed sum of Rademacher variables? He doesn't explicitly say to use the Rademacher case, but I assume that's what he means. It looks like he wants $a$ to be the vector with entries $M_i - m_i$, scaling the Rademacher variables. What I'm missing is how to link the general bounded case to the translated-Rademacher case.

To be clear, I am aware of other proofs of Hoeffding for the bounded case; I am interested in a simple-ish way to leverage the linear-combination-of-Rademacher case to get this result.

Best Answer

Ok, so I found the solution here. This solution seems to be more "complete" than what the author seemed to have in mind for the exercise, so there might still be a simpler answer. In case the link stops working, I'm reproducing the solution here.

First, multiply by $\lambda > 0$, exponentiate, and apply Markov's inequality; the final equality uses independence of the $X_i$: $$ P\left( \sum^n_{i=1}(X_i - EX_i)\geq t \right)= P\left(\exp\left(\lambda \sum^n_{i=1}(X_i - EX_i) \right) \geq e^{\lambda t} \right)\leq E\left[ \exp\left(\lambda \sum^n_{i=1}(X_i - EX_i) \right) \right]e^{-\lambda t} = \prod_{i=1}^n E\left[ \exp\left(\lambda (X_i - EX_i) \right) \right]e^{-\lambda t} $$

Next, symmetrize each $X_i$ by introducing $X_i'$, independent of $X_i$ and with the same distribution. Since $X_i$ and $X_i'$ are i.i.d., the difference $X_i - X_i'$ is symmetric about $0$, and therefore has the same distribution as $S_i(X_i - X_i')$, where $S_i \sim \text{Rademacher}$ is independent of $(X_i, X_i')$. Hence,
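For completeness, the symmetry fact being used: since $(X_i, X_i')$ and $(X_i', X_i)$ have the same joint distribution, for every $a \in \mathbb{R}$

$$ P(X_i - X_i' \geq a) = P(X_i' - X_i \geq a) = P(X_i - X_i' \leq -a), $$

so $X_i - X_i' \overset{d}{=} -(X_i - X_i')$, and multiplying by an independent Rademacher sign does not change its distribution.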

$$ E\left[ \exp\left(\lambda (X_i - EX_i) \right) \right] = E\left[ \exp\left(\lambda (X_i - EX_i') \right) \right] = E_{X_i}\left[ \exp\left( \lambda (X_i - E_{X_i'}X_i') \right) \right]\underset{\text{Jensen}}{\leq} E_{X_i}E_{X_i'}\left[ \exp\left( \lambda (X_i - X_i') \right) \right] = E\left[ \exp\left( \lambda (X_i - X_i') \right) \right] $$
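Spelling out the Jensen step: $x \mapsto e^x$ is convex, so with $X_i$ held fixed,

$$ \exp\left( \lambda (X_i - E_{X_i'}X_i') \right) = \exp\left( E_{X_i'}\left[\lambda (X_i - X_i')\right] \right) \leq E_{X_i'}\left[ \exp\left( \lambda (X_i - X_i') \right) \right], $$

and taking $E_{X_i}$ of both sides gives the inequality above.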

Continuing the calculation, apply the Rademacher MGF bound $E_{S_i}\, e^{sS_i} \leq e^{s^2/2}$ conditionally on $(X_i, X_i')$ (this is the same subgaussian estimate that drives the Rademacher case), and then $|X_i - X_i'| \leq M_i - m_i$ almost surely: $$ E\left[ \exp\left(\lambda (X_i - X_i') \right) \right] = E_{X_i,X_i'}\left[E_{S_i} \exp\left(\lambda S_i(X_i - X_i') \right) \right]\leq E_{X_i,X_i'}\left[ \exp\left(\lambda^2 (X_i - X_i')^2/2 \right) \right]\leq \exp(\lambda^2(M_i - m_i)^2/2) $$
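For reference, the MGF bound used in the middle step is elementary:

$$ E\, e^{sS} = \frac{e^s + e^{-s}}{2} = \cosh(s) = \sum_{k=0}^{\infty} \frac{s^{2k}}{(2k)!} \leq \sum_{k=0}^{\infty} \frac{s^{2k}}{2^k\, k!} = e^{s^2/2}, $$

since $(2k)! \geq 2^k k!$ for all $k \geq 0$.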

Finally, plug this into the first inequality: $$ P\left( \sum^n_{i=1}(X_i - EX_i)\geq t \right)\leq \prod_{i=1}^n E\left[ \exp\left(\lambda (X_i - EX_i) \right) \right]e^{-\lambda t} \leq \prod_{i=1}^n \exp(\lambda^2(M_i - m_i)^2/2)\, e^{-\lambda t} = \exp\left(\lambda^2 \sum^n_{i=1}(M_i-m_i)^2/2 - \lambda t \right) $$ Minimizing the exponent over $\lambda > 0$ gives $\lambda = t/\sum^n_{i=1}(M_i - m_i)^2$, and hence $$ P\left( \sum^n_{i=1}(X_i - EX_i)\geq t \right) \leq \exp\left(-\frac{t^2}{2\sum^n_{i=1}(M_i -m_i)^2} \right). $$ Note that this is the exercise's bound with the constant $1/2$ in place of $2$, which is exactly the "perhaps with some absolute constant other than 2" caveat; the sharp constant $2$ comes from using Hoeffding's lemma instead of symmetrization.
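A quick empirical check of the final inequality (a sketch; the uniform distribution for the $X_i$, the sizes, and the $t$ grid are my own arbitrary choices, not part of the original solution):

```python
import numpy as np

# Numerical sanity check of the derived tail bound
#   P( sum_i (X_i - E X_i) >= t ) <= exp( -t^2 / (2 * sum_i (M_i - m_i)^2) ).
# Everything concrete here (uniform X_i, n, the t grid) is an arbitrary
# illustrative assumption.

rng = np.random.default_rng(0)

n, trials = 50, 200_000
m, M = np.zeros(n), np.ones(n)       # X_i ~ Uniform[m_i, M_i] = Uniform[0, 1]
A = np.sum((M - m) ** 2)             # sum_i (M_i - m_i)^2

X = rng.uniform(m, M, size=(trials, n))
centered_sums = (X - (m + M) / 2).sum(axis=1)   # E X_i = (m_i + M_i)/2

for t in [1.0, 3.0, 5.0, 7.0]:
    empirical = (centered_sums >= t).mean()
    bound = np.exp(-t**2 / (2 * A))
    print(f"t={t:4.1f}  empirical={empirical:.5f}  bound={bound:.5f}")
```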
