Hoeffding's Lemma and Hoeffding's Inequality assumptions

probability, probability theory

Hoeffding's lemma states that if $X$ is a random variable with zero mean taking values in $[a,b]$, then for every real $\lambda$,

$E[e^{\lambda X}] \leq e^{\frac{(\lambda (b-a))^2}{8}}$

The proof of Hoeffding's inequality for a sum of rvs uses the above lemma (for example, in the Wikipedia article on Hoeffding's inequality and in textbooks such as Concentration Inequalities). Hoeffding's inequality for rvs $(Y_i)$ does not assume that the rvs have zero mean. But the proof uses the fact that $E[Y_i-E(Y_i)] = 0$ in order to apply Hoeffding's lemma to each centered variable $Y_i-E(Y_i)$.

I found this a bit weird: since $Y_i-E(Y_i)$ has zero mean, $S_n - E(S_n)$ also has zero mean, where $S_n = \sum_{i=1}^n Y_i$. So why can't we simply apply the lemma to $S_n - E(S_n)$ directly?
Am I missing something?

Best Answer

Yes, you can apply Hoeffding's lemma directly to $X:=S_n-E(S_n)$. Since $X$ has mean zero and $\sum a_i\le S_n\le\sum b_i$ (so $X$ takes values in an interval of length $\sum (b_i-a_i)$), Hoeffding's lemma gives: $$ E\Big(\exp\big(s[S_n-E(S_n)]\big)\Big)\le \exp\Big(\frac18 s^2\Big[\sum \left(b_i-a_i\right)\Big]^2\Big).\tag1 $$ But note the subtle difference between (1) and what you get when you use the independence of the $Y_i$ and apply Hoeffding's lemma to each $Y_i-E(Y_i)$: $$ E\Big(\exp\big(s[S_n-E(S_n)]\big)\Big)=\prod_{i=1}^n E\Big(\exp\big(s[Y_i-E(Y_i)]\big)\Big)\le \exp\Big(\frac18 s^2\sum \left(b_i-a_i\right)^2\Big).\tag2 $$ That is, writing $c_i:=b_i-a_i$, (1) has the quantity $(\sum c_i)^2$ where (2) has $\sum c_i^2$. Since the $c$'s are all nonnegative, we have $\sum c_i^2\le (\sum c_i)^2$, so version (2) gives the sharper upper bound.
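As a quick numerical sanity check of (1) versus (2), here is a minimal Python sketch. It assumes a specific toy model that is not part of the argument above: the summands are independent and each one takes its two endpoint values $a_i$ and $b_i$ with probability $1/2$, so the centered summand equals $\pm(b_i-a_i)/2$ and its moment generating function is a $\cosh$. The function names and the list `widths` (the values $b_i-a_i$) are made up for illustration.

```python
import math

def centered_mgf(s, widths):
    """Exact E[exp(s (S_n - E S_n))] when each independent summand is
    uniform on its two endpoints, so the centered summand is +/- c_i / 2."""
    return math.prod(math.cosh(s * c / 2) for c in widths)

def bound_whole_sum(s, widths):
    """Bound (1): Hoeffding's lemma applied once to the whole centered sum."""
    return math.exp(s ** 2 * sum(widths) ** 2 / 8)

def bound_termwise(s, widths):
    """Bound (2): independence plus Hoeffding's lemma for each summand."""
    return math.exp(s ** 2 * sum(c ** 2 for c in widths) / 8)

# Hypothetical interval widths c_i = b_i - a_i.
widths = [1.0, 2.0, 0.5, 1.5]

for s in (0.1, 0.5, 1.0):
    exact = centered_mgf(s, widths)
    b2 = bound_termwise(s, widths)
    b1 = bound_whole_sum(s, widths)
    print(f"s={s}: exact={exact:.4f}  bound (2)={b2:.4f}  bound (1)={b1:.4f}")
```

Both bounds hold in this toy model, but the gap between them widens quickly as $s$ or the number of summands grows.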

This makes a difference, as you can see in the special case where the $a$'s are all equal and the $b$'s are all equal, so that every $b_i-a_i$ equals a common value $c$: compared to $nc^2$ in version (2), version (1) has $n^2c^2$. If you follow this through to the end of the derivation of Hoeffding's inequality, $n^2c^2$ is coarse enough to wipe out the power of the resulting inequality: the bound on the deviation of the sample mean no longer shrinks as $n$ grows.
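To make the last step concrete, here is a sketch of the usual Chernoff argument in the equal-width case, where every interval has the common width $c$. The symbols $t>0$ and $\epsilon>0$ are introduced here for illustration. For any $s>0$, Markov's inequality gives $$ P\big(S_n-E(S_n)\ge t\big)\le e^{-st}\,E\Big(\exp\big(s[S_n-E(S_n)]\big)\Big)\le \exp\Big(-st+\frac18 s^2 A\Big), $$ where $A=nc^2$ if you use (2) and $A=n^2c^2$ if you use (1). The exponent is minimized at $s=4t/A$, which yields $$ P\big(S_n-E(S_n)\ge t\big)\le \exp\Big(-\frac{2t^2}{A}\Big). $$ Taking $t=n\epsilon$, i.e. asking that the sample mean exceed its expectation by at least $\epsilon$, version (2) gives $\exp(-2n\epsilon^2/c^2)$, which decays exponentially in $n$, while version (1) gives only $\exp(-2\epsilon^2/c^2)$, which does not depend on $n$ at all.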