Solved – Understanding the proof of McDiarmid's inequality

probability, probability-inequalities

I am working through set 2 of Wasserman's lecture notes and I am unable to fill in the missing steps in the derivation of McDiarmid's inequality (p. 5). As with my previous question on this forum, I reproduce the proof from the notes below, and after the proof I point out the steps I am not able to derive.

McDiarmid's Inequality

Let $X_{1}, \ldots, X_{n}$ be independent random variables. Suppose that

\begin{align*}
\sup_{x_{1}, \ldots, x_{n}, x_{i}^{\prime}} \left| g(x_{1}, \ldots, x_{i-1}, x_{i},
x_{i+1}, \ldots, x_{n}) -
g(x_{1}, \ldots, x_{i-1}, x_{i}^{\prime}, x_{i+1}, \ldots, x_{n}) \right| &\le c_{i}
\end{align*}

for $i = 1, \ldots, n$. Then

\begin{align*}
\mathbb{P} \left( g(X_{1}, \ldots, X_{n}) - \mathbb{E}(g(X_{1}, \ldots, X_{n})) \ge
\epsilon \right) & \le \exp \left( -\frac{2\epsilon^{2}}{\sum_{i=1}^{n} c_{i}^{2}}
\right)
\end{align*}

Proof

Let $V_{i} = \mathbb{E}(g | X_{1}, \ldots, X_{i}) – \mathbb{E}(g | X_{1}, \ldots, X_{i-1})$. Then
\begin{align*}
g(X_{1}, \ldots, X_{n}) – \mathbb{E}(g(X_{1}, \ldots, X_{n})) &= \sum_{i=1}^{n} V_{i}
\end{align*}
and $\mathbb{E}(V_{i} | X_{1}, \ldots, X_{i-1}) = 0$.
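
To see the telescoping: consecutive terms cancel, $\mathbb{E}(g | X_{1}, \ldots, X_{n}) = g(X_{1}, \ldots, X_{n})$, and conditioning on nothing gives $\mathbb{E}(g)$, so

\begin{align*}
\sum_{i=1}^{n} V_{i} &= \mathbb{E}(g | X_{1}, \ldots, X_{n}) - \mathbb{E}(g) = g(X_{1}, \ldots, X_{n}) - \mathbb{E}(g(X_{1}, \ldots, X_{n}))
\end{align*}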

Using an argument similar to the one in the proof of Hoeffding's lemma, we have
\begin{align*}
\mathbb{E}(e^{t V_{i} } | X_{1}, \ldots, X_{i-1}) &\le e^{t^2c_{i}^{2}/8}
\end{align*}

Now, for any $t > 0$,
\begin{align*}
\mathbb{P} \left( g(X_{1}, \ldots, X_{n}) - \mathbb{E}(g(X_{1}, \ldots, X_{n})) \ge \epsilon \right) &= \mathbb{P} \left( \sum_{i=1}^{n} V_{i} \ge \epsilon \right) \\\\
&= \mathbb{P} \left( e^{t \sum_{i=1}^{n} V_{i}} \ge e^{t \epsilon} \right) \\\\
&\le e^{- t \epsilon} \mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) \\\\
&= e^{- t \epsilon} \mathbb{E} \left ( e^{t \sum_{i=1}^{n-1} V_{i}}
\mathbb{E} \left( e^{tV_{n}} | X_{1}, \ldots, X_{n-1} \right) \right ) \\\\
&\le e^{- t \epsilon} e^{t^2c_{n}^{2}/8}
\mathbb{E} \left ( e^{t \sum_{i=1}^{n-1} V_{i}} \right ) \\\\
& \ldots \\\\
&\le e^{- t \epsilon} e^{t^2 \sum_{i=1}^{n} c_{i}^{2}/8}
\end{align*}

The result follows by taking $t = 4 \epsilon / \sum_{i=1}^{n} c_{i}^{2}$.

Questions

How to prove

  1. $\mathbb{E}(V_{i} | X_{1}, \ldots, X_{i-1}) = 0$
  2. $\mathbb{E}(e^{t V_{i} } | X_{1}, \ldots, X_{i-1}) \le e^{t^2c_{i}^{2}/8}$
  3. $\mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) =
    \mathbb{E} \left( e^{t \sum_{i=1}^{n-1} V_{i}}
    \mathbb{E} \left( e^{tV_{n}} | X_{1}, \ldots, X_{n-1} \right) \right)$

Though this involves technical details, any intuition about these steps would also be helpful.

Best Answer

  • $\mathbb{E}(V_{i} | X_{1}, \ldots, X_{i-1}) = 0$

Let us introduce some notation: $X_{1:i} = (X_{1}, \ldots, X_{i})$.

\begin{align*} \mathbb{E}(V_{i} | X_{1:i-1}) &= \mathbb{E}\left( \left[\mathbb{E}(g | X_{1:i}) - \mathbb{E}(g | X_{1:i-1})\right] | X_{1:i-1}\right) \\\\ &=\mathbb{E}\left( \mathbb{E}(g | X_{1:i}) | X_{1:i-1}\right)- \mathbb{E}\left( \mathbb{E}(g | X_{1:i-1}) | X_{1:i-1} \right) \end{align*}

Next, apply the definition of iterated expectation to $\mathbb{E}\left( \mathbb{E}(g | X_{1:i}) | X_{1:i-1}\right)$.

First, let us work with the inner expectation. \begin{align*} \mathbb{E}(g | X_{1:i}) &= \int_{x_{i+1:n}} g \, f_{X_{1:n} | X_{1:i}} \, \mathrm{d}x_{i+1:n} \end{align*}

Since the $X_{i}$ are independent, we can reduce the conditional joint pdf to \begin{align*} f_{X_{1:n} | X_{1:i}} &= \frac{\prod_{j=1}^{n} f_{X_{j}}(x_{j})} {\prod_{j=1}^{i} f_{X_{j}}(x_{j})} \\\\ &= \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \end{align*}

Substituting the joint pdf in the inner expectation integral we get, \begin{align*} \mathbb{E}(g | X_{1:i}) &= \int_{x_{i+1:n}} g \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \, \mathrm{d}x_{i+1:n} \end{align*}

Observe that $\mathbb{E}(g | X_{1:i})$ is some function of $X_{1:i}$ only. Thus, when taking the outer expectation, the conditional pdf is $f_{X_{1:i} | X_{1:i-1}}$.

Now the outer expectation: \begin{align*} \mathbb{E}\left( \mathbb{E}(g | X_{1:i}) | X_{1:i-1} \right) &= \int_{x_{i}} \left ( \int_{x_{i+1:n}} g \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \mathrm{d}x_{i+1:n} \right) \frac{\prod_{j=1}^{i} f_{X_{j}}(x_{j})}{\prod_{j=1}^{i-1} f_{X_{j}}(x_{j})} \mathrm{d}x_{i}\\\\ & = \int_{x_{i:n}} g \prod_{j=i}^{n} f_{X_{j}}(x_{j}) \mathrm{d}x_{i:n} \\\\ & = \mathbb{E}(g | X_{1:i-1}) \end{align*}
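
In fact, independence is not needed for this step: it is the general tower property of conditional expectation, which holds whenever the outer conditioning variables are a subset of the inner ones,

\begin{align*} \mathbb{E}\left( \mathbb{E}(g | X_{1:i}) | X_{1:j} \right) &= \mathbb{E}(g | X_{1:j}), \qquad j \le i \end{align*}

The explicit calculation above simply verifies this in the independent case.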

Using the same argument used above one can prove, \begin{align*} \mathbb{E}\left( \mathbb{E}(g | X_{1:i-1}) | X_{1:i-1} \right) & = \mathbb{E}(g | X_{1:i-1}) \end{align*}

which leads to \begin{align*} \mathbb{E}(V_{i} | X_{1}, \ldots, X_{i-1}) &= \mathbb{E}\left( \mathbb{E}(g | X_{1:i}) | X_{1:i-1}\right)- \mathbb{E}\left( \mathbb{E}(g | X_{1:i-1}) | X_{1:i-1} \right) \\\\ &= \mathbb{E}(g | X_{1:i-1}) - \mathbb{E}(g | X_{1:i-1}) \\\\ &= 0 \end{align*}
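
As a quick numerical illustration (a sketch of my own, not part of the original answer), the martingale-difference property $\mathbb{E}(V_{i} | X_{1:i-1}) = 0$ can be checked by Monte Carlo for a toy function, here $g(x_{1}, x_{2}, x_{3}) = \max(x_{1}, x_{2}, x_{3})$ with i.i.d. $\mathrm{Uniform}(0,1)$ inputs; the conditional expectations below are worked out in closed form for this specific $g$:

```python
import numpy as np

# Toy check of E(V_2 | X_1) = 0 for g(x1, x2, x3) = max(x1, x2, x3),
# X_j ~ Uniform(0, 1) i.i.d.  Direct integration gives the closed forms
#   E(g | X_1 = x1, X_2 = x2) = (1 + max(x1, x2)^2) / 2
#   E(g | X_1 = x1)           = 1 - (1 - x1^3) / 3
# so V_2 = E(g | X_1, X_2) - E(g | X_1).

rng = np.random.default_rng(0)

def v2(x1, x2):
    """Martingale difference V_2, using the closed-form conditional expectations."""
    return (1.0 + np.maximum(x1, x2) ** 2) / 2.0 - (1.0 - (1.0 - x1 ** 3) / 3.0)

for x1 in (0.2, 0.5, 0.8):            # condition on a few fixed values of X_1
    x2 = rng.uniform(size=1_000_000)  # average over the distribution of X_2
    print(f"x1 = {x1}: estimated E(V_2 | X_1 = x1) = {v2(x1, x2).mean():+.5f}")
# Each estimate should be near 0, whatever the conditioning value x1.
```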


  • $\mathbb{E}(e^{t V_{i} } | X_{1}, \ldots, X_{i-1}) \le e^{t^2c_{i}^{2}/8}$

The above inequality will follow from Hoeffding's lemma if we can show that, conditionally on $X_{1}, \ldots, X_{i-1}$, $V_{i}$ is bounded within an interval of length at most $c_{i}$. We have already proved $\mathbb{E}(V_{i} | X_{1}, \ldots, X_{i-1}) = 0$.
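
For reference, the conditional form of Hoeffding's lemma used here: if $\mathbb{E}(Y | X_{1:i-1}) = 0$ and $L_{i} \le Y \le U_{i}$ almost surely, where $L_{i}$ and $U_{i}$ are functions of $X_{1:i-1}$ alone with $U_{i} - L_{i} \le c_{i}$, then

\begin{align*} \mathbb{E}(e^{t Y} | X_{1:i-1}) &\le e^{t^{2} c_{i}^{2}/8} \end{align*}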

The following proof is taken from John Duchi's wonderful notes on these inequalities. Let

\begin{align*} U_{i} &= \sup_{u} \quad \mathbb{E}(g | X_{1:i-1}, u) - \mathbb{E}(g | X_{1:i-1}) \\\\ L_{i} &= \inf_{l} \quad \mathbb{E}(g | X_{1:i-1}, l) - \mathbb{E}(g | X_{1:i-1}) \end{align*}

Here $\mathbb{E}(g | X_{1:i-1}, u)$ is shorthand for $\mathbb{E}(g | X_{1:i-1}, X_{i} = u)$.

Now \begin{align*} U_{i} - L_{i} &\le \sup_{l,u} \, \mathbb{E}(g | X_{1:i-1}, u) - \mathbb{E}(g | X_{1:i-1}, l) \\\\ &\le \sup_{l,u} \left ( \int_{x_{i+1:n}} [ g(X_{1:i-1}, u, x_{i+1:n}) - g(X_{1:i-1}, l, x_{i+1:n}) ] \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \, \mathrm{d}x_{i+1:n} \right ) \\\\ &\le \int_{x_{i+1:n}} \sup_{l,u} \; \left ( g(X_{1:i-1}, u, x_{i+1:n}) - g(X_{1:i-1}, l, x_{i+1:n}) \right ) \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \; \mathrm{d}x_{i+1:n} \\\\ &\le \int_{x_{i+1:n}} c_{i} \prod_{j=i+1}^{n} f_{X_{j}}(x_{j}) \; \mathrm{d}x_{i+1:n} \\\\ &= c_{i} \end{align*}

The third line follows because the integrand is bounded pointwise by its supremum (equivalently, from Jensen's inequality, since $\sup$ is convex), and the fourth line follows from the bounded-differences assumption of McDiarmid's inequality. Hence $U_{i} - L_{i} \le c_{i}$ with $L_{i} \le V_{i} \le U_{i}$, and we can apply Hoeffding's lemma to get

\begin{align*} \mathbb{E}(e^{t V_{i} } | X_{1}, \ldots, X_{i-1}) &\le e^{t^2c_{i}^{2}/8} \end{align*}


  • $\mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) = \mathbb{E} \left( e^{t \sum_{i=1}^{n-1} V_{i}} \mathbb{E} \left( e^{tV_{n}} | X_{1}, \ldots, X_{n-1} \right) \right)$

A straightforward application of iterated expectation leads to the above result.

\begin{align*} \mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) &= \mathbb{E} \left( \mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} | X_{1}, \ldots, X_{n-1} \right) \right) \\\\ &= \mathbb{E} \left( e^{t \sum_{i=1}^{n-1} V_{i}} \mathbb{E} \left( e^{t V_{n}} | X_{1}, \ldots, X_{n-1} \right) \right) \end{align*}

The inner expectation is with respect to $X_{n}$, while the outer expectation is with respect to $X_{1:n-1}$.

Applying the Hoeffding-lemma bound $n$ times, we get: \begin{align*} \mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) &= \mathbb{E} \left( e^{t \sum_{i=1}^{n-1} V_{i}} \mathbb{E} \left( e^{t V_{n}} | X_{1}, \ldots, X_{n-1} \right) \right) \\\\ &\le \mathbb{E} \left( e^{t \sum_{i=1}^{n-1} V_{i}} \exp \left( t^{2} c_{n}^{2}/8 \right) \right) \\\\ &\le \exp\left( \frac{1}{8} \sum_{i=1}^{n} t^{2} c_{i}^{2} \right) \end{align*}
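
Spelling out the elided induction: for each $k = n, n-1, \ldots, 1$, conditioning on $X_{1:k-1}$ peels off one factor,

\begin{align*} \mathbb{E} \left( e^{t \sum_{i=1}^{k} V_{i}} \right) &= \mathbb{E} \left( e^{t \sum_{i=1}^{k-1} V_{i}} \, \mathbb{E} \left( e^{t V_{k}} | X_{1:k-1} \right) \right) \le e^{t^{2} c_{k}^{2}/8} \, \mathbb{E} \left( e^{t \sum_{i=1}^{k-1} V_{i}} \right) \end{align*}

and after $n$ steps the accumulated factors multiply to $\exp \left( \frac{t^{2}}{8} \sum_{i=1}^{n} c_{i}^{2} \right)$.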

Now we are ready to prove McDiarmid's inequality,

\begin{align*} \mathbb{P} \left( g(X_{1}, \ldots, X_{n}) - \mathbb{E}(g(X_{1}, \ldots, X_{n})) \ge \epsilon \right) &= \mathbb{P} \left( \sum_{i=1}^{n} V_{i} \ge \epsilon \right) \\\\ &= \mathbb{P} \left( e^{t \sum_{i=1}^{n} V_{i}} \ge e^{t \epsilon} \right) \\\\ & \le \exp(-t \epsilon) \mathbb{E} \left( e^{t \sum_{i=1}^{n} V_{i}} \right) \\\\ & \le \exp(-t \epsilon) \exp\left( \frac{1}{8} \sum_{i=1}^{n} t^{2} c_{i}^{2} \right) \\\\ & = \exp \left (-t \epsilon + \frac{1}{8} \sum_{i=1}^{n} t^{2} c_{i}^{2} \right) \end{align*}

The third line follows from Markov's inequality. To get the final result, we minimize the exponent $-t \epsilon + \frac{t^{2}}{8} \sum_{i=1}^{n} c_{i}^{2}$ over $t > 0$: setting its derivative $-\epsilon + \frac{t}{4} \sum_{i=1}^{n} c_{i}^{2}$ to zero gives $t = 4 \epsilon / \sum_{i=1}^{n} c_{i}^{2}$, and substituting back yields the exponent $-2\epsilon^{2} / \sum_{i=1}^{n} c_{i}^{2}$, which is the claimed bound.
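
As a final sanity check (a minimal Monte Carlo sketch of my own, assuming $g$ is the sample mean of $n$ Bernoulli$(1/2)$ variables, so $c_{i} = 1/n$ and the bound reads $\exp(-2 n \epsilon^{2})$):

```python
import numpy as np

# Monte Carlo sanity check of McDiarmid's bound for g = sample mean of
# n Bernoulli(1/2) variables: here c_i = 1/n, so the bound is exp(-2*n*eps^2),
# which coincides with Hoeffding's inequality for the sample mean.

rng = np.random.default_rng(0)
n, trials, eps = 50, 200_000, 0.1

means = rng.integers(0, 2, size=(trials, n)).mean(axis=1)  # g(X_1, ..., X_n) per trial
empirical_tail = (means - 0.5 >= eps).mean()               # estimate of P(g - E g >= eps)
bound = np.exp(-2 * n * eps ** 2)                          # McDiarmid bound

print(f"empirical tail:  {empirical_tail:.4f}")  # roughly 0.08 for these parameters
print(f"McDiarmid bound: {bound:.4f}")           # exp(-1) ~ 0.368; the bound holds
```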
