Probability – Proving the Median of Means: Is the Proof Wrong?

inferencemeanmedianprobability

My main question is about the last step in the proof of one property in this short notes(written by Yen-Chi Chen) about the method of Median of Means(MoM) estimator. The Proposition is stated as follows:

Proposition 1 Assume that $\operatorname{Var}\left(X_{1}\right)<\infty .$ Then the MoM estimator has the following property:
$$
P\left(\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right) \leq e^{-2 K\left(\frac{1}{2}-\frac{K}{n} \frac{\sigma^{2}}{\varepsilon^{2}}\right)^{2}}=e^{-2 \frac{n}{B}\left(\frac{1}{2}-\frac{\sigma^{2}}{B \varepsilon^{2}}\right)^{2}}
$$
for every $n=K \cdot B$. Where K is the number of means(number of subsamples), and B is the size of each subsample.

The proof of it proceed as follows:

Since the event $\left\{\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right\}$ implies that at least $K/2$ of $\hat{\mu_l}$ has to be outside $\varepsilon$ distance to $\mu_0$. Namely,
$$
\left\{\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right\} \subset\left\{\sum_{\ell=1}^{K} I\left(\left|\widehat{\mu}_{\ell}-\mu_{0}\right|>\varepsilon\right) \geq \frac{K}{2}\right\}
$$
Define $Z_{\ell}=I\left(\left|\widehat{\mu}_{\ell}-\mu_{0}\right|>\varepsilon\right)$ and let $p_{\varepsilon, B}=\mathbb{E}\left(Z_{\ell}\right)=P\left(\left|\widehat{\mu}_{\ell}-\mu_{0}\right|>\varepsilon\right)$, then the above implies that
$$
\begin{aligned}
P\left(\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right) & \leq P\left(\sum_{\ell=1}^{K} Z_{\ell} \geq \frac{K}{2}\right) \\
&=P\left(\sum_{\ell=1}^{K}\left(Z_{\ell}-\mathbb{E}\left(Z_{\ell}\right)\right) \geq \frac{K}{2}-K p_{\varepsilon, B}\right) \\
&=P\left(\frac{1}{K} \sum_{\ell=1}^{K}\left(Z_{\ell}-\mathbb{E}\left(Z_{\ell}\right)\right) \geq \frac{1}{2}-p_{\varepsilon, B}\right)
\end{aligned}
$$
The key trick of the MoM estimator is that the random variable $Z_{\ell}$ is IID and is bounded. So by Hoeffding's inequality (one-sided),
$$
P\left(\frac{1}{K} \sum_{\ell=1}^{K}\left(Z_{\ell}-\mathbb{E}\left(Z_{\ell}\right)\right) \geq t\right) \leq e^{-2 K t^{2}}
$$
As a result,
$$
\begin{aligned}
P\left(\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right) & \leq P\left(\frac{1}{K} \sum_{\ell=1}^{K}\left(Z_{\ell}-\mathbb{E}\left(Z_{\ell}\right)\right) \geq \frac{1}{2}-p_{\varepsilon, B}\right) \\
& \leq e^{-2 K\left(\frac{1}{2}-p_{\varepsilon, B}\right)^{2}}
\end{aligned}
$$
To conclude that proof, note that the variance $\sigma^{2}=\operatorname{Var}\left(X_{1}\right)<\infty$ and the Chebeshev's inequality implies
$$
p_{\varepsilon, B}=P\left(\left|\widehat{\mu}_{\ell}-\mu_{0}\right|>\varepsilon\right) \leq \frac{\sigma^{2}}{B \varepsilon^{2}}=\frac{K}{n} \frac{\sigma^{2}}{\varepsilon^{2}}\tag{1}
$$
So the bound becomes
$$
P\left(\left|\widehat{\mu}_{\mathrm{MoM}}-\mu_{0}\right|>\varepsilon\right) \leq e^{-2 K\left(\frac{1}{2}-p_{\varepsilon, B}\right)^{2}} \leq e^{-2 K\left(\frac{1}{2}-\frac{K}{n} \frac{\sigma^{2}}{\varepsilon^{2}}\right)^{2}} \tag{2}
$$

My question is at Eq.(2), while the formula before it looks fine to me. Eq.(1) states $p\le \sigma^2/(B\epsilon^2)$, But there's a $1/2$ factor in the Eq.(2), and if $p_{\epsilon,B}$ is greater than 1/2 at the beginning, then make $p_{\epsilon,B}$ bigger will only cause the term in Eq.(2) to become smaller, which conflicts with the $\le$ notation in the process of replacing of $p_{\epsilon,B}$ into $p\le \sigma^2/(B\epsilon^2)$. So does this proof in the last step, i.e., Eq.(2) wrong? Or it's just that I miss something? Because the references in this short note(chapter 2 of the reference, instead of chapter 3 noted in the note) also did the same proof, so I kind of feel maybe I miss something?

Best Answer

When you use $p_{\epsilon,B}>0.5$ then you are effectively applying Hoeffding's inequality with negative $t$

With Hoeffding's inequality

$$P\left(\frac{1}{K} \sum_{\ell=1}^{K}\left(Z_{\ell}-\mathbb{E}\left(Z_{\ell}\right)\right) \geq t\right) \leq e^{-2 K t^{2}}$$

you do not use $t<0$. For $t = 0$ the upper limiting probability is already $1$.

Best Answer

Related Solutions

Solved – Understanding proof of McDiarmid’s inequality

Solved – Proof of Convergence in Probability

Related Question