When is it true that $\sup g - \inf g \le 2\sup g$?

inequality · machine learning · random variables · real-analysis · statistics

I am currently reading the first version of the paper "On the Margin Theory of Feedforward Neural Networks" by Colin Wei, Jason D. Lee, Qiang Liu, and Tengyu Ma.

In Lemma C.4 of the appendix, the authors give upper bounds on the Rademacher complexity of a specific function class, and there is one step in their proof that I fail to understand. On page 21, they state the following upper bound:
$$\begin{align}
\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\left\lvert\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right\rvert - \inf_{u\in\mathbb S^d}\left\lvert\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right\rvert\right] &\le \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i) - \inf_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] \\
&\le 2\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right]\end{align} $$

To provide some context here, $\epsilon_i$ are i.i.d. Rademacher random variables, $\mathbb S^d$ is the $d$-dimensional unit sphere, $x_i$ are elements of $\mathbb R^d$ and $\phi$ is an $M$-Lipschitz activation function.

However, these details don't seem very relevant here: the first inequality seems to use only the fact that $\sup |g| - \inf |g| \le \sup g - \inf g$, which holds because $t \mapsto |t|$ is $1$-Lipschitz, so $|g(x)| - |g(y)| \le |g(x) - g(y)| \le \sup g - \inf g$ for all $x, y$.
The second inequality is quite puzzling to me because it seems to use the fact that $\sup g - \inf g \le 2\sup g$, which is equivalent to $-\inf g \le \sup g$ and is not true in general (consider for instance $g : x \mapsto -x^2$).
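For concreteness, here is a quick numerical illustration of that counterexample (a minimal sketch; the interval $[-1,1]$ and the grid are my own choices):

```python
import numpy as np

# For g(x) = -x^2 on [-1, 1]: sup g = 0 and inf g = -1,
# so sup g - inf g = 1 while 2 sup g = 0, and the bound fails.
x = np.linspace(-1.0, 1.0, 1001)
g = -x**2
print(g.max() - g.min())  # 1.0
print(2 * g.max())        # 0.0, so sup g - inf g > 2 sup g here
```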

I suspect that under some additional assumption this inequality might hold, but I can't manage to get further than this.
I would be grateful for any help in proving why the second inequality holds.

Best Answer

Thanks to @PhoemueX's comment, here is a proof that the inequality holds.
We have: $$\begin{align}\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i) - \inf_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] &= \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] - \mathbb E_{\epsilon_i}\left[\inf_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] \tag1\\ &= \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] + \mathbb E_{\epsilon_i}\left[-\inf_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] \tag2 \\ &=\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] + \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}-\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] \tag3 \\ &=\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right] + \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}(-\epsilon_i) \phi(u^T x_i)\right]\tag4 \\ &=2\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right]\tag5\end{align} $$

Here, steps (1) and (2) use the linearity of expectation.
To go from (2) to (3), we use the fact that $-\inf(g) = \sup(-g)$ for any real-valued function $g$ $(\star)$, and the step from (3) to (4) simply moves the minus sign inside the sum.
Finally, to get from (4) to (5) we use the fact that $(\epsilon_i)$ and $(-\epsilon_i)$ are identically distributed and thus $$ \mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}(-\epsilon_i) \phi(u^T x_i)\right] = \mathbb E_{(-\epsilon_i)}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}(-\epsilon_i) \phi(u^T x_i)\right] =\mathbb E_{\epsilon_i}\left[\sup_{u\in\mathbb S^d}\sum_{i=1}^{n}\epsilon_i \phi(u^T x_i)\right]\,\,\blacksquare $$
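To make the symmetry argument concrete, here is a small Monte Carlo sanity check (a sketch under assumptions of my own: $\phi$ is taken to be ReLU, the $x_i$ are Gaussian, and the supremum/infimum over the sphere is approximated by a finite sample of directions $u$ — none of these choices come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, n_dirs, n_trials = 10, 5, 2000, 5000
X = rng.normal(size=(n, d))                 # fixed data points x_i (illustrative)
phi = lambda t: np.maximum(t, 0.0)          # ReLU as a 1-Lipschitz activation

# Approximate the sup/inf over the unit sphere by a fixed sample of directions.
U = rng.normal(size=(n_dirs, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
A = phi(U @ X.T)                            # entries phi(u^T x_i), shape (n_dirs, n)

sups, infs = [], []
for _ in range(n_trials):
    eps = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
    s = A @ eps                             # sum_i eps_i phi(u^T x_i) for each u
    sups.append(s.max())
    infs.append(s.min())

# By the symmetry argument, E[sup] - E[inf] equals 2 E[sup]; the two printed
# values should agree up to Monte Carlo error, matching step (5).
print(np.mean(sups) - np.mean(infs), 2 * np.mean(sups))
```

Note that even with a fixed finite set of directions the identity holds exactly in expectation, since $\min_u$ of the sum under $\epsilon$ equals $-\max_u$ of the sum under $-\epsilon$; only the finite number of trials introduces error.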

Now for the sake of completeness, I also give a quick proof of $(\star)$:

Lemma: For any real-valued function $g$, $-\inf(g) = \sup(-g)$.

Proof: By definition, $\inf(g) \le g(x)\,\,\, \forall x$ and thus $-\inf(g) \ge -g(x)\,\,\, \forall x$ as well, so $-\inf(g)$ is an upper bound of $-g$. Furthermore, for any real $y < -\inf(g)$, we have that $-y > \inf(g)$, so there exists $t$ such that $-y > g(t)$, i.e. $y < -g(t)$ (so $y$ is not an upper bound of $-g$).
From this we deduce that $-\inf(g)$ is the least upper bound of $-g$, i.e. $-\inf(g) = \sup(-g) \,\,\, \blacksquare$
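As a quick sanity check of the lemma (a minimal sketch; the test function and grid are arbitrary choices of mine):

```python
import numpy as np

# Grid check of -inf(g) = sup(-g) for a sampled real-valued function.
x = np.linspace(-2.0, 2.0, 1001)
g = np.sin(3 * x) - x**2
assert np.isclose(-g.min(), (-g).max())
print(-g.min(), (-g).max())  # identical values
```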