Bounding $\max e^{x_i}$ by $\sum e^{x_i}$ vs $e^{\sum (x_i)_+}$ when the $x_i$ are Gaussian

Tags: inequality, moment-generating-functions, probability, statistics

I am trying to derive an upper bound on the expected maximum of $n$ independent zero-mean Gaussian random variables $w_1, \dots, w_n$, each with variance $\sigma^2$:
$$\mathbb E \left [ \max_{1\le i\le n} w_i\right ]$$
My thought is to use the moment generating function to form this bound. Proceeding:

$$= \mathbb E \left [ \frac{1}{\lambda} \log \exp \lambda \max w_i \right ]$$

where $\lambda > 0$ is arbitrary. Now, applying Jensen's inequality:

$$\leq \frac{1}{\lambda} \log \mathbb E \left [ \exp \lambda \max w_i \right ]$$

Since the max is difficult to handle directly, I consider a union bound, but there are two ways to apply it, and it is not clear to me which one is the better choice.

The first option (and what I originally tried):

$$\stackrel{(A)}{\leq} \frac{1}{\lambda} \log \mathbb E \left [ \exp \lambda \sum (w_i)_+ \right ]$$

where $(\,\cdot\,)_+ := \max\{0, \cdot\,\}$; the inequality holds because $\max_i w_i \leq \sum_i (w_i)_+$ (each summand is nonnegative) and $x \mapsto e^{\lambda x}$ is increasing for $\lambda > 0$. Now I can apply independence and evaluate the expression by computing the moment generating function of $(w_i)_+$:

$$= \frac{n}{\lambda} \log \left ( \frac{1}{2} + e^{\sigma^2\lambda^2/2}\Phi(\sigma \lambda)\right )$$

where $\Phi$ is the standard normal CDF. The next step would be to select a value of $\lambda$ that makes this expression small. This is challenging for me, so any tips would be very helpful. I don't know how to proceed, but the leading factor of $n$ suggests that this bound is not good.
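
As a sanity check on that closed form before trying to optimize it, here is a minimal Monte Carlo sketch in Python (NumPy/SciPy assumed; the particular values of $\sigma$ and $\lambda$ are arbitrary):

```python
# Monte Carlo check of the moment generating function of (w)_+ for
# w ~ N(0, sigma^2): E[exp(lam * (w)_+)] = 1/2 + exp(sigma^2 lam^2 / 2) * Phi(sigma * lam).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, lam = 1.5, 0.7                      # arbitrary illustrative values
w = rng.normal(0.0, sigma, size=1_000_000)

empirical = np.exp(lam * np.maximum(w, 0.0)).mean()
closed_form = 0.5 + np.exp(sigma**2 * lam**2 / 2) * norm.cdf(sigma * lam)
print(empirical, closed_form)              # should agree to ~3 decimal places
```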

The core of this is inequality $(A)$, which I thought should be reasonable when the $w_i$ are typically small. But on second look, $\mathbb E[(w_i)_+] = \mathbb E|w_i|/2 = \sigma/\sqrt{2\pi} > 0$, so by the law of large numbers $\sum_i (w_i)_+ = \Theta_p(n)$, which is certainly not small when $n$ is large.
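
A quick empirical check of that growth rate (again a sketch; only NumPy is assumed):

```python
# By the LLN, (1/n) * sum_i (w_i)_+  ->  E[(w)_+] = sigma / sqrt(2*pi),
# i.e. about 0.3989 for sigma = 1, so the sum itself grows linearly in n.
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    w = rng.normal(0.0, 1.0, size=n)
    print(n, np.maximum(w, 0.0).mean())    # -> 1 / sqrt(2*pi)
```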

The second option is to apply the union bound outside the exponential, using $\exp\{\lambda \max_i w_i\} \leq \sum_i \exp\{\lambda w_i\}$:

$$\stackrel{(B)}{\leq}\frac{1}{\lambda} \log \left( n\, \mathbb E \left[\exp\{\lambda w_1\}\right] \right)$$
from where we can plug in the Gaussian moment generating function $\mathbb E[\exp\{\lambda w_1\}] = e^{\sigma^2\lambda^2/2}$, so the bound becomes $\frac{\log n}{\lambda} + \frac{\sigma^2\lambda}{2}$; optimizing over $\lambda$ (the minimizer is $\lambda = \sqrt{2\log n}/\sigma$) gives
$$\sqrt{2\sigma^2 \log n}$$

This is the typical bound that is given. Interestingly, independence is not used, and the same bound applies to sub-Gaussian random variables in general, so I would expect this bound to be worse than one that exploits independence and Gaussianity, such as my first attempt.
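
For reference, a small Monte Carlo sketch comparing $\mathbb E[\max_i w_i]$ with $\sqrt{2\sigma^2 \log n}$ (NumPy assumed, $\sigma = 1$; sample sizes are arbitrary):

```python
# Monte Carlo comparison of E[max_i w_i] against the bound sqrt(2 log n) for
# sigma = 1. The bound should hold, with a gap that stays modest as n grows.
import numpy as np

rng = np.random.default_rng(0)
reps = 500
for n in (10, 100, 1_000, 10_000):
    samples = rng.normal(0.0, 1.0, size=(reps, n))
    emp = samples.max(axis=1).mean()       # Monte Carlo estimate of E[max]
    bound = np.sqrt(2 * np.log(n))
    print(n, round(emp, 3), round(bound, 3))
```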

My questions

  1. How can I bound $\frac{n}{\lambda} \log \left ( \frac{1}{2} + e^{\sigma^2\lambda^2/2}\Phi(\sigma \lambda)\right )$?
  2. Is there some way to use the knowledge that the $w_i$ are independent and gaussian to obtain a better bound than $\sqrt{2\sigma^2 \log n}$?

Best Answer

  1. We have $$ \frac{n}{\lambda} \log \left ( \frac{1}{2} + e^{\sigma^2\lambda^2/2}\Phi(\sigma \lambda)\right) = n\sigma \cdot \frac 1{\sigma \lambda} \log\left(\frac12+e^{(\sigma\lambda)^2/2}\Phi(\sigma\lambda)\right) = n\sigma\, G(\sigma \lambda), $$ for a fixed function $G$. So, minimizing over $\lambda>0$ gives $n\sigma G_{\min}$, where $G_{\min}=1/\sqrt{2 \pi}$ is the infimum of $G$ over $\mathbb{R}_{>0}$ (approached as $\sigma\lambda \to 0^+$, not attained). In other words, the best that bound $(A)$ can give is $n\sigma/\sqrt{2\pi}$, which grows linearly in $n$, consistent with your $\Theta_p(n)$ observation; see the numerical sketch after this answer.

  2. Suppose that we instead wanted to compute the median $x$ of $\max_{1\le i\le n} w_i$, where the $w_i$s are i.i.d. normal with mean 0 and variance 1. Then we want to solve $$ \Phi(x)^n = \frac12, $$ so $$\Phi(x)=2^{-1/n}$$ and $$1-\Phi(x)=1-2^{-1/n}=\frac1n\log 2 + O(n^{-2}).$$ The left-hand side can be approximated by $$ \frac{1}{x\sqrt{2 \pi} } e^{-x^2/2} \left(1 + O(x^{-2})\right), $$ giving $x=\sqrt{2\log n} +o(1)$. So, you should not expect to get a bound much better than $\sigma \sqrt{2 \log n}$. Moreover, for appropriately chosen $a_n$ and $b_n$, as $n\to\infty$, $$ {\cal L}\left(\left(\max_{1\le i\le n} w_i - b_n\right)/a_n\right)\rightarrow {\cal L}(Z), $$ where $Z$ has a standard Gumbel distribution ($G$ is already taken by the function in part 1). In fact we can take $a_n=\Theta((\log n)^{-1/2})$ and $b_n=\sqrt{2\log n}+o(1)$.
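
Both parts are easy to verify numerically. Here is a hedged sketch (NumPy/SciPy assumed; grid resolution and the values of $n$ are arbitrary):

```python
# Part 1: G(t) = (1/t) log(1/2 + exp(t^2/2) Phi(t)) has infimum 1/sqrt(2*pi)
# over t > 0, approached as t -> 0+.
# Part 2: the median of the max, Phi^{-1}(2^{-1/n}), approaches sqrt(2 log n),
# though slowly (the correction decays like log log n / sqrt(log n)).
import numpy as np
from scipy.stats import norm

def G(t):
    return np.log(0.5 + np.exp(t**2 / 2) * norm.cdf(t)) / t

t = np.linspace(1e-4, 5.0, 100_000)
print(G(t).min(), 1 / np.sqrt(2 * np.pi))      # both ~ 0.39894

for n in (10, 1_000, 100_000):
    print(n, norm.ppf(2 ** (-1 / n)), np.sqrt(2 * np.log(n)))
```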
