[Math] On the expectation of the maximum of Gaussians

normal-distribution, order-statistics, probability

Let $X_1,\ldots,X_n$ be i.i.d. $\mathcal{N}(0,1)$ random variables. I am trying to prove that
\begin{align}
(a)\ \ \mathbb{E} \left[ \max_{i}X_i\right] & \asymp\mathbb{E} \left[ \max_{i}|X_i|\right] \asymp \sqrt{\log n},\\
(b) \ \ \mathbb{E} \left[ \max_{i}X_i\right] &= \sqrt{2 \log n}+o(1)
\end{align}
where $A \asymp B$ means there exist universal constants $m,M >0$ such that $mA \leq B \leq MA$.

For part (a), I was able to prove the upper bound that $\mathbb{E} \left[ \max_{i}X_i\right] \leq \sqrt{2 \log n}$ using Jensen's inequality. How do I prove the lower bound and the fact that the two expectations are equivalent? I've been given the following hint: $\mathbb{P}(\max_{i}X_i \geq t)=1-\mathbb{P}(X_1 \leq t)^n$.
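For reference, here is a sketch of that Jensen argument (via the moment generating function): for any $s>0$, Jensen's inequality applied to $x \mapsto e^{sx}$ gives

$$e^{s\,\mathbb{E}[\max_i X_i]} \leq \mathbb{E}\left[e^{s \max_i X_i}\right] \leq \sum_{i=1}^n \mathbb{E}\left[e^{s X_i}\right] = n e^{s^2/2},$$

so $\mathbb{E}[\max_i X_i] \leq \frac{\log n}{s} + \frac{s}{2}$, and taking $s=\sqrt{2\log n}$ yields $\mathbb{E}[\max_i X_i] \leq \sqrt{2\log n}$.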

Best Answer

A mostly worked-out answer for the lower bound in part (a):

$$E[\max_i X_i]=E[\max_i X_i 1_{\max_i X_i \geq 0}]+E[\max_i X_i 1_{\max_i X_i<0}].$$

We want to throw out the negative piece. Intuitively, the event $\{\max_i X_i < 0\}$ is very unlikely and the piece has bounded expectation. More rigorously, the probability of the piece being nonzero is $\mathbb{P}(\max_i X_i < 0) = 2^{-n}$ by independence and symmetry, and (coupling the sequences as $n$ grows) it is pointwise decreasing in magnitude, so by dominated convergence it is $o(1)$; a quantitative version is sketched below. Hence

$$E[\max_i X_i] \geq E[\max_i X_i 1_{\max_i X_i \geq 0}] + o(1) \\ = \int_0^\infty \left(1-\Phi(t)^n\right) dt + o(1),$$

using the hint and the layer-cake formula $E[Y]=\int_0^\infty \mathbb{P}(Y>t)\,dt$ for a nonnegative random variable $Y$. Denote the integral term by $I$.
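To make the $o(1)$ quantitative (a sketch; not needed for what follows): on the event $\{\max_i X_i < 0\}$ we have $X_1 \leq \max_i X_i < 0$, so $|\max_i X_i| \leq |X_1|$, and Cauchy–Schwarz gives

$$\left|E\left[\max_i X_i 1_{\max_i X_i < 0}\right]\right| \leq E\left[|X_1| 1_{\max_i X_i < 0}\right] \leq \sqrt{E[X_1^2]\,\mathbb{P}\left(\max_i X_i < 0\right)} = 2^{-n/2}.$$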

Next

$$I \geq \int_0^{\sqrt{\log(n)}} \left(1-\Phi(t)^n\right) dt$$

since the integrand is nonnegative, so discarding part of the domain only decreases the integral. (We stop at $\sqrt{\log(n)}$ rather than $\sqrt{2\log(n)}$ so that the tail bound used below keeps the integrand bounded away from zero; for part (a) the constant in front of $\sqrt{\log n}$ is irrelevant.)

On $[0,1]$ we have the simple bound $1-\Phi(t)^n \geq 1-\Phi(1)^n$, since $\Phi$ is increasing. On $[1,\sqrt{\log(n)}]$ we have the standard lower tail bound $1-\Phi(t) \geq \frac{1}{\sqrt{2 \pi}} \frac{t}{t^2+1} e^{-t^2/2}$ (cf. https://mikespivey.wordpress.com/2011/10/21/normaltails/), and since $\frac{t}{t^2+1} \geq \frac{1}{2t}$ for $t \geq 1$, this gives $\Phi(t) \leq 1-\frac{1}{2\sqrt{2 \pi}\, t} e^{-t^2/2}$ there. Hence

$$I \geq 1-\Phi(1)^n + \int_1^{\sqrt{\log(n)}} \left( 1-\left ( 1-\frac{1}{2\sqrt{2 \pi}\, t} e^{-t^2/2} \right )^n \right) dt.$$
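(As a check on that tail bound, here is a sketch of a derivation along the lines of the linked post: with $\varphi(s) = \frac{1}{\sqrt{2\pi}} e^{-s^2/2}$ one computes

$$-\frac{d}{ds}\left(\frac{s}{s^2+1}\,\varphi(s)\right) = \varphi(s)\left(1-\frac{2}{(s^2+1)^2}\right) \leq \varphi(s),$$

and integrating over $[t,\infty)$ gives $\frac{t}{t^2+1}\,\varphi(t) \leq 1-\Phi(t)$.)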

As for the remaining piece, we're integrating a decreasing function of $t$ (both $\frac{1}{t}$ and $e^{-t^2/2}$ decrease), so we get a lower bound by substituting in the upper limit $t=\sqrt{\log(n)}$, where $e^{-t^2/2} = n^{-1/2}$:

$$I \geq 1-\Phi(1)^n+\int_1^{\sqrt{\log(n)}} \left( 1-\left ( 1-\frac{1}{2\sqrt{2 \pi \log(n)}}\, n^{-1/2} \right )^n \right) dt.$$

The integrand is now the constant $1-(1-\delta_n)^n$ with $\delta_n = \frac{1}{2\sqrt{2\pi\log(n)}}\, n^{-1/2}$. Since $1-x \leq e^{-x}$,

$$(1-\delta_n)^n \leq e^{-n\delta_n} = \exp\left(-\frac{\sqrt{n}}{2\sqrt{2\pi\log(n)}}\right) \to 0,$$

so the integrand converges to $1$; in particular it is bounded below by $C:=\frac{1}{2}$ for large enough $n$. Then we get the bound

$$I \geq C(\sqrt{\log(n)}-1)+1-\Phi(1)^n.$$

Returning to the original problem, we have

$$E[\max_i X_i] \geq C(\sqrt{\log(n)}-1)+1-\Phi(1)^n+o(1),$$

which gives the lower bound $E[\max_i X_i] \gtrsim \sqrt{\log n}$ in part (a) for all sufficiently large $n$. The finitely many remaining values of $n$ can always be handled by adjusting the universal constant (why?), so we are done.
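To spell out the "why?" (a standard computation, included for completeness): $E[\max_{i \leq n} X_i]$ is nondecreasing in $n$, and already for $n=2$, writing $\max(X_1,X_2) = \frac{X_1+X_2}{2} + \frac{|X_1-X_2|}{2}$ with $X_1-X_2 \sim \mathcal{N}(0,2)$ gives

$$E[\max(X_1,X_2)] = \tfrac{1}{2} E|X_1-X_2| = \tfrac{1}{2}\cdot\sqrt{2}\cdot\sqrt{\tfrac{2}{\pi}} = \tfrac{1}{\sqrt{\pi}} > 0,$$

so on any fixed finite range of $n \geq 2$ the ratio $E[\max_i X_i]/\sqrt{\log n}$ stays bounded between positive constants.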

To solve part (b) we would need to repeat the derivation with the constant pushed to $1$ and the upper limit pushed to $\sqrt{2 \log(n)}$, and I'm not really sure how to do that. One idea would be to change variables to $u=\Phi(t)$, which would give

$$\int_0^\infty \left(1-\Phi(t)^n\right) dt = \int_{1/2}^1 (1-u^n)\frac{dt}{du}\, du,$$

where $\frac{dt}{du} = 1/\varphi(\Phi^{-1}(u))$ is the reciprocal of the normal density evaluated at the quantile $\Phi^{-1}(u)$. Perhaps it is possible to get an appropriate asymptotic expansion for this quantity near $u=1$ to get the result.
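Heuristically (a sketch, not a proof): since $1-\Phi(t) \sim \varphi(t)/t$ as $t \to \infty$, the quantile satisfies $\Phi^{-1}(u) \sim \sqrt{2\log\frac{1}{1-u}}$ as $u \to 1^-$, so $\varphi(\Phi^{-1}(u)) \sim (1-u)\sqrt{2\log\frac{1}{1-u}}$ and

$$\frac{dt}{du} = \frac{1}{\varphi(\Phi^{-1}(u))} \sim \frac{1}{(1-u)\sqrt{2\log\frac{1}{1-u}}} \qquad (u \to 1^-).$$

Since $1-u^n$ transitions from $\approx 1$ to $\approx 0$ around $u \approx 1-\frac{1}{n}$, where $\sqrt{2\log\frac{1}{1-u}} \approx \sqrt{2\log n}$, this suggests the integral is $(1+o(1))\sqrt{2\log n}$, consistent with part (b) at leading order; making this rigorous would require controlling the error terms.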