[Math] Maximum of two normal random variables

pr.probability

The main purpose of this question is to build some intuition and a deeper understanding of why the method presented below works, which I hope will help me adapt it to the setting I am dealing with in my research.

Let $X,Y\sim N(0,1)$, not necessarily independent. Suppose we want to find an upper bound for $\mathbb{E}\max(X,Y)$.

The most obvious approach is something like the following:
$$\mathbb{E}\max(X,Y)\leq \mathbb{E}|X|+\mathbb{E}|Y|=2\sqrt{2/\pi}\approx 1.59$$

However, I've found a trick in the literature that uses the Laplace transform to get something better. Although the idea is much less obvious, the details are still easy. For any $\lambda >0$, Jensen's inequality gives us the following:

$$\mathbb{E}\max(X,Y) \leq \frac1{\lambda}\log\left(\mathbb{E}e^{\lambda\max(X,Y)}\right)\leq \frac1{\lambda}\log\left(\mathbb{E}e^{\lambda X}+\mathbb{E}e^{\lambda Y}\right) = \frac{\log(2e^{\lambda^2/2})}{\lambda}.$$
Minimizing this over $\lambda$ gives us the upper bound $\sqrt{\log 4} \approx 1.18$, which is better than the previous approach.
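
To spell out the minimization step, write the right-hand side as a function of $\lambda$:
$$f(\lambda)=\frac{\log 2}{\lambda}+\frac{\lambda}{2},\qquad f'(\lambda)=-\frac{\log 2}{\lambda^2}+\frac12,\qquad f''(\lambda)=\frac{2\log 2}{\lambda^3}>0.$$
So $f$ is convex on $(0,\infty)$ with its unique minimum at $\lambda^*=\sqrt{2\log 2}$, and
$$f(\lambda^*)=\frac{\log 2}{\sqrt{2\log 2}}+\frac{\sqrt{2\log 2}}{2}=\sqrt{2\log 2}=\sqrt{\log 4}\approx 1.18.$$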

Now, my question is: heuristically/intuitively, why is the second method better? Or, to put it a different way, is there some easy way to see that the second method should give a better bound even before doing the actual calculations that confirm this?

At this stage, I don't have any intuition for why this works, and I am certainly not satisfied with "that's the standard trick researchers in the field use" as an explanation.

Best Answer

First, an upper bound that beats your second bound is the following: use the identity $$\max(a,b)=(a+b+|a-b|)/2.$$ Since $E(X+Y)=0$, this gives $$E\max(X,Y)=E|X-Y|/2\leq \frac{1}{2} (E|X|+E|Y|)=E|X|=\sqrt{2/\pi}\approx 0.798.$$ This bound cannot be improved, as the case $Y=-X$ shows.
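
A quick numerical sanity check of the three bounds can be sketched as follows. This is only an illustration under an extra assumption: $(X,Y)$ is taken to be jointly Gaussian with correlation $\rho$, which is stronger than what the question assumes. The name `expected_max` and the constants are hypothetical, introduced here for the example. In the jointly Gaussian case $X-Y\sim N(0,2(1-\rho))$, so the exact value is $\sqrt{(1-\rho)/\pi}$, and the simulation should reproduce it up to Monte Carlo error.

```python
import math
import random

# Bounds discussed in the thread.
NAIVE_BOUND = 2 * math.sqrt(2 / math.pi)    # E|X| + E|Y|
LAPLACE_BOUND = math.sqrt(2 * math.log(2))  # optimized exponential-moment bound
SHARP_BOUND = math.sqrt(2 / math.pi)        # E|X - Y| / 2 <= E|X|


def expected_max(rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[max(X, Y)].

    Assumption (not from the question): (X, Y) is jointly Gaussian with
    standard normal marginals and correlation rho in [-1, 1].
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y = rho * x + scale * z  # Y is N(0, 1) with Corr(X, Y) = rho
        total += max(x, y)
    return total / n_samples


if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
        exact = math.sqrt((1.0 - rho) / math.pi)
        print(f"rho={rho:+.1f}  simulated={expected_max(rho):.3f}  exact={exact:.3f}")
    print(f"sharp={SHARP_BOUND:.3f}  laplace={LAPLACE_BOUND:.3f}  naive={NAIVE_BOUND:.3f}")
```

The estimate should be largest at $\rho=-1$, where it matches the sharp bound, and should stay below all three bounds for every $\rho$.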

So you see that your second bound is better not because you used the exponential moments, but rather because your first bound controls the max function way too brutally: you lost a factor of $2$. The advantage of the second method over the little trick I showed above is that it generalizes better when you deal with the max of more than two variables.
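
As a sketch of that generalization, assume $X_1,\dots,X_n\sim N(0,1)$ with arbitrary dependence (the $X_i$ and $n$ are notation introduced here, not in the question). The same two steps, Jensen's inequality followed by bounding the exponential of the max by the sum of exponentials, give for any $\lambda>0$
$$E\max_{i\le n}X_i\leq \frac1{\lambda}\log\left(\sum_{i=1}^n Ee^{\lambda X_i}\right)=\frac{\log n}{\lambda}+\frac{\lambda}{2}.$$
Choosing $\lambda=\sqrt{2\log n}$ yields
$$E\max_{i\le n}X_i\leq\sqrt{2\log n},$$
which grows far more slowly than the bound $n\sqrt{2/\pi}$ obtained from $\max_i X_i\leq\sum_i|X_i|$. For $n=2$ this recovers $\sqrt{2\log 2}\approx 1.18$.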
