[Math] Maximum of two normal random variables

pr.probability

The main purpose of this question is to gain some intuition for, and a deeper understanding of, why the method presented below works. I hope this will help me adapt it to the setting I am dealing with in my research.

Let $X,Y\sim N(0,1)$, not necessarily independent. Suppose we want to find an upper bound for $\mathbb{E}\max(X,Y)$.

The most obvious approach would be something like the following
$$\mathbb{E}\max(X,Y)\leq \mathbb{E}|X|+\mathbb{E}|Y|=2\sqrt{2/\pi}\approx 1.59$$

However, I've found a trick in the literature that uses the Laplace transform to get something better. Although the idea is much less obvious, the details are still easy. For any $\lambda >0$, Jensen's inequality gives

$$\mathbb{E}\max(X,Y) \leq \frac1{\lambda}\log\left(\mathbb{E}e^{\lambda\max(X,Y)}\right)\leq \frac1{\lambda}\log\left(\mathbb{E}e^{\lambda X}+\mathbb{E}e^{\lambda Y}\right) = \frac{\log(2e^{\lambda^2/2})}{\lambda}.$$
Minimizing over $\lambda$ (the minimum is attained at $\lambda=\sqrt{2\log 2}$) gives the upper bound $\sqrt{\log 4}=\sqrt{2\log 2} \approx 1.18$, which is better than the previous approach.
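As a quick sanity check on the minimization, here is a small numerical sketch. It evaluates the right-hand side $\log(2e^{\lambda^2/2})/\lambda$ on a grid and compares the minimum with the naive bound $2\sqrt{2/\pi}$. The function name `laplace_bound` and the grid are illustrative choices, not part of the original argument; the minimum should land near $\lambda=\sqrt{2\log 2}\approx 1.177$.

```python
import math

def laplace_bound(lam: float, n: int = 2) -> float:
    """Right-hand side log(n * exp(lam^2 / 2)) / lam, valid for lam > 0."""
    return (math.log(n) + lam * lam / 2.0) / lam

# Crude grid search over lambda in (0, 5]; the grid itself is an arbitrary choice.
grid = [k / 10000.0 for k in range(1, 50001)]
lam_star = min(grid, key=laplace_bound)
best = laplace_bound(lam_star)

naive = 2.0 * math.sqrt(2.0 / math.pi)  # the bound E|X| + E|Y|
print("argmin lambda:", lam_star)
print("Laplace-transform bound:", best)
print("naive bound:", naive)
```

By calculus, $\frac{\log 2}{\lambda}+\frac{\lambda}{2}$ is minimized where $\lambda^2=2\log 2$, which is what the grid search approximates.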

Now, my question is: heuristically/intuitively, why is the second method better? Or, to put it differently, is there some easy way to see that the second method should give a better bound even before doing the actual calculations that confirm this?

At this stage I have no intuition for why this works, and I am certainly not satisfied with the explanation that it is simply the standard trick researchers in the field use.

Best Answer

First, an upper bound that beats your second bound is the following: use the identity $$\max(a,b)=(a+b+|a-b|)/2.$$ Since $E(X+Y)=0$, this gives $$E\max(X,Y)=E|X-Y|/2\leq \frac{1}{2} (E|X|+E|Y|)=E|X|=\sqrt{2/\pi}\approx 0.798.$$ This bound cannot be improved, as the case $Y=-X$ shows.
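To see the bound $\sqrt{2/\pi}$ in action, here is a small simulation sketch. It makes an assumption the question does not: that $(X,Y)$ is jointly Gaussian with correlation $\rho$. In that case $X-Y\sim N(0,2(1-\rho))$, so $E\max(X,Y)=\sqrt{(1-\rho)/\pi}$, which equals $\sqrt{2/\pi}$ exactly when $\rho=-1$. The function names, sample size and seed are illustrative.

```python
import math
import random

def expected_max_exact(rho: float) -> float:
    """E max(X, Y) for a bivariate normal pair with unit variances and correlation rho.

    Assumes joint Gaussianity, which is stronger than what the question requires.
    """
    return math.sqrt((1.0 - rho) / math.pi)

def expected_max_mc(rho: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E max(X, Y) under the same assumption."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # Y is N(0, 1) with corr(X, Y) = rho
        total += max(x, y)
    return total / n_samples

for rho in (-1.0, -0.5, 0.0, 0.5):
    print(rho, expected_max_exact(rho), expected_max_mc(rho))
```

Within this Gaussian family the expectation increases as $\rho$ decreases, and the extreme case $\rho=-1$ is the example $Y=-X$ from the answer.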

So you see that your second bound is better not because you used the exponential moments, but rather because your first bound controls the max function way too brutally - you lost a factor of $2$. The advantage of the second method over the little trick I showed above is that it generalizes better when you deal with the max of more than two variables.
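A brief sketch of that generalization, repeating the two steps from the question (Jensen, then bounding the exponential of the max by the sum of exponentials). The symbols $X_1,\dots,X_n$ and $n$ are introduced here for illustration: take $X_1,\dots,X_n\sim N(0,1)$ with arbitrary dependence. For any $\lambda>0$,

$$E\max_{i\le n}X_i\leq \frac{1}{\lambda}\log\left(E e^{\lambda\max_{i\le n}X_i}\right)\leq \frac{1}{\lambda}\log\left(\sum_{i=1}^n E e^{\lambda X_i}\right)=\frac{\log n}{\lambda}+\frac{\lambda}{2}.$$

Choosing $\lambda=\sqrt{2\log n}$ gives $$E\max_{i\le n}X_i\leq\sqrt{2\log n},$$ which for $n=2$ is the bound $\sqrt{2\log 2}$ from the question. By contrast, summing absolute values gives $nE|X_1|=n\sqrt{2/\pi}$, which grows linearly in $n$.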

Related Solutions

[Math] what is the cycle length of the maximum normalized cycle in the directed complete graph

I wrote a program to collect some data.

For $n=8$ and $10^5$ trials, here are statistics on the largest-weight cycles of length $k$, together with the number of trials in which the cycle with the greatest normalized weight had length $k$.

k  count         avg            std_dev
3  50415  1.40995707256456 0.277702203891974
4  30427  1.3675029633889 0.248163593506348
5  13738  1.32184789116913 0.229675012490759
6   4428  1.26765935699902 0.215218146521779
7    916  1.20083001890189 0.202927859960246
8     76  1.11148487469463 0.190259341168933

In a few cases I inspected, the largest weight cycle of length $k+1$ often shared a directed chain of $k$ vertices with the largest weight cycle of length $k$, but of course this did not always happen. There seemed to be a high correlation between the largest weights of cycles of different lengths.

For $n=10, 12, 20$, I did a restricted optimization over the cycles of length at most $6$.

        n=10, 10^5 trials
k  count         avg            std_dev
3  44788 1.56377702460182 0.258071707092035
4  30386 1.53787677069062 0.228885384830286
5  16974 1.50659766688642 0.212244752764919
6   7852 1.4715247336037 0.199249497688295

        n=12, 10^5 trials
k  count         avg            std_dev
3  41207 1.67848840347225 0.244485830656911
4  29722 1.66261483794121 0.21443274525213
5  18687 1.64098125565814 0.198203806693267
6  10384 1.61519351038532 0.186681888604542

        n=20, 2000 trials
k  count         avg            std_dev
3    667 1.97010656830871 0.212728229010943
4    584 1.97273614009628 0.18001851348712
5    418 1.96707199503644 0.16332139093596
6    331 1.95442360307882 0.154839166051771

[Math] Stopping time of two dimensional random walk

Note that
$$\mathbf E[\tau(t)]=\sum_{j=0}^\infty \mathbf P(\tau(t)>j)\le t+1+\sqrt{t}+\sum_{j=[t+\sqrt{t}]+1}^\infty \mathbf P(\tau(t)>j).$$ To bound the sum we use $\{\tau(t)>j\}\subset \{U_j<t\mu_X\mbox{ or } V_j<t\mu_Y\}$. Therefore, $$ \mathbf P(\tau(t)>j)\le \mathbf P(U_j<t\mu_X)+\mathbf P(V_j<t\mu_Y). $$ Now we can use the fact that $X_i$ and $Y_i$ are bounded and apply Hoeffding's inequality (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality), which gives, for $j>t$, $$ \mathbf P(U_j<t\mu_X)=\mathbf P(U_j-j\mu_X<(t-j)\mu_X)\le \exp\left(-\frac{2(t-j)^2\mu_X^2}{4j(\mu_XK_X)^2}\right) =\exp\left(-\frac{(t-j)^2}{2j(K_X)^2}\right), $$ where I used the condition $|X_n|\le K_X\mu_X$. Similarly, $$ \mathbf P(V_j<t\mu_Y)\le\exp\left(-\frac{(t-j)^2}{2j(K_Y)^2}\right). $$ Then, $$ \mathbf E[\tau(t)]\le t+1+\sqrt{t}+\sum_{j=[t+\sqrt{t}]+1}^\infty \left(e^{-\frac{(j-t)^2}{2j(K_Y)^2}}+e^{-\frac{(j-t)^2}{2j(K_X)^2}}\right). $$

Now note that for $j\ge t+\sqrt{t}$ we have the estimate $j\le (j-t)(\sqrt{t}+1)$ (indeed, $(j-t)(\sqrt t+1)-j=\sqrt t\,(j-t-\sqrt t)\ge 0$), which implies that $$ \sum_{j=[t+\sqrt{t}]+1}^\infty \left(e^{-\frac{(j-t)^2}{2j(K_Y)^2}}+e^{-\frac{(j-t)^2}{2j(K_X)^2}}\right)\le \sum_{j=[t+\sqrt{t}]+1}^\infty \left(e^{-\frac{(j-t)}{2(K_Y)^2(\sqrt{t}+1)}}+e^{-\frac{(j-t)}{2(K_X)^2(\sqrt{t}+1)}}\right). $$ The latter sum is simply a sum of two geometric series. Hence, $$ \mathbf E[\tau(t)]\le t+1+\sqrt{t}+\frac{1}{1-e^{-\frac{1}{2(K_X)^2(\sqrt{t}+1)}}} +\frac{1}{1-e^{-\frac{1}{2(K_Y)^2(\sqrt{t}+1)}}}, $$ and finally, since $\frac{1}{1-e^{-c}}\le \frac1c+1$ for $c>0$, we get $\mathbf E[\tau(t)]\le t+2(1+K_X^2+K_Y^2)\sqrt{t}$ for large $t$.

The constant in front of $\sqrt{t}$ is not sharp.
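To get a feel for how much slack the geometric-series step leaves, here is a rough numerical sketch. It evaluates the Hoeffding tail sum $\sum_{j>[t+\sqrt t]} e^{-(j-t)^2/(2jK^2)}$ for one coordinate and prints it next to $2K^2\sqrt t$, the leading order of the geometric-series bound. The function name, the truncation point and the sample values of $t$ and $K$ are assumptions made for illustration; a single constant $K$ stands in for either $K_X$ or $K_Y$.

```python
import math

def hoeffding_tail_sum(t: float, K: float, j_max_factor: int = 50) -> float:
    """Sum of exp(-(j - t)^2 / (2 j K^2)) for j from floor(t + sqrt(t)) + 1 up to a cutoff.

    The cutoff (about j_max_factor * t) is an assumption: it truncates the
    infinite sum where the remaining terms are negligibly small.
    """
    start = math.floor(t + math.sqrt(t)) + 1
    stop = start + int(j_max_factor * t)
    return sum(math.exp(-((j - t) ** 2) / (2.0 * j * K * K)) for j in range(start, stop))

for t in (100.0, 1_000.0, 10_000.0):
    for K in (1.0, 2.0):
        tail = hoeffding_tail_sum(t, K)
        print(f"t={t:>8.0f}  K={K}  tail sum={tail:.3f}  2*K^2*sqrt(t)={2.0 * K * K * math.sqrt(t):.3f}")
```

Comparing the two printed columns indicates how loose the resulting constant in front of $\sqrt t$ is for the chosen parameters.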

In principle, it is possible to obtain the exact behaviour $\mathbf E[\tau(t)]=t+A\sqrt t +o(\sqrt t),\quad t\to\infty$ and identify $A$. I will briefly sketch how it can be done.

As above, it is sufficient to obtain asymptotics for $\mathbf P(\tau(t)>j)$. For $j\ge 2t$ this probability decreases exponentially and its contribution to $\mathbf E[\tau(t)]$ is negligible. For $j\le 2t$ we can use a strong coupling of $(U_j,V_j)$ with a 2d Brownian motion $(U(t),V(t))$ that has the same drift and covariance as the random walk. The strong coupling ensures that, with high probability, the distance between the random walk and the Brownian motion is less than $\log(t)$ for $j\le 2t$. Then, on the latter event, $$ \tau^{BM}(t-\log t)\le \tau(t)\le \tau^{BM}(t+\log t), $$ where $\tau^{BM}(t)$ is the corresponding exit time for the Brownian motion. Therefore, it is sufficient to show for the Brownian motion that $\mathbf E[\tau^{BM}(t)]=t+A\sqrt t +o(\sqrt t)$. This can be done as follows: a) change the measure to remove the drift of the Brownian motion; b) apply a linear transformation to remove the correlation between the coordinates, which gives a standard 2d Brownian motion. The question can then be treated as a question about the exit time of a standard Brownian motion from a cone. For that we can use the information about $\mathbf P(\tau>t, (U(t),V(t))\in dy)$ available in Brownian motion in cones (doi:10.1007/s004400050111).
