Probability of Maximum from Multiple Normal Distribution Selections

Tags: correlation, normal distribution, probability

I'm looking for some math that will help me solve the following problems. I'm really interested in solving the last problem, but I have sequenced these problems (from easy to hard) to help you understand what I'm looking for, and to help get me on my way.

Problem #1 – Probability of Maximum from Two Normal Distribution Selections:

I have two uncorrelated normal distributions, A and B. They have means $\mu_a$ and $\mu_b$ and variances $\sigma_a^2$ and $\sigma_b^2$, respectively. I randomly select one value from each distribution ($value_a$ from A and $value_b$ from B). What is the probability that $value_a$ is greater than $value_b$? For example, distribution A may have a mean of $\mu_a = 100$ and a variance of $\sigma_a^2 = 20$, while distribution B may have a mean of $\mu_b = 80$ and a variance of $\sigma_b^2 = 40$. On one draw I might get $value_a = 95$ and $value_b = 105$. In general, what is the probability that my $value_a$ from distribution A will be greater than my $value_b$ from distribution B?

Problem #2 – Probability from Many Normal Distributions:

Same as above, except now I have many normal distributions: A, B, C, …, N, each with its own mean and variance. I draw one value from each. What is the probability that the largest of these values was drawn from distribution A? From distribution B? From distribution N?

Problem #3 – Probability from Correlated Normal Distributions:

Same as Problem #2, except now the normal distributions are correlated. Hence, if distributions A and B are positively correlated and I draw a high value from A, then it is more likely that I will also draw a high value from B. The correlation between distributions A and B is $\rho_{ab}$, and so on. All of the correlations can be collected in a correlation matrix, which is symmetric because $\rho_{ab}$ equals $\rho_{ba}$.

Alternative:

I have assumed that the math is easiest with normal distributions. But perhaps there are other classes of distributions that make this problem easier, for example the triangular distribution?

Bonus:

I would like to offer a mathematical foundation for what I am working on. But these problems ultimately need to be solved in software (quickly and often). Right now I am generating arrays of randomly selected numbers (with samples pulled from distributions A, B, …, N) and repeating the experiment 10,000 times. But if there is a software library or GPU silicon that naturally does this kind of number crunching then I'd also be very interested in learning more!
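For reference, the brute-force experiment described above can be sketched in plain Python as follows. This is a minimal sketch, assuming the distributions are jointly normal and described by hypothetical lists `mus` (means), `sigmas` (standard deviations) and a full correlation matrix `rho`; correlated draws are built from independent standard normals through a Cholesky factor of the covariance matrix, and the function simply counts which variable is largest in each trial.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def prob_max_mc(mus, sigmas, rho, trials=10000, seed=0):
    """Monte Carlo estimate of Pr(X_i is the largest draw) for each i."""
    n = len(mus)
    # Covariance matrix from the standard deviations and the correlation matrix.
    cov = [[rho[i][j] * sigmas[i] * sigmas[j] for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    rng = random.Random(seed)
    wins = [0] * n
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x = mu + L z has mean mu and covariance L L^T = cov.
        x = [mus[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
        wins[max(range(n), key=lambda k: x[k])] += 1
    return [c / trials for c in wins]
```

Because the standard error of such an estimate shrinks only like $1/\sqrt{\text{trials}}$, 10,000 trials gives roughly two correct decimals for probabilities near 0.5, which is one reason to prefer the quadrature and GHK approaches when speed matters.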

Any pointers to get me going in the right direction?

Best Answer

First, I assume below that the vector of variables (e.g., the vector $(A,B)$) follows a multivariate normal distribution. This is a stronger assumption than each variable being marginally normal, and it is needed to solve your problems.

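Problem 1: Note that, by independence, $B-A\sim\mathcal{N}(\mu_b-\mu_a,\sigma^2_a+\sigma^2_b)$, where $\sigma^2_a$ and $\sigma^2_b$ are the variances. Then, $$\Pr(A>B)=\Pr(B-A<0)=\Phi\left(\frac{\mu_a-\mu_b}{\sqrt{\sigma^2_a+\sigma^2_b}}\right),$$ where $\Phi$ is the cumulative distribution function of a standard normal distribution $\mathcal{N}(0,1)$ (see here for details). With the numbers from the question, $\Pr(A>B)=\Phi(20/\sqrt{60})\approx\Phi(2.58)\approx 0.995$.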
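The closed form is a one-liner in software. A minimal sketch using Python's standard-library `statistics.NormalDist` for $\Phi$; the function name `prob_a_greater` is hypothetical, and it takes variances (not standard deviations), matching the question's example.

```python
from statistics import NormalDist


def prob_a_greater(mu_a, var_a, mu_b, var_b):
    """Pr(A > B) for independent A ~ N(mu_a, var_a) and B ~ N(mu_b, var_b)."""
    # A - B ~ N(mu_a - mu_b, var_a + var_b), so Pr(A - B > 0) = Phi((mu_a - mu_b) / sd).
    sd = (var_a + var_b) ** 0.5
    return NormalDist().cdf((mu_a - mu_b) / sd)


print(round(prob_a_greater(100, 20, 80, 40), 4))  # 0.9951
```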

Problem 2: For simplicity, I instead denote the variables by $(X_1,\dots,X_n)$, with independent $X_i\sim \mathcal{N}(\mu_i,\sigma^2_i)$. I also let $\varphi$ denote the density of a standard normal. Since ties occur with probability zero,
\begin{align*}
\Pr(X_1=\max(X_1,\dots,X_n)) &= \Pr(X_1>X_2,\dots,X_1>X_n) \\
&= E\left[\Pr(X_1>X_2,\dots,X_1>X_n\mid X_1)\right] \\
&= \int_{-\infty}^{\infty} \Pr(x_1>X_2,\dots,x_1>X_n)\,\frac{1}{\sigma_1}\varphi\left(\frac{x_1-\mu_1}{\sigma_1}\right)dx_1 \\
&= \int_{-\infty}^{\infty} \prod_{i=2}^n \Phi\left(\frac{x_1-\mu_i}{\sigma_i}\right)\frac{1}{\sigma_1}\varphi\left(\frac{x_1-\mu_1}{\sigma_1}\right)dx_1.
\end{align*}
(The last two equalities follow from the independence of the $X_i$.) I don't think there is a closed-form expression for this integral, but it can be approximated accurately and quickly by Gauss-Hermite quadrature, see here. Replacing $X_1$ by any other $X_j$ gives the probability that the maximum comes from distribution $j$.
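A minimal sketch of the Gauss-Hermite approach in standard-library Python. Substituting $x_1=\mu_1+\sqrt{2}\,\sigma_1 t$ turns the integral into $\frac{1}{\sqrt{\pi}}\int \prod_{i\ge 2}\Phi\left(\frac{\mu_1+\sqrt{2}\,\sigma_1 t-\mu_i}{\sigma_i}\right)e^{-t^2}\,dt$, which is exactly the form Gauss-Hermite quadrature handles. Assumptions: independent normals given by hypothetical lists `mus` and `sigmas` (standard deviations); the nodes and weights for the weight $e^{-t^2}$ are computed by Newton's method on orthonormal Hermite polynomials, in the style of the `gauher` routine from Numerical Recipes (a library routine such as NumPy's `numpy.polynomial.hermite.hermgauss` could supply them instead); 40 nodes is an arbitrary choice.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.cdf is Phi


def gauss_hermite(n):
    """Nodes t and weights w with sum(w * f(t)) ~ integral of f(t) * exp(-t**2) dt."""
    t = [0.0] * n
    w = [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):
        # Initial guess for the i-th largest root (Numerical Recipes style).
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** -0.16667
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * t[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * t[1]
        else:
            z = 2.0 * z - t[i - 2]
        for _ in range(100):  # Newton iterations
            # Orthonormal Hermite recurrence; ends with p1 = h_n(z), p2 = h_{n-1}(z).
            p1, p2 = math.pi ** -0.25, 0.0
            for j in range(1, n + 1):
                p1, p2 = z * math.sqrt(2.0 / j) * p1 - math.sqrt((j - 1) / j) * p2, p1
            dp = math.sqrt(2.0 * n) * p2  # derivative h_n'(z)
            z_old, z = z, z - p1 / dp
            if abs(z - z_old) < 1e-14:
                break
        # Roots are symmetric about 0 and share the same weight.
        t[i], t[n - 1 - i] = z, -z
        w[i] = w[n - 1 - i] = 2.0 / (dp * dp)
    return t, w


def prob_max_gh(target, mus, sigmas, nodes=40):
    """Pr(X_target is the maximum) for independent X_i ~ N(mus[i], sigmas[i]**2)."""
    t, w = gauss_hermite(nodes)
    total = 0.0
    for tk, wk in zip(t, w):
        x = mus[target] + math.sqrt(2.0) * sigmas[target] * tk  # x_1 at this node
        prod = 1.0
        for i, (mu, s) in enumerate(zip(mus, sigmas)):
            if i != target:
                prod *= STD.cdf((x - mu) / s)  # Pr(X_i < x_1)
        total += wk * prod
    return total / math.sqrt(math.pi)
```

The integrand is smooth, so a few dozen nodes usually suffice; accuracy degrades when $\sigma_1$ is much larger than some $\sigma_i$, because the corresponding $\Phi$ factor then looks like a step function on the scale of the nodes, and more nodes may be needed.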

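Problem 3: In this case, let $Y_i=X_1 - X_{i+1}$ for $i=1,\dots,n-1$. Then $(Y_1,\dots,Y_{n-1})$ follows a multivariate normal distribution with $E[Y_i]=\mu_1-\mu_{i+1}$ and $\operatorname{Cov}(Y_i,Y_j)=\Sigma_{1,1}-\Sigma_{1,j+1}-\Sigma_{i+1,1}+\Sigma_{i+1,j+1}$, where $\Sigma_{k,l}=\rho_{kl}\sigma_k\sigma_l$ is the covariance matrix of $(X_1,\dots,X_n)$. We seek to compute $\Pr(Y_1>0,\dots,Y_{n-1}>0)$. This is a particular instance of computing the probability that a normal random vector $Y$ belongs to a hyperrectangle $A$ (say). There are several methods for approximating this probability, but the most popular one is probably the GHK (Geweke-Hajivassiliou-Keane) algorithm. Intuitively, compared with "naive" Monte Carlo simulation, the idea is to draw the variables so that they are more informative about the event $\{Y\in A\}$: each component is drawn from a normal distribution truncated to the constraint, conditional on the components already drawn, and each draw is weighted by the product of the truncation probabilities.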
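A minimal sketch of GHK in standard-library Python, assuming jointly normal variables described by hypothetical lists `mus` (means), `sigmas` (standard deviations) and a full symmetric, positive definite correlation matrix `rho`. It forms the mean and covariance of the differences $Y_k = X_{\text{target}} - X_j$ for every other index $j$, factors that covariance by Cholesky, and then, in each draw, samples each standard-normal innovation from the interval that keeps $Y_k>0$ (inverse-CDF sampling of a truncated normal), multiplying the draw's weight by the probability of that interval. The draw count and the clipping constant are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def prob_max_ghk(target, mus, sigmas, rho, draws=10000, seed=0):
    """GHK estimate of Pr(X_target is the maximum) for correlated normals."""
    n = len(mus)
    cov = [[rho[i][j] * sigmas[i] * sigmas[j] for j in range(n)] for i in range(n)]
    others = [i for i in range(n) if i != target]
    # Y_k = X_target - X_others[k]: mean vector m and covariance matrix c.
    m = [mus[target] - mus[i] for i in others]
    c = [[cov[target][target] - cov[target][j] - cov[i][target] + cov[i][j]
          for j in others] for i in others]
    L = cholesky(c)
    rng = random.Random(seed)
    clip = 1e-12  # keeps inv_cdf away from exactly 0 or 1
    total = 0.0
    for _ in range(draws):
        eps = []
        weight = 1.0
        for k in range(len(others)):
            # Y_k = m_k + sum_j L[k][j] eps_j > 0  <=>  eps_k > a, given earlier eps.
            shift = m[k] + sum(L[k][j] * eps[j] for j in range(k))
            a = -shift / L[k][k]
            lo = STD.cdf(a)
            weight *= 1.0 - lo  # Pr(eps_k > a)
            # Draw eps_k from N(0, 1) truncated to (a, inf) by the inverse CDF.
            p = lo + rng.random() * (1.0 - lo)
            eps.append(STD.inv_cdf(min(max(p, clip), 1.0 - clip)))
        total += weight
    return total / draws
```

Unlike plain Monte Carlo, every GHK draw contributes a weight between 0 and 1 rather than a 0/1 indicator, which typically lowers the variance; with only two variables the weight is constant and the estimate is exact.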

Related Question