Somebody asked me this question in a job interview, and I replied that the joint distribution is always Gaussian. I thought that I could always write down a bivariate Gaussian using the two means, the two variances, and the covariance. I am wondering whether there can be a case in which the joint distribution of two Gaussian random variables is not Gaussian?
Normal Distribution Analysis – Joint Distribution of Gaussian Random Variables
Tags: bivariate, copula, multivariate analysis, normal distribution
Related Solutions
Define a random variable $C\in\{1,2\}$ with prior distribution $\mu_C$ given by $$ \mu_C(A) = P\{C\in A\} = \frac{1}{2} I_A(1) + \frac{1}{2} I_A(2) \, , $$ where $A$ is any subset of $\{1,2\}$.
Use the notation $X=(X_1,X_2)$ and $x=(x_1,x_2)$. Suppose that
$$X\mid C=1\sim N(\mu_1,\Sigma_1)\, ,$$ $$X\mid C=2\sim N(\mu_2,\Sigma_2)\, ,$$
where $\mu_1=(2, 2)^\top$, $\Sigma_1=\textrm{diag}(2,1)$, $\mu_2=(2,4)^\top$ and $\Sigma_2=\textrm{diag}(4,2)$.
Now, using the density of the multivariate normal distribution (see http://en.wikipedia.org/wiki/Multivariate_normal_distribution), we have
$$ f_{X\mid C}(x\mid 1) = \frac{1}{2\pi\sqrt{2}} \exp\left(-\frac{1}{2}\left(\frac{(x_1-2)^2}{2} + \frac{(x_2-2)^2}{1} \right)\right) \, , $$
$$ f_{X\mid C}(x\mid 2) = \frac{1}{4\pi\sqrt{2}} \exp\left(-\frac{1}{2}\left(\frac{(x_1-2)^2}{4} + \frac{(x_2-4)^2}{2} \right)\right) \, . $$
Using Bayes Theorem, we have $$ P\{C=1\mid X=x\} = \frac{\int_{\{1\}} f_{X\mid C}(x\mid c) \,d\mu_C(c)}{\int_{\{1,2\}} f_{X\mid C}(x\mid c)\, d\mu_C(c)} = \frac{\frac{1}{2} f_{X\mid C}(x\mid 1)}{\frac{1}{2} f_{X\mid C}(x\mid 1) + \frac{1}{2} f_{X\mid C}(x\mid 2)} \, . $$
The idea is to decide for the first classification if $$ P\{C=1\mid X=x\} = \frac{1}{1+\frac{f_{X\mid C}(x\mid 2)}{f_{X\mid C}(x\mid 1)}} > \frac{1}{2} \, , $$ which is equivalent to $$ \frac{f_{X\mid C}(x\mid 2)}{f_{X\mid C}(x\mid 1)} < 1 \, , $$ or $$ \log f_{X\mid C}(x\mid 2) - \log f_{X\mid C}(x\mid 1) < 0 \, , $$ which gives us $$ \log \frac{1}{2} - \frac{(x_1-2)^2}{8} - \frac{(x_2-4)^2}{4} + \frac{(x_1-2)^2}{4} + \frac{(x_2-2)^2}{2} < 0 \, . \qquad (*) $$ Collecting terms, $(*)$ reads $$ \frac{(x_1-2)^2}{8} + \frac{x_2^2}{4} < 2 + \log 2 \, . $$ Therefore, you decide that the point $x$ belongs to classification $1$ if it is inside the ellipse defined by $$ \frac{(x_1-2)^2}{8(2+\log 2)} + \frac{x_2^2}{4(2+\log 2)} = 1 \, , $$ otherwise, you decide for classification $2$.
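As a sanity check, the decision rule can also be evaluated directly from the two conditional densities, without the ellipse algebra. A minimal pure-Python sketch (the function names are mine, not from the answer):

```python
from math import exp, pi, sqrt

def f1(x1, x2):
    # density of N((2,2), diag(2,1)); normalising constant 1/(2*pi*sqrt(2))
    return exp(-0.5 * ((x1 - 2)**2 / 2 + (x2 - 2)**2 / 1)) / (2 * pi * sqrt(2))

def f2(x1, x2):
    # density of N((2,4), diag(4,2)); normalising constant 1/(4*pi*sqrt(2))
    return exp(-0.5 * ((x1 - 2)**2 / 4 + (x2 - 4)**2 / 2)) / (4 * pi * sqrt(2))

def classify(x1, x2):
    # under equal priors, decide class 1 iff P(C=1 | X=x) > 1/2, i.e. f1 > f2
    return 1 if f1(x1, x2) > f2(x1, x2) else 2
```

For example, the class-1 mean $(2,2)$ is assigned to class $1$, while a point far from both means in the $x_2$ direction, such as $(2,10)$, is assigned to class $2$ because the second component has the larger variance there.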
We have for two $d$-dimensional multivariate Gaussian distributions $P = \mathcal{N}(\mu, \Sigma)$ and $Q = \mathcal{N}(m, S)$ that
$$\DeclareMathOperator{\tr}{tr} \mathbb{D}_{\textrm{KL}}(P \Vert Q) = \frac{1}{2} \left( \tr(S^{-1}\Sigma) - d + (m - \mu)^\top S^{-1}(m-\mu) + \log\frac{|S|}{|\Sigma|} \right). $$
For the bivariate case, i.e. $d=2$, parameterising in terms of the component means, standard deviations and correlation coefficients, we define the mean vectors and covariance matrices as
$$ \mu = \begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix},~ \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \quad\textrm{and}\quad m = \begin{pmatrix} m_1 \\ m_2 \end{pmatrix},~ S = \begin{pmatrix} s_1^2 & r s_1 s_2 \\ r s_1 s_2 & s_2^2 \end{pmatrix}. $$
Using the definitions of the determinant and inverse of $2\times 2$ matrices we have that
$$ |\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2),~ |S| = s_1^2 s_2^2 (1 - r^2) ~\textrm{and}~ S^{-1} = \frac{1}{s_1^2 s_2^2 (1 - r^2)} \begin{pmatrix} s_2^2 & -r s_1 s_2 \\ -r s_1 s_2 & s_1^2 \end{pmatrix}. $$
Substituting these terms in to the above and simplifying gives
\begin{align} \mathbb{D}_{\textrm{KL}}(P \Vert Q) = &\, \frac{1}{2(1-r^2)} \left( \frac{(\mu_1-m_1)^2}{s_1^2} - 2r \frac{(\mu_1-m_1)(\mu_2-m_2)}{s_1 s_2} + \frac{(\mu_2-m_2)^2}{s_2^2} \right) +\,\\ &\, \frac{1}{2(1-r^2)} \left( \frac{\sigma_1^2-s_1^2}{s_1^2} - 2r \frac{\rho\sigma_1\sigma_2 - r s_1 s_2}{s_1 s_2} + \frac{\sigma_2^2-s_2^2}{s_2^2} \right) +\, \\ &\, \log\left( \frac{s_1 s_2 \sqrt{1-r^2}}{\sigma_1\sigma_2\sqrt{1-\rho^2}} \right). \end{align}
This can be verified with SymPy as follows
from sympy import *

d = 2
s1, s2, r, m1, m2 = symbols('s_1 s_2 r m_1 m_2')
sigma1, sigma2, rho, mu1, mu2 = symbols(r'\sigma_1 \sigma_2 \rho \mu_1 \mu_2')

m = Matrix([m1, m2])
S = Matrix([[s1**2, r*s1*s2], [r*s1*s2, s2**2]])
mu = Matrix([mu1, mu2])
Sigma = Matrix([[sigma1**2, rho*sigma1*sigma2], [rho*sigma1*sigma2, sigma2**2]])

# matrix form of the KL divergence
lhs = (
    trace(S**(-1) * Sigma) - d +
    ((m - mu).T * S**(-1) * (m - mu))[0] +
    log(det(S) / det(Sigma))
) / 2

# expanded bivariate form
rhs = (
    ((mu1-m1)**2/s1**2 - 2*r*(mu1-m1)*(mu2-m2)/(s1*s2) + (mu2-m2)**2/s2**2) /
    (2 * (1 - r**2)) +
    ((sigma1**2-s1**2)/s1**2 - 2*r*(rho*sigma1*sigma2-r*s1*s2)/(s1*s2) +
     (sigma2**2-s2**2)/s2**2) /
    (2 * (1 - r**2)) +
    log((s1**2 * s2**2 * (1-r**2)) / (sigma1**2 * sigma2**2 * (1-rho**2))) / 2
)

simplify(lhs - rhs) == 0
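As a supplementary numeric spot-check (a pure-Python sketch; the parameter values below are arbitrary choices of mine), the matrix form and the expanded form can be compared at a concrete point using the $2\times 2$ determinant and inverse formulas given above:

```python
from math import log, sqrt

# arbitrary test parameters (my choices; |rho| < 1 and |r| < 1)
mu1, mu2, sigma1, sigma2, rho = 0.0, 1.0, 1.0, 2.0, 0.3
m1, m2, s1, s2, r = 0.5, 0.0, 1.5, 1.0, -0.2

# matrix form, via |S|, |Sigma| and the explicit 2x2 inverse of S
det_S = s1**2 * s2**2 * (1 - r**2)
det_Sigma = sigma1**2 * sigma2**2 * (1 - rho**2)
inv11, inv12, inv22 = s2**2 / det_S, -r * s1 * s2 / det_S, s1**2 / det_S
tr = inv11 * sigma1**2 + 2 * inv12 * rho * sigma1 * sigma2 + inv22 * sigma2**2
d1, d2 = m1 - mu1, m2 - mu2
quad = inv11 * d1**2 + 2 * inv12 * d1 * d2 + inv22 * d2**2
kl_matrix = (tr - 2 + quad + log(det_S / det_Sigma)) / 2

# expanded closed form from the answer
kl_expanded = (
    ((mu1 - m1)**2 / s1**2
     - 2 * r * (mu1 - m1) * (mu2 - m2) / (s1 * s2)
     + (mu2 - m2)**2 / s2**2) / (2 * (1 - r**2))
    + ((sigma1**2 - s1**2) / s1**2
       - 2 * r * (rho * sigma1 * sigma2 - r * s1 * s2) / (s1 * s2)
       + (sigma2**2 - s2**2) / s2**2) / (2 * (1 - r**2))
    + log(s1 * s2 * sqrt(1 - r**2) / (sigma1 * sigma2 * sqrt(1 - rho**2)))
)

assert abs(kl_matrix - kl_expanded) < 1e-12
```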
Best Answer
The bivariate normal distribution is the exception, not the rule!
It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not bivariate normal are somehow "pathological" is a bit misguided.
Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.
Examples
It is useful to start with some examples. The figure below contains heatmaps of six bivariate distributions, all of which have standard normal marginals. The left and middle ones in the top row are bivariate normals; the remaining ones are not (as should be apparent). They are described further below.
The bare bones of copulas
Properties of dependence are often efficiently analyzed using copulas. A bivariate copula is just a fancy name for a probability distribution on the unit square $[0,1]^2$ with uniform marginals.
Suppose $C(u,v)$ is a bivariate copula. Then, immediately from the above, we know that $C(u,v) \geq 0$, $C(u,1) = u$ and $C(1,v) = v$, for example.
We can construct bivariate random variables on the Euclidean plane with prespecified marginals by a simple transformation of a bivariate copula. Let $F_1$ and $F_2$ be prescribed marginal distributions for a pair of random variables $(X,Y)$. Then, if $C(u,v)$ is a bivariate copula, $$ F(x,y) = C(F_1(x), F_2(y)) $$ is a bivariate distribution function with marginals $F_1$ and $F_2$. To see this last fact, just note that $$ \renewcommand{\Pr}{\mathbb P} \Pr(X \leq x) = \Pr(X \leq x, Y < \infty) = C(F_1(x), F_2(\infty)) = C(F_1(x),1) = F_1(x) \>. $$ The same argument works for $F_2$.
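As a concrete illustration (a minimal sketch using the independence copula $C(u,v)=uv$ and standard normal marginals; the function names are mine), the marginal property $F(x,\infty) = F_1(x)$ can be checked numerically by sending the other argument to a large value:

```python
from math import erf, sqrt

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def C(u, v):
    # independence copula on [0,1]^2
    return u * v

def F(x, y):
    # joint CDF with standard normal marginals built from the copula
    return C(Phi(x), Phi(y))
```

Evaluating `F(x, 50.0)` for any `x` returns (numerically) `Phi(x)`, since `Phi(50.0)` is `1.0` to machine precision; the same check with the first argument large recovers the second marginal.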
For continuous $F_1$ and $F_2$, Sklar's theorem asserts a converse implying uniqueness. That is, given a bivariate distribution $F(x,y)$ with continuous marginals $F_1$, $F_2$, the corresponding copula is unique (on the appropriate range space).
The bivariate normal is exceptional
Sklar's theorem tells us (essentially) that there is only one copula that produces the bivariate normal distribution. This is the aptly named Gaussian copula, which has density on $[0,1]^2$ $$ c_\rho(u,v) := \frac{\partial^2}{\partial u \, \partial v} C_\rho(u,v) = \frac{\varphi_{2,\rho}(\Phi^{-1}(u),\Phi^{-1}(v))}{\varphi(\Phi^{-1}(u)) \varphi(\Phi^{-1}(v))} \>, $$ where the numerator is the bivariate normal density with correlation $\rho$ evaluated at $\Phi^{-1}(u)$ and $\Phi^{-1}(v)$.
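A numerical illustration of this identity (a pure-Python sketch using `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; the correlation $\rho = 0.6$ and the evaluation point are my arbitrary choices): multiplying the Gaussian copula density by the two standard normal marginal densities should recover the bivariate normal density.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

rho = 0.6  # correlation parameter (arbitrary choice)
nd = NormalDist()  # standard normal: nd.cdf is Phi, nd.inv_cdf is Phi^{-1}

def phi(z):
    # standard normal density
    return exp(-z * z / 2) / sqrt(2 * pi)

def phi2(x, y, rho):
    # standard bivariate normal density with correlation rho
    q = (x * x - 2 * rho * x * y + y * y) / (1 - rho**2)
    return exp(-q / 2) / (2 * pi * sqrt(1 - rho**2))

def c_gauss(u, v, rho):
    # Gaussian copula density on (0,1)^2, per the formula above
    x, y = nd.inv_cdf(u), nd.inv_cdf(v)
    return phi2(x, y, rho) / (phi(x) * phi(y))

# phi(x) * phi(y) * c_rho(Phi(x), Phi(y)) should equal phi2(x, y, rho)
x, y = 0.7, -1.2
recovered = phi(x) * phi(y) * c_gauss(nd.cdf(x), nd.cdf(y), rho)
assert abs(recovered - phi2(x, y, rho)) < 1e-9
```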
But there are lots of other copulas, and, via the transformation described in the previous section, every one of them gives a bivariate distribution with normal marginals that is not the bivariate normal.
Some details on the examples
Note that if $C(u,v)$ is an arbitrary copula with density $c(u,v)$, the corresponding bivariate density with standard normal marginals under the transformation $F(x,y) = C(\Phi(x),\Phi(y))$ is $$ f(x,y) = \varphi(x) \varphi(y) c(\Phi(x), \Phi(y)) \> . $$
Note that by applying the Gaussian copula in the above equation, we recover the bivariate normal density. But, for any other choice of $c(u,v)$, we will not.
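To illustrate with a copula other than the Gaussian one (this example is mine, not necessarily one of the six in the figure): the Farlie-Gumbel-Morgenstern copula has density $c(u,v) = 1 + \theta(1-2u)(1-2v)$ for $|\theta| \le 1$. Plugging it into the formula above gives a joint density that is not bivariate normal, yet whose marginals are still standard normal, as a numerical integration confirms:

```python
from math import erf, exp, pi, sqrt

theta = 0.8  # FGM dependence parameter, |theta| <= 1 (arbitrary choice)

def phi(z):
    # standard normal density
    return exp(-z * z / 2) / sqrt(2 * pi)

def Phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

def c_fgm(u, v):
    # Farlie-Gumbel-Morgenstern copula density (not the Gaussian copula)
    return 1 + theta * (1 - 2 * u) * (1 - 2 * v)

def f(x, y):
    # joint density with standard normal marginals, but not bivariate normal
    return phi(x) * phi(y) * c_fgm(Phi(x), Phi(y))

# trapezoidal check that the x-marginal integrates back to phi(x)
x = 0.5
ys = [-8 + 0.01 * k for k in range(1601)]
marginal = sum(f(x, ys[k]) + f(x, ys[k + 1]) for k in range(1600)) * 0.01 / 2
assert abs(marginal - phi(x)) < 1e-6
```

The same check at other values of `x`, or integrating over `x` instead, confirms the other marginal as well.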
The examples in the figure were constructed as follows (going across each row, one column at a time):