Somebody asked me this question in a job interview, and I replied that the joint distribution is always Gaussian. I thought that I could always write down a bivariate Gaussian using the two means, the two variances, and the covariance. I am wondering whether there can be a case in which the joint distribution of two Gaussian random variables is not Gaussian?
Normal Distribution Analysis – Joint Distribution of Gaussian Random Variables
Tags: bivariate, copula, multivariate analysis, normal distribution
Related Solutions
Define a random variable $C\in\{1,2\}$ with prior distribution $\mu_C$ given by $$ \mu_C(A) = P\{C\in A\} = \frac{1}{2} I_A(1) + \frac{1}{2} I_A(2) \, , $$ where $A$ is any subset of $\{1,2\}$.
Use the notation $X=(X_1,X_2)$ and $x=(x_1,x_2)$. Suppose that
$$X\mid C=1\sim N(\mu_1,\Sigma_1)\, ,$$ $$X\mid C=2\sim N(\mu_2,\Sigma_2)\, ,$$
where $\mu_1=(2, 2)^\top$, $\Sigma_1=\textrm{diag}(2,1)$, $\mu_2=(2,4)^\top$ and $\Sigma_2=\textrm{diag}(4,2)$.
Now, using the density of the multivariate normal distribution (see http://en.wikipedia.org/wiki/Multivariate_normal_distribution), we have
$$ f_{X\mid C}(x\mid 1) = \frac{1}{2\pi\sqrt{2}} \exp\left(-\frac{1}{2}\left(\frac{(x_1-2)^2}{2} + \frac{(x_2-2)^2}{1} \right)\right) \, , $$
$$ f_{X\mid C}(x\mid 2) = \frac{1}{4\pi\sqrt{2}} \exp\left(-\frac{1}{2}\left(\frac{(x_1-2)^2}{4} + \frac{(x_2-4)^2}{2} \right)\right) \, . $$
Using Bayes Theorem, we have $$ P\{C=1\mid X=x\} = \frac{\int_{\{1\}} f_{X\mid C}(x\mid c) \,d\mu_C(c)}{\int_{\{1,2\}} f_{X\mid C}(x\mid c)\, d\mu_C(c)} = \frac{\frac{1}{2} f_{X\mid C}(x\mid 1)}{\frac{1}{2} f_{X\mid C}(x\mid 1) + \frac{1}{2} f_{X\mid C}(x\mid 2)} \, . $$
The idea is to decide for the first classification if $$ P\{C=1\mid X=x\} = \frac{1}{1+\frac{f_{X\mid C}(x\mid 2)}{f_{X\mid C}(x\mid 1)}} > \frac{1}{2} \, , $$ which is equivalent to $$ \frac{f_{X\mid C}(x\mid 2)}{f_{X\mid C}(x\mid 1)} < 1 \, , $$ or $$ \log f_{X\mid C}(x\mid 2) - \log f_{X\mid C}(x\mid 1) < 0 \, , $$ which gives us $$ \log \frac{1}{2} - \frac{(x_1-2)^2}{8} - \frac{(x_2-4)^2}{4} + \frac{(x_1-2)^2}{4} + \frac{(x_2-2)^2}{2} < 0 \, . \qquad (*) $$ Collecting terms, $(*)$ reads $$ \frac{(x_1-2)^2}{8} + \frac{x_2^2}{4} < 2 + \log 2 \, . $$ Therefore, you decide that the point $x$ belongs to classification $1$ if it is inside the ellipse defined by $$ \frac{(x_1-2)^2}{8(2+\log 2)} + \frac{x_2^2}{4(2+\log 2)} = 1 \, , $$ otherwise, you decide for classification $2$.
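As a sanity check, the decision rule can also be evaluated directly from the two conditional densities, without the ellipse algebra. A minimal pure-Python sketch (the function names are mine, not from the answer):

```python
from math import exp, pi, sqrt

def f1(x1, x2):
    # density of N((2,2), diag(2,1)); normalising constant 1/(2*pi*sqrt(2))
    return exp(-0.5 * ((x1 - 2)**2 / 2 + (x2 - 2)**2 / 1)) / (2 * pi * sqrt(2))

def f2(x1, x2):
    # density of N((2,4), diag(4,2)); normalising constant 1/(4*pi*sqrt(2))
    return exp(-0.5 * ((x1 - 2)**2 / 4 + (x2 - 4)**2 / 2)) / (4 * pi * sqrt(2))

def classify(x1, x2):
    # under equal priors, decide class 1 iff P(C=1 | X=x) > 1/2, i.e. f1 > f2
    return 1 if f1(x1, x2) > f2(x1, x2) else 2
```

For example, the class-1 mean $(2,2)$ is assigned to class $1$, while a point far from both means in the $x_2$ direction, such as $(2,10)$, is assigned to class $2$ because the second component has the larger variance there.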
We have for two $d$-dimensional multivariate Gaussian distributions $P = \mathcal{N}(\mu, \Sigma)$ and $Q = \mathcal{N}(m, S)$ that
$$\DeclareMathOperator{\tr}{tr} \mathbb{D}_{\textrm{KL}}(P \Vert Q) = \frac{1}{2} \left( \tr(S^{-1}\Sigma) - d + (m - \mu)^\top S^{-1}(m-\mu) + \log\frac{|S|}{|\Sigma|} \right). $$
For the bivariate case, i.e. $d=2$, parameterising in terms of the component means, standard deviations and correlation coefficients, we define the mean vectors and covariance matrices as
$$ \mu = \begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix},~ \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \quad\textrm{and}\quad m = \begin{pmatrix} m_1 \\ m_2 \end{pmatrix},~ S = \begin{pmatrix} s_1^2 & r s_1 s_2 \\ r s_1 s_2 & s_2^2 \end{pmatrix}. $$
Using the definitions of the determinant and inverse of $2\times 2$ matrices we have that
$$ |\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2),~ |S| = s_1^2 s_2^2 (1 - r^2) ~\textrm{and}~ S^{-1} = \frac{1}{s_1^2 s_2^2 (1 - r^2)} \begin{pmatrix} s_2^2 & -r s_1 s_2 \\ -r s_1 s_2 & s_1^2 \end{pmatrix}. $$
Substituting these terms in to the above and simplifying gives
\begin{align} \mathbb{D}_{\textrm{KL}}(P \Vert Q) = &\, \frac{1}{2(1-r^2)} \left( \frac{(\mu_1-m_1)^2}{s_1^2} - 2r \frac{(\mu_1-m_1)(\mu_2-m_2)}{s_1 s_2} + \frac{(\mu_2-m_2)^2}{s_2^2} \right) +\,\\ &\, \frac{1}{2(1-r^2)} \left( \frac{\sigma_1^2-s_1^2}{s_1^2} - 2r \frac{\rho\sigma_1\sigma_2 - r s_1 s_2}{s_1 s_2} + \frac{\sigma_2^2-s_2^2}{s_2^2} \right) +\, \\ &\, \log\left( \frac{s_1 s_2 \sqrt{1-r^2}}{\sigma_1\sigma_2\sqrt{1-\rho^2}} \right). \end{align}
This can be verified with SymPy as follows
from sympy import *

d = 2
s1, s2, r, m1, m2 = symbols('s_1 s_2 r m_1 m_2')
sigma1, sigma2, rho, mu1, mu2 = symbols(r'\sigma_1 \sigma_2 \rho \mu_1 \mu_2')

m = Matrix([m1, m2])
S = Matrix([[s1**2, r*s1*s2], [r*s1*s2, s2**2]])
mu = Matrix([mu1, mu2])
Sigma = Matrix([[sigma1**2, rho*sigma1*sigma2], [rho*sigma1*sigma2, sigma2**2]])

# matrix form of the KL divergence
lhs = (
    trace(S**(-1) * Sigma) - d +
    ((m - mu).T * S**(-1) * (m - mu))[0] +
    log(det(S) / det(Sigma))
) / 2

# expanded bivariate form
rhs = (
    ((mu1-m1)**2/s1**2 - 2*r*(mu1-m1)*(mu2-m2)/(s1*s2) + (mu2-m2)**2/s2**2) /
    (2 * (1 - r**2)) +
    ((sigma1**2-s1**2)/s1**2 - 2*r*(rho*sigma1*sigma2-r*s1*s2)/(s1*s2) +
     (sigma2**2-s2**2)/s2**2) /
    (2 * (1 - r**2)) +
    log((s1**2 * s2**2 * (1-r**2)) / (sigma1**2 * sigma2**2 * (1-rho**2))) / 2
)

simplify(lhs - rhs) == 0
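As a supplementary numeric spot-check (a pure-Python sketch; the parameter values below are arbitrary choices of mine), the matrix form and the expanded form can be compared at a concrete point using the $2\times 2$ determinant and inverse formulas given above:

```python
from math import log, sqrt

# arbitrary test parameters (my choices; |rho| < 1 and |r| < 1)
mu1, mu2, sigma1, sigma2, rho = 0.0, 1.0, 1.0, 2.0, 0.3
m1, m2, s1, s2, r = 0.5, 0.0, 1.5, 1.0, -0.2

# matrix form, via |S|, |Sigma| and the explicit 2x2 inverse of S
det_S = s1**2 * s2**2 * (1 - r**2)
det_Sigma = sigma1**2 * sigma2**2 * (1 - rho**2)
inv11, inv12, inv22 = s2**2 / det_S, -r * s1 * s2 / det_S, s1**2 / det_S
tr = inv11 * sigma1**2 + 2 * inv12 * rho * sigma1 * sigma2 + inv22 * sigma2**2
d1, d2 = m1 - mu1, m2 - mu2
quad = inv11 * d1**2 + 2 * inv12 * d1 * d2 + inv22 * d2**2
kl_matrix = (tr - 2 + quad + log(det_S / det_Sigma)) / 2

# expanded closed form from the answer
kl_expanded = (
    ((mu1 - m1)**2 / s1**2
     - 2 * r * (mu1 - m1) * (mu2 - m2) / (s1 * s2)
     + (mu2 - m2)**2 / s2**2) / (2 * (1 - r**2))
    + ((sigma1**2 - s1**2) / s1**2
       - 2 * r * (rho * sigma1 * sigma2 - r * s1 * s2) / (s1 * s2)
       + (sigma2**2 - s2**2) / s2**2) / (2 * (1 - r**2))
    + log(s1 * s2 * sqrt(1 - r**2) / (sigma1 * sigma2 * sqrt(1 - rho**2)))
)

assert abs(kl_matrix - kl_expanded) < 1e-12
```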
Best Answer
The bivariate normal distribution is the exception, not the rule!
It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not bivariate normal are somehow "pathological" is a bit misguided.
Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.
Examples
It is useful to start with some examples. The figure below contains heatmaps of six bivariate distributions, all of which have standard normal marginals. The left and middle ones in the top row are bivariate normals; the remaining ones are not (as should be apparent). They are described further below.
The bare bones of copulas
Properties of dependence are often efficiently analyzed using copulas. A bivariate copula is just a fancy name for a probability distribution on the unit square $[0,1]^2$ with uniform marginals.
Suppose $C(u,v)$ is a bivariate copula. Then, immediately from the above, we know that $C(u,v) \geq 0$, $C(u,1) = u$ and $C(1,v) = v$, for example.
We can construct bivariate random variables on the Euclidean plane with prespecified marginals by a simple transformation of a bivariate copula. Let $F_1$ and $F_2$ be prescribed marginal distributions for a pair of random variables $(X,Y)$. Then, if $C(u,v)$ is a bivariate copula, $$ F(x,y) = C(F_1(x), F_2(y)) $$ is a bivariate distribution function with marginals $F_1$ and $F_2$. To see this last fact, just note that $$ \renewcommand{\Pr}{\mathbb P} \Pr(X \leq x) = \Pr(X \leq x, Y < \infty) = C(F_1(x), F_2(\infty)) = C(F_1(x),1) = F_1(x) \>. $$ The same argument works for $F_2$.
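As a concrete illustration (a minimal sketch using the independence copula $C(u,v)=uv$ and standard normal marginals; the function names are mine), the marginal property $F(x,\infty) = F_1(x)$ can be checked numerically by sending the other argument to a large value:

```python
from math import erf, sqrt

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def C(u, v):
    # independence copula on [0,1]^2
    return u * v

def F(x, y):
    # joint CDF with standard normal marginals built from the copula
    return C(Phi(x), Phi(y))
```

Evaluating `F(x, 50.0)` for any `x` returns (numerically) `Phi(x)`, since `Phi(50.0)` is `1.0` to machine precision; the same check with the first argument large recovers the second marginal.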
For continuous $F_1$ and $F_2$, Sklar's theorem asserts a converse implying uniqueness. That is, given a bivariate distribution $F(x,y)$ with continuous marginals $F_1$, $F_2$, the corresponding copula is unique (on the appropriate range space).
The bivariate normal is exceptional
Sklar's theorem tells us (essentially) that there is only one copula that produces the bivariate normal distribution. This is the aptly named Gaussian copula, which has density on $[0,1]^2$ $$ c_\rho(u,v) := \frac{\partial^2}{\partial u \, \partial v} C_\rho(u,v) = \frac{\varphi_{2,\rho}(\Phi^{-1}(u),\Phi^{-1}(v))}{\varphi(\Phi^{-1}(u)) \varphi(\Phi^{-1}(v))} \>, $$ where the numerator is the bivariate normal density with correlation $\rho$ evaluated at $\Phi^{-1}(u)$ and $\Phi^{-1}(v)$.
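A numerical illustration of this identity (a pure-Python sketch using `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; the correlation $\rho = 0.6$ and the evaluation point are my arbitrary choices): multiplying the Gaussian copula density by the two standard normal marginal densities should recover the bivariate normal density.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

rho = 0.6  # correlation parameter (arbitrary choice)
nd = NormalDist()  # standard normal: nd.cdf is Phi, nd.inv_cdf is Phi^{-1}

def phi(z):
    # standard normal density
    return exp(-z * z / 2) / sqrt(2 * pi)

def phi2(x, y, rho):
    # standard bivariate normal density with correlation rho
    q = (x * x - 2 * rho * x * y + y * y) / (1 - rho**2)
    return exp(-q / 2) / (2 * pi * sqrt(1 - rho**2))

def c_gauss(u, v, rho):
    # Gaussian copula density on (0,1)^2, per the formula above
    x, y = nd.inv_cdf(u), nd.inv_cdf(v)
    return phi2(x, y, rho) / (phi(x) * phi(y))

# phi(x) * phi(y) * c_rho(Phi(x), Phi(y)) should equal phi2(x, y, rho)
x, y = 0.7, -1.2
recovered = phi(x) * phi(y) * c_gauss(nd.cdf(x), nd.cdf(y), rho)
assert abs(recovered - phi2(x, y, rho)) < 1e-9
```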
But there are lots of other copulas, and, via the transformation described in the previous section, every one of them gives a bivariate distribution with normal marginals that is not the bivariate normal.
Some details on the examples
Note that if $C(u,v)$ is an arbitrary copula with density $c(u,v)$, the corresponding bivariate density with standard normal marginals under the transformation $F(x,y) = C(\Phi(x),\Phi(y))$ is $$ f(x,y) = \varphi(x) \varphi(y) c(\Phi(x), \Phi(y)) \> . $$
Note that by applying the Gaussian copula in the above equation, we recover the bivariate normal density. But, for any other choice of $c(u,v)$, we will not.
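To illustrate with a copula other than the Gaussian one (this example is mine, not necessarily one of the six in the figure): the Farlie-Gumbel-Morgenstern copula has density $c(u,v) = 1 + \theta(1-2u)(1-2v)$ for $|\theta| \le 1$. Plugging it into the formula above gives a joint density that is not bivariate normal, yet whose marginals are still standard normal, as a numerical integration confirms:

```python
from math import erf, exp, pi, sqrt

theta = 0.8  # FGM dependence parameter, |theta| <= 1 (arbitrary choice)

def phi(z):
    # standard normal density
    return exp(-z * z / 2) / sqrt(2 * pi)

def Phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

def c_fgm(u, v):
    # Farlie-Gumbel-Morgenstern copula density (not the Gaussian copula)
    return 1 + theta * (1 - 2 * u) * (1 - 2 * v)

def f(x, y):
    # joint density with standard normal marginals, but not bivariate normal
    return phi(x) * phi(y) * c_fgm(Phi(x), Phi(y))

# trapezoidal check that the x-marginal integrates back to phi(x)
x = 0.5
ys = [-8 + 0.01 * k for k in range(1601)]
marginal = sum(f(x, ys[k]) + f(x, ys[k + 1]) for k in range(1600)) * 0.01 / 2
assert abs(marginal - phi(x)) < 1e-6
```

The same check at other values of `x`, or integrating over `x` instead, confirms the other marginal as well.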
The examples in the figure were constructed as follows (going across each row, one column at a time):