Jointly Gaussian random vectors

probability, probability distributions, random matrices, random variables

$\newcommand{\cov}{\operatorname{cov}}$Suppose two scalar-valued random variables $X$ and $Y$ are jointly Gaussian. We then have the joint density $$f_{X, Y}(x, y) = \frac{1}{2 \pi \sqrt{|K|}} \exp \left\{ -\frac{1}{2} \begin{bmatrix}
x - \mu_X\\
y - \mu_Y
\end{bmatrix}^T K^{-1} \begin{bmatrix}
x - \mu_X\\
y - \mu_Y
\end{bmatrix} \right\} $$

where $K = \begin{bmatrix}
\cov(X, X) & \cov(X, Y)\\
\cov(Y, X) & \cov(Y, Y)
\end{bmatrix}$.
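For concreteness, the matrix form above can be checked numerically against the classical bivariate formula written with $\sigma_X$, $\sigma_Y$ and the correlation $\rho$. A NumPy sketch (all parameter values below are arbitrary choices for illustration):

```python
import numpy as np

# Arbitrary parameters for a bivariate Gaussian (for illustration only)
mu_x, mu_y = 1.0, -2.0
sx, sy, rho = 1.5, 0.8, 0.4

mu = np.array([mu_x, mu_y])
K = np.array([[sx**2,     rho*sx*sy],
              [rho*sx*sy, sy**2    ]])   # covariance matrix, positive definite

def f_matrix(x, y):
    """Density via the matrix form: exp(-d^T K^{-1} d / 2) / (2 pi sqrt|K|)."""
    d = np.array([x, y]) - mu
    return np.exp(-0.5 * d @ np.linalg.solve(K, d)) / (2*np.pi*np.sqrt(np.linalg.det(K)))

def f_classical(x, y):
    """Density via the textbook formula in sigma_x, sigma_y, rho."""
    dx, dy = (x - mu_x)/sx, (y - mu_y)/sy
    q = (dx**2 - 2*rho*dx*dy + dy**2) / (1 - rho**2)
    return np.exp(-0.5*q) / (2*np.pi*sx*sy*np.sqrt(1 - rho**2))

print(np.isclose(f_matrix(0.3, -1.2), f_classical(0.3, -1.2)))  # True
```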

Now how do we write the joint density when $X \in \mathbb{R}^m$ and $Y \in \mathbb{R}^n$ are random vectors?

Does $K$ become a third-order tensor? And what goes in the exponent?

I think $K \in \mathbb{R}^{m \times n}$ where $K_{ij} = \cov(X_i, Y_j)$, even though strict analogy with the scalar case would suggest something like $K \in \mathbb{R}^{2 \times m \times n}$.

Best Answer

You state "We then have the joint density", but that neglects the case where $K$ is singular, in which no density exists. However, that is not essential to the question.

Perhaps normality is not essential to the question either.

If $X,Y$ are random variables that take values in $\mathbb R^m$ and $\mathbb R^n$ respectively, then $$ \left[ \begin{array}{c} X \\ Y \end{array} \right] \in \mathbb R^{m+n} $$ and one can write \begin{align} K & = \operatorname E\left( \left(\left[ \begin{array}{c} X \\ Y \end{array} \right] - \left[ \begin{array}{c} \mu_X \\ \mu_Y \end{array} \right] \right)\left( \left[ \begin{array}{cc} X^\top, & Y^\top \end{array} \right] - \left[ \begin{array}{cc} \mu_X^\top, & \mu_Y^\top \end{array} \right] \right) \right) \\[12pt] & = \left[ \begin{array}{cc} \operatorname E((X-\mu_X)(X-\mu_X)^\top) & \operatorname E((X-\mu_X)(Y-\mu_Y)^\top) \\ \operatorname E((Y-\mu_Y)(X-\mu_X)^\top) & \operatorname E((Y-\mu_Y)(Y-\mu_Y)^\top) \end{array} \right] \\[10pt] & \in \mathbb R^{(m+n)\times(m+n)}. \end{align}
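A quick numerical sanity check of this block structure: stack samples of $X$ and $Y$ into one $(m+n)$-vector and verify that the top-right $m \times n$ block of the stacked sample covariance is the cross-covariance of $X$ and $Y$. A NumPy sketch (the dimensions and the random covariance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 2, 3, 200_000

# Build an arbitrary (m+n)x(m+n) positive semidefinite covariance
A = rng.standard_normal((m + n, m + n))
K = A @ A.T

# Draw jointly Gaussian samples of the stacked vector [X; Y]
Z = rng.multivariate_normal(np.zeros(m + n), K, size=N)
X, Y = Z[:, :m], Z[:, m:]

K_hat = np.cov(Z, rowvar=False)   # (m+n)x(m+n) sample covariance

# Top-right block equals the m x n sample cross-covariance cov(X, Y)
cov_XY = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (N - 1)
print(np.allclose(K_hat[:m, m:], cov_XY))  # True
```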

One also writes $$ \operatorname{cov}(X,Y) = \operatorname E((X-\mu_X)(Y-\mu_Y)^\top) \in \mathbb R^{m\times n} $$ and then one has $$ \operatorname{cov}(X,Y) = \big( \operatorname{cov}(Y,X)\big)^\top, $$ i.e., unlike in the scalar-valued case, the covariances with the arguments interchanged are not equal to each other, but are transposes of each other.
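The transpose relation is easy to confirm on sample cross-covariances. A small NumPy sketch (the data below is made up, with $Y$ constructed to be correlated with $X$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2))                                    # m = 2
Y = X @ rng.standard_normal((2, 3)) + rng.standard_normal((1000, 3))  # n = 3

def cov(A, B):
    """Sample analogue of E((A - mu_A)(B - mu_B)^T); rows are observations."""
    return (A - A.mean(0)).T @ (B - B.mean(0)) / (len(A) - 1)

print(cov(X, Y).shape)                       # (2, 3)
print(np.allclose(cov(X, Y), cov(Y, X).T))   # True
```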