Solved – Expressing conditional covariance matrix in terms of covariance matrix


Suppose we have two multivariate random variables $\mathbf{X}$ (of dimension $n_x$) and $\mathbf{Y}$ (of dimension $n_y$).
The covariance matrix $C_{X,Y}$ can be written in the following block-matrix form:

$$C_{X,Y} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$

where $\Sigma_{11}$ is the covariance of $\mathbf{X}$.

According to here, the conditional covariance matrix $C_{Y|X}$ can be expressed as:

$$C_{Y|X} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.$$

My question is: how can this equality be derived?

Best Answer

This rule holds when the random vectors are jointly normally distributed, but it need not hold for other joint distributions. In a related answer here it is shown that the Mahalanobis distance can be decomposed as follows:

$$\begin{equation} \begin{aligned} D^2 (\boldsymbol{x}, \boldsymbol{y}) &= \begin{bmatrix} \boldsymbol{x} - \boldsymbol{\mu}_X \\ \boldsymbol{y} - \boldsymbol{\mu}_Y \end{bmatrix}^\text{T} \begin{bmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY} \\ \boldsymbol{\Sigma}_{YX} & \boldsymbol{\Sigma}_{YY} \end{bmatrix}^{-1} \begin{bmatrix} \boldsymbol{x} - \boldsymbol{\mu}_X \\ \boldsymbol{y} - \boldsymbol{\mu}_Y \end{bmatrix} \\[6pt] &= \underbrace{(\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})^\text{T} \boldsymbol{\Sigma}_{Y|X}^{-1} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})}_\text{Conditional Part} + \underbrace{(\boldsymbol{x} - \boldsymbol{\mu}_X)^\text{T} \boldsymbol{\Sigma}_{XX}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_X)}_\text{Marginal Part}, \\[6pt] \end{aligned} \end{equation}$$

where we use the conditional mean vector and conditional variance matrix:

$$\begin{align} \boldsymbol{\mu}_{Y|X} &\equiv \boldsymbol{\mu}_Y + \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_X), \\[6pt] \boldsymbol{\Sigma}_{Y|X} \ &\equiv \boldsymbol{\Sigma}_{YY} - \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY}. \\[6pt] \end{align}$$
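The decomposition and the two definitions above can be checked numerically. The following is a minimal sketch using NumPy, not part of the original answer: the dimensions, the random covariance matrix, and the helper name `conditional_params` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y = 3, 2  # illustrative dimensions (assumption)

# Random symmetric positive-definite joint covariance and mean (made-up values).
A = rng.standard_normal((n_x + n_y, n_x + n_y))
Sigma = A @ A.T + (n_x + n_y) * np.eye(n_x + n_y)
mu = rng.standard_normal(n_x + n_y)

# Split into the blocks used in the formulas above.
S_xx, S_xy = Sigma[:n_x, :n_x], Sigma[:n_x, n_x:]
S_yx, S_yy = Sigma[n_x:, :n_x], Sigma[n_x:, n_x:]
mu_x, mu_y = mu[:n_x], mu[n_x:]


def conditional_params(x):
    """Hypothetical helper: conditional mean and covariance of Y given X = x."""
    mu_cond = mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)
    Sigma_cond = S_yy - S_yx @ np.linalg.solve(S_xx, S_xy)
    return mu_cond, Sigma_cond


# Arbitrary evaluation point (x, y).
x = rng.standard_normal(n_x)
y = rng.standard_normal(n_y)
mu_cond, Sigma_cond = conditional_params(x)

# Left-hand side: joint Mahalanobis distance.
z = np.concatenate([x - mu_x, y - mu_y])
d2_joint = z @ np.linalg.solve(Sigma, z)

# Right-hand side: conditional part plus marginal part.
r = y - mu_cond
d2_cond = r @ np.linalg.solve(Sigma_cond, r)
d2_marg = (x - mu_x) @ np.linalg.solve(S_xx, x - mu_x)
decomposition_holds = bool(np.isclose(d2_joint, d2_cond + d2_marg))

# The conditional covariance is also the inverse of the YY block of the
# joint precision matrix (Schur complement identity).
precision_yy = np.linalg.inv(Sigma)[n_x:, n_x:]
schur_matches = bool(np.allclose(Sigma_cond, np.linalg.inv(precision_yy)))

print(decomposition_holds, schur_matches)
```

Both checks are algebraic identities, so they should hold for any positive-definite $\boldsymbol{\Sigma}$ and any point $(\boldsymbol{x}, \boldsymbol{y})$, up to floating-point tolerance. Note that this verifies only the matrix algebra; it says nothing about whether the data are actually normal.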

If the random vectors $\mathbf{X}$ and $\mathbf{Y}$ are jointly normally distributed, it follows that the conditional distribution of interest can be written as:

$$\begin{equation} \begin{aligned} p(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) &\overset{\boldsymbol{y}}{\propto} p(\boldsymbol{x} , \boldsymbol{y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) \\[12pt] &= \text{N}(\boldsymbol{x}, \boldsymbol{y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) \\[10pt] &\overset{\boldsymbol{y}}{\propto} \exp \Big( - \frac{1}{2} D^2 (\boldsymbol{x}, \boldsymbol{y}) \Big) \\[6pt] &\overset{\boldsymbol{y}}{\propto} \exp \Big( - \frac{1}{2} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})^\text{T} \boldsymbol{\Sigma}_{Y|X}^{-1} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X}) \Big) \\[6pt] &\overset{\boldsymbol{y}}{\propto}\text{N}(\boldsymbol{y} | \boldsymbol{\mu}_{Y|X}, \boldsymbol{\Sigma}_{Y|X}), \\[6pt] \end{aligned} \end{equation}$$

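which establishes that $\boldsymbol{\Sigma}_{Y|X}$ is the conditional covariance matrix. (The step from the third to the fourth line drops the marginal part of $D^2$, which does not depend on $\boldsymbol{y}$ and is therefore absorbed into the proportionality constant.) Note again that this result depends on the assumption that the random vectors are jointly normally distributed. In other cases, $\boldsymbol{\Sigma}_{Y|X}$ is still the covariance matrix of the residual from the best linear predictor of $\mathbf{Y}$ given $\mathbf{X}$, so it can be regarded as a "first-order" approximation to the conditional covariance.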
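To illustrate the non-normal case, here is a rough sketch, again in NumPy and with made-up data, in which the formula does not return the true conditional covariance. By construction, the noise added to $\mathbf{Y}$ is uniform on $(-1, 1)$ and independent of $\mathbf{X}$, so the true conditional covariance is $\tfrac{1}{3} I$ for every $\boldsymbol{x}$. The block formula applied to the sample covariance instead reproduces the covariance of the least-squares residuals, which also contains the nonlinear part of the relationship that a linear predictor cannot capture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_x, n_y = 5000, 2, 2  # illustrative sample size and dimensions (assumption)

# Deliberately non-normal data: exponential X, and Y depends on X nonlinearly.
X = rng.exponential(size=(n, n_x))
B = np.array([[1.0, -0.5], [0.3, 2.0]])  # made-up linear coefficients
noise = rng.uniform(-1.0, 1.0, size=(n, n_y))  # variance 1/3 per component
Y = X @ B + 0.5 * X**2 + noise

# Sample covariance of the stacked vector (X, Y), split into blocks.
S = np.cov(np.hstack([X, Y]), rowvar=False)
S_xx, S_xy = S[:n_x, :n_x], S[:n_x, n_x:]
S_yx, S_yy = S[n_x:, :n_x], S[n_x:, n_x:]
schur = S_yy - S_yx @ np.linalg.solve(S_xx, S_xy)

# Best linear predictor of Y from X: least squares with an intercept.
design = np.hstack([np.ones((n, 1)), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
residuals = Y - design @ coef
resid_cov = np.cov(residuals, rowvar=False)

# The block formula equals the residual covariance of the linear fit ...
matches_linear_residual = bool(np.allclose(schur, resid_cov))

# ... but its diagonal exceeds the true conditional variance of 1/3.
exceeds_true_conditional = bool(np.all(np.diag(schur) > 1.0 / 3.0))

print(matches_linear_residual, exceeds_true_conditional)
```

The first check is an exact identity for sample covariances; the second depends on the particular data-generating process assumed here and is meant only as an illustration of how the formula can overstate the conditional covariance outside the normal case.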
