Conditional distribution of jointly Gaussian random variables where one is degenerate

Tags: conditional-expectation, expected-value, gaussian, normal-distribution, probability-theory

If random variables $x \in \mathbb R^n$ and $y \in \mathbb R^m$ have the joint Gaussian distribution
$$
\begin{bmatrix} x \\ y \end{bmatrix} \sim
\mathcal N \left(
\begin{bmatrix} a \\ b \end{bmatrix},
\begin{bmatrix} A & C \\ C^{\top} & B \end{bmatrix}
\right),
$$

then the conditional distribution of $x$ given $y$ is
$$
x \, \vert \, y \sim \mathcal N \left( a + C B^{-1} \left( y - b \right), A - C B^{-1} C^{\top} \right) \quad (*)
$$

provided that $B^{-1}$ exists.
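
Before moving on, here is a quick numerical sanity check of $(*)$ in the nondegenerate case (a minimal numpy sketch of mine; the variable names are not from the question). It verifies the Schur-complement form of the conditional mean and covariance against the equivalent precision-matrix form:

```python
# Sanity check of (*) on a random nondegenerate joint Gaussian.
# Identities used: Cov(x|y) = (Lambda_xx)^{-1} and
# C B^{-1} = -(Lambda_xx)^{-1} Lambda_xy, where Lambda = Sigma^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3
M = rng.standard_normal((n + m, n + m))
Sigma = M @ M.T + (n + m) * np.eye(n + m)   # random SPD joint covariance
A, C, B = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]
a, b = rng.standard_normal(n), rng.standard_normal(m)
y = rng.standard_normal(m)

# (*): conditional mean and covariance via B^{-1}
mean1 = a + C @ np.linalg.solve(B, y - b)
cov1 = A - C @ np.linalg.solve(B, C.T)

# the same quantities via the joint precision matrix
Lam = np.linalg.inv(Sigma)
Lxx, Lxy = Lam[:n, :n], Lam[:n, n:]
mean2 = a - np.linalg.solve(Lxx, Lxy @ (y - b))
cov2 = np.linalg.inv(Lxx)

print(np.allclose(mean1, mean2), np.allclose(cov1, cov2))  # True True
```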

Question: Suppose we know that two random vectors $x$ and $y$ are jointly Gaussian with mean and covariance as stated above. What can we say about the conditional distribution when $B^{-1}$ doesn't exist? I am mainly interested in the case where $B$ is block diagonal with an invertible $k \times k$ upper-left block (with $k < m$) and zeros everywhere else (e.g. $B = \begin{bmatrix} \Sigma & \bf 0 \\ \bf 0 & \bf 0 \end{bmatrix}$ where $\Sigma^{-1}$ exists), but I would also like to know whether there is an answer when $B$ lacks this structure. In this case, what can we say about the:

  1. shape? Can we still say that $x \,\vert\, y$ is Gaussian?
  2. center? Can we find an expression that represents the conditional mean $\mathbb{E} \left[ x \, \vert\, y \right]$?
  3. spread? Can we find an expression that represents the conditional variance $\text{Var} \left[ x \, \vert \, y \right]$?

My attempt:

Shape:
I have heard the phrase "the conditional distribution of jointly Gaussian random variables is Gaussian", so it seems reasonable to say that the conditional distribution $x \, \vert \, y$ is Gaussian, but degenerate in the sense that it doesn't have a probability density function.

The comments to the question here seem to suggest this is true.

Center:
When it comes to the conditional mean, I constructed a toy example.

Suppose $x \in \mathbb{R}$ and $y = (y_1, y_2, y_3)^{\top} \in \mathbb{R}^3$ such that $x \sim \mathcal{N}(1,1)$ and $y \,\vert\, x \sim \mathcal{N} (Ax, \Sigma)$ where
$$A = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
Then
$$\begin{bmatrix} x \\ y_1 \\ y_2 \\ y_3 \end{bmatrix}
\sim \mathcal{N} \left( \begin{bmatrix} 1 \\ 1 \\ 2 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 & 1 & 2 & 0 \\ 1 & 2 & 2 & 0 \\ 2 & 2 & 5 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \right)$$
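
To double-check this arithmetic, here is a short numpy sketch (my own; variable names are not from the question) that builds the joint mean and covariance from $\mathbb{E}[y] = A \cdot 1$, $\text{Cov}(y) = A A^{\top} + \Sigma$, and $\text{Cov}(x, y) = \text{Var}(x) A^{\top} = A^{\top}$:

```python
# Build the joint mean and covariance of (x, y) from the hierarchical
# model x ~ N(1, 1), y | x ~ N(Ax, Sigma).
import numpy as np

A = np.array([[1.0], [2.0], [0.0]])   # 3x1
Sigma = np.diag([1.0, 1.0, 0.0])

mean = np.concatenate(([1.0], (A * 1.0).ravel()))
cov = np.block([
    [np.array([[1.0]]), A.T],         # Var(x), Cov(x, y)
    [A, A @ A.T + Sigma],             # Cov(y, x), Cov(y)
])
print(mean)  # [1. 1. 2. 0.]
print(cov)   # matches the 4x4 covariance above
```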

$\mathbb{E} [x \, \vert \, y ]$ should be an affine transformation of $y_1, y_2, y_3$, so
$$\mathbb{E} [x \, \vert \, y ] = a_0 + a_1 (y_1 - 1) + a_2(y_2 - 2) + a_3 y_3.$$
Now, $x - \mathbb{E} [x \, \vert \, y ]$ and $y_i$ are independent (right?) for $i = 1, 2, 3$, so $\text{Cov} (x - \mathbb{E} [x \, \vert \, y ], y_i) = 0$ for each $i$. Also, by the tower property, $\mathbb{E} \left[ \mathbb{E} [x \, \vert \, y ] \right] = \mathbb{E}[x] = 1$, and since $(y_1 - 1)$, $(y_2 - 2)$, and $y_3$ all have mean zero, this forces $a_0 = 1$.

Next,
\begin{align}
0 = \text{Cov} (x - \mathbb{E} [x \, \vert \, y ], y_i) &= \text{Cov} \left( x - \left( a_0 + a_1 (y_1 - 1) + a_2(y_2 - 2) + a_3 y_3 \right), y_i \right) \\
&= \text{Cov} \left( x, y_i \right) - a_0 \text{Cov} \left(1 ,y_i \right) - a_1 \text{Cov} \left( y_1, y_i \right) - a_2 \text{Cov} \left( y_2, y_i \right) - a_3 \text{Cov} \left( y_3, y_i \right) \\
&= \text{Cov} \left( x, y_i \right) - a_1 \text{Cov} \left( y_1, y_i \right) - a_2 \text{Cov} \left( y_2, y_i \right).
\end{align}

I think I can say that $y_3 = 0$ almost surely (since $\text{Var}(y_3) = 0$), so $\text{Cov}(y_3, y_i) = 0$ and the equations for $i = 1, 2$ lead to the system
\begin{align}
1 – 2a_1 – 2a_2 &= 0 \\
2 – 2a_1 – 5a_2 &= 0
\end{align}

which has solution $(a_0, a_1, a_2) = \left( 1, \frac16, \frac13 \right)$ with $a_3$ free. Therefore, $\mathbb{E} [x \, \vert \, y ] = \frac16 + \frac16 y_1 + \frac13 y_2$.

The solution in this case is the same as $\mathbb{E} [x \,\vert\, y ]$ in the reduced problem with $y = (y_1, y_2)^{\top}$, $A = (1, 2)^{\top}$, and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (so that $B = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}$).

So it seems that you'd just use $(*)$, but with only the first $k$ columns of $C$ and the upper-left $k \times k$ block of $B$.
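
As a numerical sanity check of this guess (a sketch I wrote, not a proof), the truncated formula agrees with what you get by replacing $B^{-1}$ in $(*)$ with the Moore-Penrose pseudoinverse $B^{+}$ of the full singular $B$:

```python
# Compare the truncated coefficients with the pseudoinverse version.
import numpy as np

B = np.array([[2.0, 2.0, 0.0], [2.0, 5.0, 0.0], [0.0, 0.0, 0.0]])
C = np.array([[1.0, 2.0, 0.0]])   # Cov(x, y), 1x3
k = 2

coef_trunc = C[:, :k] @ np.linalg.inv(B[:k, :k])
coef_pinv = C @ np.linalg.pinv(B)
print(coef_trunc)  # [[0.1667 0.3333]]      = (1/6, 1/3)
print(coef_pinv)   # [[0.1667 0.3333 0.]]   = (1/6, 1/3, 0)
```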

Spread: To find the conditional variance in the toy example, I would need to find $\mathbb{E} \left[ x^2 \,\vert\, y \right]$, but I have not figured out a reasonable way to compute this expectation. Perhaps the idea is similar: use restricted versions of $C$ and $B$ in $(*)$, as sketched below.
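
If the same pattern holds for the spread (an assumption at this point, checked numerically here rather than derived), the candidate is $\text{Var}[x \,\vert\, y] = A - C B^{+} C^{\top}$, which on the toy example matches the truncated version of $(*)$:

```python
# Conditional variance: truncated formula vs. pseudoinverse formula.
import numpy as np

B = np.array([[2.0, 2.0, 0.0], [2.0, 5.0, 0.0], [0.0, 0.0, 0.0]])
C = np.array([[1.0, 2.0, 0.0]])   # Cov(x, y)
A = 1.0                           # Var(x)
k = 2

var_trunc = A - (C[:, :k] @ np.linalg.inv(B[:k, :k]) @ C[:, :k].T).item()
var_pinv = A - (C @ np.linalg.pinv(B) @ C.T).item()
print(var_trunc, var_pinv)        # 0.1666... 0.1666... (= 1/6 both ways)
```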

Best Answer

If a random vector $z$ has a joint Gaussian distribution, possibly degenerate, and we partition $z$ into subvectors, say $z=\begin{pmatrix}x \\ y\end{pmatrix}$, then the conditional distribution of $x$ given $y$ is also Gaussian. The conditional mean and variance are functions of the original mean vector and original covariance matrix. In the case where the original covariance matrix is singular, you will need to use a Moore-Penrose pseudoinverse. For the full derivation, please see this answer.
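
For completeness, a minimal sketch of this recipe (my own implementation, assuming the linked derivation; `condition_gaussian` is a hypothetical helper, not a library function), applied to the toy example from the question:

```python
# Condition a possibly degenerate joint Gaussian by replacing B^{-1}
# in (*) with the Moore-Penrose pseudoinverse B^+.
import numpy as np

def condition_gaussian(mu, Sigma, n, y):
    """Mean and covariance of x = z[:n] given y = z[n:], allowing singular Cov(y)."""
    a, b = mu[:n], mu[n:]
    A, C, B = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]
    Bp = np.linalg.pinv(B)
    return a + C @ Bp @ (y - b), A - C @ Bp @ C.T

mu = np.array([1.0, 1.0, 2.0, 0.0])
Sigma = np.array([[1.0, 1.0, 2.0, 0.0],
                  [1.0, 2.0, 2.0, 0.0],
                  [2.0, 2.0, 5.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
y = np.array([0.5, 1.0, 0.0])     # y_3 = 0 almost surely, so condition on 0
m, v = condition_gaussian(mu, Sigma, 1, y)
print(m, v)  # mean 7/12 = 1/6 + 0.5/6 + 1.0/3, variance 1/6
```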
