Solved – Conceptual proof that conditional of a multivariate Gaussian is multivariate Gaussian

conditional probability, multivariate normal distribution, normal distribution, proof

I understand the arithmetic derivation of the PDF of a conditional distribution of a multivariate Gaussian, as explained here, for example. Does anyone know of a more conceptual (perhaps coordinate-free) proof of the same result, one that uses characterising properties of the Gaussian?

Best Answer

A multivariate Gaussian (or Normal) random variable $X=(X_1,X_2,\ldots,X_n)$ can be defined as an affine transformation of a tuple of independent standard Normal variates $Z=(Z_1,Z_2,\ldots, Z_m)$. This easily implies the desired result, because when we condition $X$, we impose linear constraints among the $Z_j$. (If this is not obvious, please read on through the details.) This merely reduces the number of "free" $Z_j$ contributing to the variation among the $X_i$--but those $X_i$ nevertheless remain affine combinations of independent standard Normals, QED.

We can obtain this result in three steps of increasing generality. First, the distribution of $X$ conditional on the value of its first component is Normal. Second, this implies that the distribution of $X$ conditional on a single linear constraint $C^\prime X = d$ is Normal. Finally, that implies the distribution of $X$ conditional on any finite set of $r$ such linear constraints is Normal.


Details

By definition,

$$X = \mathbb{A} Z + B$$

for some $n\times m$ matrix $\mathbb{A} = (a_{ij})$ and $n$-vector $B = (b_1, b_2, \ldots, b_n)$. Because the composition of two affine transformations is again affine, any affine transformation of $X$ is therefore also Normal. This fact will be used repeatedly.
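
To spell out why, suppose $W = \mathbb{M}X + V$ for some matrix $\mathbb{M}$ and vector $V$. Substituting the definition of $X$ gives

$$W = \mathbb{M}(\mathbb{A}Z + B) + V = (\mathbb{M}\mathbb{A})Z + (\mathbb{M}B + V),$$

which is again an affine transformation of the independent standard Normal vector $Z$, and therefore Normal by definition.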

Fix a number $x_1$ in order to consider the distribution of $X$ conditional on $X_1=x_1$. Replacing $X_1$ by its definition produces

$$x_1 = X_1 = b_1 + a_{11}Z_1 + a_{12}Z_2 + \cdots + a_{1m}Z_m.$$

When all the $a_{1j}=0$, then $X_1 = b_1$ with probability one, and the two cases are easy to dispose of: $x_1=b_1$ imposes no constraint at all, while $x_1\ne b_1$ conditions on an impossible event. So let's move on to the alternative where, for at least one index $k$, $a_{1k}\ne 0$. Solving for $Z_k$ exhibits it as an affine combination of the remaining $Z_j,\, j\ne k$:

$$Z_k = \frac{1}{a_{1k}}\left(x_1 - b_1 - (a_{11}Z_1 + \cdots + a_{1,k-1}Z_{k-1} + a_{1,k+1}Z_{k+1} + \cdots + a_{1m}Z_m)\right).$$

Plugging this in to $\mathbb{A}Z + B$ produces an affine combination of the remaining $Z_j$, explicitly exhibiting the conditional distribution of $X$ as an affine combination of $m-1$ independent standard Normal variates, whence the conditional distribution is Normal. (One subtlety deserves mention: under the conditional distribution, the remaining $Z_j$ need not stay independent standard Normals. This is repaired by rotating first: because the distribution of $Z$ is invariant under orthogonal transformations, we may replace $Z$ by $W = \mathbb{Q}Z$ where $\mathbb{Q}$ is orthogonal with first row proportional to $(a_{11}, \ldots, a_{1m})$. The constraint then fixes $W_1$ alone, the remaining $W_2, \ldots, W_m$ stay independent standard Normals, and the same substitution applies.)
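
As a sanity check on this step, here is a small simulation sketch (assuming NumPy; the matrix $\mathbb{A}$, vector $B$, and value $x_1$ below are made up for illustration). It compares a brute-force estimate of the conditional distribution, obtained by keeping draws with $X_1$ in a thin slab around $x_1$, against the construction above, after first rotating $Z$ so that the constraint fixes a single coordinate (an orthogonal change of variables leaves the law of $Z$ unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: X = A Z + B with Z a pair of independent standard Normals.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([1.0, -2.0])
x1 = 0.5    # condition on X_1 = x1

# (a) Brute force: sample X and keep draws with X_1 in a thin slab around x1.
Z = rng.standard_normal((2_000_000, 2))
X = Z @ A.T + B
slab = X[np.abs(X[:, 0] - x1) < 0.01, 1]     # approximate draws of X_2 | X_1 = x1

# (b) The construction in the answer, after rotating so the constraint fixes
# one coordinate: W = Q Z is again iid standard Normal for orthogonal Q, and
# taking the first row of Q proportional to the first row of A makes X_1
# depend on W_1 alone.
q1 = A[0] / np.linalg.norm(A[0])
Q = np.array([q1, [-q1[1], q1[0]]])          # 2x2 rotation with first row q1
AQt = A @ Q.T                                # first row is (||A[0]||, 0)
w1 = (x1 - B[0]) / AQt[0, 0]                 # the constraint pins W_1 here
W2 = rng.standard_normal(500_000)            # the one remaining "free" Normal
X2_cond = AQt[1, 0] * w1 + AQt[1, 1] * W2 + B[1]

# Both should be near the textbook answer N(-2.5, 5) for these numbers.
print(slab.mean(), slab.var())
print(X2_cond.mean(), X2_cond.var())
```

For these particular numbers the usual arithmetic derivation gives $X_2 \mid X_1 = 0.5 \sim N(-2.5,\, 5)$, and both estimates should agree with it closely.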

Now consider any vector $C=(c_1, c_2, \ldots, c_n)$ and another constant $d$. To obtain the conditional distribution of $X$ given $C^\prime X = d$, construct the $(n+1)$-vector

$$Y = (Y_1,Y_2,\ldots, Y_{n+1})=(C^\prime X, X_1, X_2, \ldots, X_n).$$

It is an affine combination of the same $Z_j$: the matrix $\mathbb{A}$ is row-augmented (at the top) by the row $C^\prime \mathbb{A}$, producing an $(n+1)\times m$ matrix, and the vector $B$ is augmented at the beginning by the constant $C^\prime B$. Therefore, by definition, $Y$ is multivariate Normal. Applying the preceding result to $Y$ and $d$ immediately shows that $Y$, conditional on $Y_1 = d$, is multivariate Normal. Upon dropping the first component of $Y$ (which is an affine transformation!), what remains is precisely the distribution of $X$ conditional on $C^\prime X = d$.
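
Written out, the affine representation of $Y$ is

$$Y = \begin{pmatrix} C^\prime \mathbb{A} \\ \mathbb{A} \end{pmatrix} Z + \begin{pmatrix} C^\prime B \\ B \end{pmatrix},$$

so conditioning on $Y_1 = d$ is exactly the same as conditioning on $C^\prime X = d$.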

The distribution of $X$ conditional on $\mathbb{C}X = D$, for an $r\times n$ matrix $\mathbb{C}$ and an $r$-vector $D$, is obtained inductively by applying the preceding construction one constraint at a time (working row by row through $\mathbb{C}$ and component by component through $D$). The conditional distribution is Normal at every step, whence the final conditional distribution is Normal, too.
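
To complement the inductive argument, here is a short sketch (assuming NumPy; the matrices below are hypothetical) that computes the mean and covariance of this conditional distribution via the standard Gaussian conditioning formulas, valid when $\mathbb{C}\Sigma\mathbb{C}^\prime$ is invertible, where $\Sigma = \mathbb{A}\mathbb{A}^\prime$ is the covariance of $X$:

```python
import numpy as np

def conditional_gaussian(A, B, C, D):
    """Mean and covariance of X = A Z + B conditioned on C @ X = D.

    Uses the standard conditioning formulas for a multivariate Normal;
    assumes C @ Sigma @ C.T is invertible (independent constraints).
    """
    Sigma = A @ A.T                          # covariance of X
    K = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T)
    mean = B + K @ (D - C @ B)
    cov = Sigma - K @ C @ Sigma              # rank n - r: a degenerate Normal
    return mean, cov

# Hypothetical example: two constraints (r = 2) on a 3-dimensional X.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.array([1.0, -2.0, 0.5])
C = np.array([[1.0, 0.0, 0.0],               # X_1 = 0.5
              [0.0, 1.0, -1.0]])             # X_2 - X_3 = 1.0
D = np.array([0.5, 1.0])

mean, cov = conditional_gaussian(A, B, C, D)
print(mean)
print(cov)    # supported on a 1-dimensional affine subspace, since r = 2, n = 3
```

The answer above guarantees, without any of this arithmetic, that the distribution these parameters describe is Normal (here a degenerate one, concentrated on an $(n-r)$-dimensional affine subspace).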