Normal distribution conditional on a sub-affine space

conditional-expectation, normal distribution, probability, probability theory

This is related to this problem. For the time being, let's set aside the unorthodox "probabilistic" statements and focus on the canonical deterministic part only. Suppose $w\sim N(\mu, \Sigma)$ is multivariate normal, let $P$ be a known matrix, not necessarily square or invertible, and let $q$ be a known vector such that $Pw=q$ has solutions. What is the conditional distribution $w\mid \{Pw=q\}$, or at least the conditional expectation $\Bbb E(w\mid Pw=q)$?

In the original Black-Litterman paper (p35) the authors claimed that the conditional is again normal and that
$$\Bbb E(w\mid Pw=q) = \mu + \Sigma P^T (P\Sigma P^T)^{-1} (q - P\mu)$$
and can be obtained by solving the following optimisation problem
$$
\begin{align}
\min \quad & (x - \mu)^T\Sigma^{-1}(x-\mu)\\
\text{s.t.}\quad & Px=q
\end{align}
$$

Is their claim valid? And would you mind elaborating a bit on why it works? Thanks!

Best Answer

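Let $W \sim \mathcal{N}(\mu, \Sigma)$ with $\Sigma$ positive definite, and write $W = \mu + \Sigma^{1/2}Z$, where $\Sigma^{1/2}$ is the unique symmetric positive-definite square root of $\Sigma$. Then $Z = \Sigma^{-1/2}(W - \mu) \sim \mathcal{N}(0, I)$. Assume also that $P$ has full row rank, so that the matrix $AA^{T} = P\Sigma P^{T}$ appearing below is invertible. Now define $A, B, Q$ as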

$$ A = P\Sigma^{1/2}, \qquad B = A^{T}(AA^{T})^{-1}, \qquad Q = BA = A^{T}(AA^{T})^{-1}A. $$

and notice that

  1. $Q$ is the orthogonal projection onto $\ker(A)^{\perp}$,
  2. $I-Q$ is the orthogonal projection onto $\ker(A)$.
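
The two projection properties can be checked numerically. The following is a minimal sketch, assuming NumPy, a randomly generated positive-definite $\Sigma$ and a random full-row-rank $P$; the dimensions `n`, `k` and all variable names are illustrative choices, not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2  # hypothetical sizes: W lives in R^n, P imposes k constraints

# Assumed test data: a positive-definite Sigma and a (generically) full-row-rank P.
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
P = rng.standard_normal((k, n))

# Symmetric positive-definite square root of Sigma via its eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
Sigma_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

A = P @ Sigma_half
B = A.T @ np.linalg.inv(A @ A.T)
Q = B @ A

# An orthogonal projection is symmetric and idempotent.
sym_err = np.abs(Q - Q.T).max()
idem_err = np.abs(Q @ Q - Q).max()
# I - Q maps into ker(A), i.e. A (I - Q) = 0.
kernel_err = np.abs(A @ (np.eye(n) - Q)).max()
# The trace of a projection equals its rank, here dim ker(A)^perp = k.
trace_Q = np.trace(Q)

print(sym_err, idem_err, kernel_err, trace_Q)
```

All three error terms should be at the level of floating-point rounding, and the trace should be close to `k`.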

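Now decompose $Z$ as $Z = Z_{\perp} + Z_{||}$ with $Z_{\perp} = QZ$ and $Z_{||} = (I-Q)Z$. These are jointly normal and uncorrelated, since $\operatorname{Cov}(Z_{\perp}, Z_{||}) = Q(I-Q)^{T} = Q - Q^2 = 0$, and hence independent. Because $AZ_{||} = A(I-Q)Z = 0$, the conditioning equation $q = PW$ becomes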

$$ q = PW = P\mu + A Z = P\mu + A Z_{\perp}. $$

Multiplying both sides on the left by $B$ and using $Q^2 = Q$ (which follows from the fact that $Q$ is an orthogonal projection), we obtain

$$ B(q-P\mu) = BAZ_{\perp} = Q^2 Z = Q Z = Z_{\perp}$$

and hence the condition $PW = q$ determines the value of $Z_{\perp}$. So

\begin{align*} (W \mid PW=q) &\stackrel{d}{=} (\mu + \Sigma^{1/2}(Z_{\perp} + Z_{||}) \mid Z_{\perp} = B(q-P\mu)) \\ &\stackrel{d}{=} \mu + \Sigma^{1/2}B(q-P\mu) + \Sigma^{1/2}(I-Q)Z. \tag{*} \end{align*}

The last line $\text{(*)}$ has several implications:

  1. $\text{(*)}$ is an affine transformation of $Z \sim \mathcal{N}(0, I)$, hence it is again normal with

    $$ (W \mid PW=q) \sim \mathcal{N}( \mu + \Sigma^{1/2}B(q-P\mu), \Sigma^{1/2}(I-Q)\Sigma^{1/2}). $$

    Plugging in the definitions, the mean of the conditional distribution $(W \mid PW=q)$ simplifies to

    \begin{align*} \mathbb{E}[W \mid PW=q] &= \mu + \Sigma^{1/2}B(q-P\mu) \\ &= \mu + \Sigma P^{T} (P\Sigma P^{T})^{-1}(q-P\mu). \end{align*}

  2. If we write $S = \Sigma^{1/2}B = \Sigma P^{T} (P\Sigma P^{T})^{-1}$, then $\text{(*)}$ can be simplified to a formula which involves only known variables:

    $$ (W \mid PW=q) \stackrel{d}{=} Sq + (I - SP) W. $$
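
As a sanity check, the formulas above can be verified on one random instance. This is only a sketch, assuming NumPy, a positive-definite $\Sigma$ and a full-row-rank $P$; all data and variable names are made up for illustration. It checks that the conditional mean satisfies the constraint, that the covariance of $Sq + (I - SP)W$ agrees with the covariance in $\text{(*)}$, and that the minimiser of the constrained problem from the question, obtained here from its Lagrange (KKT) linear system, coincides with the conditional mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 2  # hypothetical sizes

# Assumed test data (not from the original post).
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)      # positive definite
mu = rng.standard_normal(n)
P = rng.standard_normal((k, n))      # generically full row rank
q = rng.standard_normal(k)

# S = Sigma P^T (P Sigma P^T)^{-1}
S = Sigma @ P.T @ np.linalg.inv(P @ Sigma @ P.T)

# Conditional mean: mu + S (q - P mu).
cond_mean = mu + S @ (q - P @ mu)

# Covariance of S q + (I - S P) W, where Cov(W) = Sigma.
T = np.eye(n) - S @ P
cond_cov = T @ Sigma @ T.T

# Covariance from (*): Sigma^{1/2} (I - Q) Sigma^{1/2} = Sigma - S P Sigma.
cond_cov_star = Sigma - S @ P @ Sigma

# Optimisation problem: min (x - mu)^T Sigma^{-1} (x - mu)  s.t.  P x = q.
# Stationarity of the Lagrangian gives the linear system
#   Sigma^{-1} x + P^T lam = Sigma^{-1} mu,   P x = q.
Sigma_inv = np.linalg.inv(Sigma)
K = np.block([[Sigma_inv, P.T], [P, np.zeros((k, k))]])
rhs = np.concatenate([Sigma_inv @ mu, q])
x_opt = np.linalg.solve(K, rhs)[:n]

print(np.allclose(P @ cond_mean, q))          # the mean satisfies the constraint
print(np.allclose(cond_cov, cond_cov_star))   # both covariance formulas agree
print(np.allclose(P @ cond_cov, 0))           # no variance left along the rows of P
print(np.allclose(x_opt, cond_mean))          # minimiser equals the conditional mean
```

The last check is consistent with the fact that the mean of a normal distribution is also its mode: restricted to the affine set $\{x : Px = q\}$, the density of $W$ is largest exactly where the quadratic form $(x-\mu)^T\Sigma^{-1}(x-\mu)$ is smallest.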
