Proof of the affine property of the normal distribution for a landscape matrix

Tags: normal-distribution, probability-distributions, transformation

The widely used (and often implicitly assumed) affine property of multivariate normal distributions states that:

Given a random vector $x \in \mathbb{R}^N$ with a multivariate normal distribution, $x \sim N_x(\mu_x, \Sigma_x)$, the random vector $y = Ax + b$ obtained by applying an affine transformation to $x$ is also normally distributed: $y \sim N_y(A\mu_x+b,\, A\Sigma_x A^T)$.

The above property is easy to prove when $A$ is an invertible $N \times N$ matrix, by writing $x = A^{-1}(y-b)$ and substituting it into the density of $N_x(\mu_x, \Sigma_x)$ (the change-of-variables Jacobian $|\det A^{-1}|$ is constant in $y$, and $A^{-1}(y-b)-\mu_x = A^{-1}(y-(A\mu_x+b))$), as shown below:

$$
\begin{aligned}
p_y(y) & \propto p_x(A^{-1}(y-b)) \\
& \propto \exp\{-\tfrac{1}{2} (A^{-1}(y-b)-\mu_x)^T\Sigma_x^{-1}(A^{-1}(y-b)-\mu_x)\}\\
& = \exp\{-\tfrac{1}{2} (y - (A\mu_x + b))^T A^{-T}\Sigma_x^{-1}A^{-1}(y - (A\mu_x + b))\}\\
& = \exp\{-\tfrac{1}{2} (y - (A\mu_x + b))^T (A \Sigma_x A^T)^{-1}(y - (A\mu_x + b))\}
\end{aligned}
$$

which is (up to a normalising constant) the density of $N_y(A\mu_x+b,\, A \Sigma_x A^T)$, i.e. $y \sim N_y(A\mu_x+b,\, A \Sigma_x A^T)$.
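
For anyone who wants a quick empirical sanity check of the identities above, here is a minimal NumPy sketch (all of $\mu_x$, $\Sigma_x$, $A$, $b$ below are arbitrary illustrative values). It only compares the first two moments of $y = Ax + b$ with $A\mu_x + b$ and $A\Sigma_x A^T$ by Monte Carlo; it does not test normality itself.

```python
import numpy as np

# Monte Carlo sanity check of the affine property for an invertible square A.
# All values of mu_x, Sigma_x, A, b below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
N = 3

mu_x = np.array([1.0, -2.0, 0.5])
L = rng.normal(size=(N, N))
Sigma_x = L @ L.T + N * np.eye(N)      # symmetric positive definite covariance

A = rng.normal(size=(N, N))            # square, almost surely invertible
b = np.array([0.3, -1.0, 2.0])

x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)  # x ~ N(mu_x, Sigma_x)
y = x @ A.T + b                                           # y_i = A x_i + b, row-wise

# Compare sample moments of y with the claimed mean and covariance.
print("max |mean(y) - (A mu_x + b)| :",
      np.abs(y.mean(axis=0) - (A @ mu_x + b)).max())
print("max |cov(y) - A Sigma_x A^T| :",
      np.abs(np.cov(y, rowvar=False) - A @ Sigma_x @ A.T).max())
```

Both printed deviations should shrink towards zero as the number of samples grows.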

My questions are the following:

  1. Does the affine property hold true even if $A$ is a landscape $M \times N$ matrix with $M < N$? (Most textbooks and lecture notes say so, and many papers assume it before deriving other results.)
  2. If the affine property is true, how do you prove it? When $A$ is a landscape $M \times N$ matrix with $M < N$, you cannot compute $A^{-1}$, and hence you cannot express the random vector $x$ as $x = A^{-1}(y-b)$.

Best Answer

One should approach this through characteristic functions. Recall that $X$ is normal $N(\mu,\Sigma)$ if and only if, for every deterministic vector $t$ of size $N\times1$, $$ E(\mathrm e^{\mathrm it'X})=\mathrm e^{\mathrm it'\mu-t'\Sigma t/2}, $$ where $t'$ denotes the transpose of $t$. For every $(A,b)$ of compatible sizes, if $Y=AX+b$, one gets $$ E(\mathrm e^{\mathrm it'Y})=\mathrm e^{\mathrm it'b}E(\mathrm e^{\mathrm it'AX})=\mathrm e^{\mathrm it'b}E(\mathrm e^{\mathrm is'X}), $$ where $s'=t'A$. Since $s=(t'A)'=A't$, one sees that $$ E(\mathrm e^{\mathrm it'Y})=\mathrm e^{\mathrm it'b}\mathrm e^{\mathrm is'\mu-s'\Sigma s/2}=\mathrm e^{\mathrm it'b+\mathrm it'A\mu-t'A\Sigma A't/2}. $$ The RHS is the characteristic function of the normal distribution $N(b+A\mu,A\Sigma A')$, and this is enough to identify the distribution of $Y$ as such. Note that no inverse or pseudo-inverse is involved and that this applies to $Y$ of any dimension.
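
For a concrete illustration of how this covers the landscape case, take (say) $N=2$, $M=1$, $A=(1\;\;1)$ and $b=0$, so that $Y=X_1+X_2$. Here $t$ is a scalar and $s=A't=(t\;\;t)'$, and the formula above reduces to $$ E(\mathrm e^{\mathrm itY})=\mathrm e^{\mathrm it(\mu_1+\mu_2)-t^2(\Sigma_{11}+2\Sigma_{12}+\Sigma_{22})/2}, $$ which is the characteristic function of $N(\mu_1+\mu_2,\;\Sigma_{11}+2\Sigma_{12}+\Sigma_{22})=N(A\mu,\;A\Sigma A')$: the sum of two jointly normal (possibly correlated) coordinates is again normal.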
