[Math] Generating multivariate Gaussian samples–Why does it work

normal distributionproof-explanationsampling

I came across the method for generating multivariate normal samples on wikipedia:
https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Drawing_values_from_the_distribution

A widely used method for drawing (sampling) a random vector $X$ from the $N$-dimensional multivariate normal distribution with mean vector $μ$ and covariance matrix $Σ$ works as follows:[28]

  1. Find any real matrix $A$ such that $AA^T = Σ$. When $Σ$ is positive-definite, the Cholesky decomposition is typically used, and the extended form of this decomposition can always be used (as the covariance matrix may be only positive semi-definite) in both cases a suitable matrix $A$ is obtained. An alternative is to use the matrix $A = UΛ^½$ obtained from a spectral decomposition $Σ = UΛU^T$ of $Σ$. The former approach is more computationally straightforward but the matrices $A$ change for different orderings of the elements of the random vector, while the latter approach gives matrices that are related by simple re-orderings. In theory both approaches give equally good ways of determining a suitable matrix $A$, but there are differences in computation time.

  2. Let $Z = (z_1, …, z_N)^T$ be a vector whose components are $N$ independent standard normal variates (which can be generated, for example, by using the Box–Muller transform).

  3. Let $X$ be $μ + AZ$. This has the desired distribution due to the affine transformation property.

Why does the cholesky decomposition matrix '$A$' multiplied by the vector of samples chosen from the standard normal distribution '$Z$' plus '$μ$' give us our result (ie $X = μ + AZ$)?

Why does this work? What is the proof?

Best Answer

Simply take the vector you have generated, $\boldsymbol{x}$, and compute its covariance: $$\mathbb E[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T] = \mathbb E[\boldsymbol A\boldsymbol z\boldsymbol z^t\boldsymbol A^t] = \boldsymbol A\mathbb E[\boldsymbol z\boldsymbol z^t]\boldsymbol A^t = \boldsymbol A \boldsymbol I \boldsymbol A = \boldsymbol A \boldsymbol A^T = \boldsymbol\Sigma,$$ as desired.

Related Question