[Math] Generating multivariate normal samples – why Cholesky

Tags: cholesky-decomposition, linear-algebra, numerical-methods, probability

Hello everyone and happy new year! May all your hopes and aspirations come true and the forces of evil be confused and disoriented on the way to your house.

With that out of the way…

I am trying to write a program that takes a vector $\mu \in \mathbb R^n$ and a matrix $\Sigma \in \mathbb R^{n \times n}$ and generates random samples from the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$.

The problem: I am only allowed to use a routine that samples from the single-variable normal distribution with mean $0$ and variance $1$: $N(0, 1)$.

The proposed solution: Define a vector $v \in \mathbb R^n$ (initially all zeros) and, for each $i$ from $1$ to $n$, draw its entries independently from the single-variable normal distribution: $v_i \sim N(0, 1)$.

Next, compute a Cholesky decomposition of $\Sigma$: $\Sigma = LL^T$.

Finally, the random vector $Lv + \mu$ is distributed according to the desired multivariate Gaussian.
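The recipe above can be sketched in NumPy; this is a minimal illustration (the function name and test covariance are my own), using only standard normal draws plus `np.linalg.cholesky`, and checking empirically that the samples have the requested mean and covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mvn(mu, Sigma, rng):
    """Draw one sample from N(mu, Sigma) using only N(0, 1) draws."""
    n = len(mu)
    # v_i ~ N(0, 1), drawn independently, as in the question
    v = rng.standard_normal(n)
    # Sigma = L L^T with L lower triangular
    L = np.linalg.cholesky(Sigma)
    return L @ v + mu

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

samples = np.array([sample_mvn(mu, Sigma, rng) for _ in range(50_000)])
print(samples.mean(axis=0))   # should be close to mu
print(np.cov(samples.T))      # should be close to Sigma
```

In practice one would factor $\Sigma$ once outside the loop and reuse $L$ for every draw, since the decomposition is the expensive step.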

My question is: why? I don't understand the intuition. If it were a one-dimensional distribution $N(\mu, \sigma^2)$, then I understand why $\sigma^2 v + \mu$ is a good idea. So why Cholesky? Wouldn't we want $\Sigma v + \mu$?

Best Answer

After Rahul's comment you understood that in any parametrization $x = Av + \mu$ you will need $$ \Sigma = \Bbb E(x-\mu)(x-\mu)^T = A\cdot\Bbb E(vv^T)\cdot A^T = AA^T, $$ since the components of $v$ are independent standard normals, so $\Bbb E(vv^T) = I$. There are infinitely many possibilities to choose such an $A$: for any orthogonal matrix $Q$, the matrix $\tilde A = AQ$ also satisfies the condition, because $\tilde A\tilde A^T = AQQ^TA^T = AA^T$.

One could even choose the symmetric square root of $\Sigma$ (which exists and is unique among the symmetric positive definite matrices).
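The non-uniqueness is easy to demonstrate numerically. The sketch below (the example matrices are my own) builds three different factors of the same $\Sigma$: the Cholesky factor, the symmetric square root via an eigendecomposition, and a rotated Cholesky factor $LQ$ with $Q$ orthogonal:

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# 1. Cholesky factor: lower triangular L with L L^T = Sigma
L = np.linalg.cholesky(Sigma)

# 2. Symmetric square root: S = U diag(sqrt(w)) U^T from Sigma = U diag(w) U^T
w, U = np.linalg.eigh(Sigma)
S = U @ np.diag(np.sqrt(w)) @ U.T

# 3. L times any orthogonal Q; here Q comes from a QR factorization
Q, _ = np.linalg.qr(np.array([[1.0, 2.0],
                              [3.0, 4.0]]))
A = L @ Q

# All three satisfy A A^T = Sigma, so all three produce N(mu, Sigma) samples
for M in (L, S, A):
    assert np.allclose(M @ M.T, Sigma)
```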

The advantage of the Cholesky factorization is that it is cheap and easy to compute: about $n^3/3$ floating-point operations, considerably less than computing the symmetric square root via an eigendecomposition.
