[Math] Proof of the Box-Muller method

probability-theory, proof-verification

This is Exercise 2.2.2 from Achim Klenke: »Probability Theory — A Comprehensive Course«.

Exercise (Box–Muller method): Let $U$ and $V$ be independent random variables that are uniformly distributed on $[0,1]$. Define
$$X := \sqrt{-2\log(U)}\, \cos(2\pi V) \quad \text{and} \quad Y := \sqrt{-2\log(U)}\, \sin(2\pi V)\, .$$

Show that $X$ and $Y$ are independent and $\mathcal{N}_{0,1}$-distributed.

Solution:
Define the random variable $R := \sqrt{-2\log(U)}$; then
\begin{align*}
\mathbf{P}\bigl[R \leq r\bigr] &= \mathbf{P}\bigl[-2 \log(U) \leq r^2\bigr] \\
&= \mathbf{P}\Bigl[\log(U) \geq -\frac{r^2}{2}\Bigr] \\
&= 1 - \mathbf{P}\biggl[U < \exp\Bigl(-\frac{r^2}{2}\Bigr)\biggr]\, .
\end{align*}
$U$ is uniformly distributed on $[0, 1]$, so the distribution of $R$ is $$\mathbf{P}[R\leq r] = 1 - \int_0^{\exp(-r^2/2)} \, dt = 1 - \exp\Bigl(-\frac{r^2}{2}\Bigr)\, .$$
Differentiating, we get the density of $R$: $f_R(t) = t \exp\Bigl(-\frac{t^2}{2}\Bigr)$ for $t > 0$.
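As a quick numerical cross-check of this CDF (a minimal sketch, assuming NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
u = 1.0 - rng.random(100_000)          # uniform on (0, 1], so log(u) is finite
r_samples = np.sqrt(-2.0 * np.log(u))  # samples of R

for r in (0.5, 1.0, 2.0):
    empirical = np.mean(r_samples <= r)          # empirical CDF at r
    theoretical = 1.0 - np.exp(-r**2 / 2.0)      # the CDF derived above
    print(r, empirical, theoretical)             # the two should agree closely
```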

We also define the random variable $\Phi := 2\pi V$. Since $V$ is uniformly distributed on $[0, 1]$, we have $f_\Phi(t) = \frac{1}{2\pi}$ for $0 < t \leq 2\pi$.

Since $U, V$ are independent and $R, \Phi$ are measurable functions of $U$ and $V$ respectively, $R, \Phi$ are also independent, and $$f_{R, \Phi}(t_1, t_2) = f_R(t_1) f_\Phi(t_2) = \frac{1}{2 \pi} \exp\Bigl(-\frac{t_1^2}{2}\Bigr)\cdot t_1 \, .$$

With \begin{align*}
g\colon (0,\infty)\times(0, 2\pi] &\rightarrow \mathbb{R}^2 \\
(r, \phi) &\mapsto \bigl(r \cos(\phi), r \sin(\phi)\bigr)
\end{align*} we see that
$$(X, Y) = g(R, \Phi)\, ,$$ so we want to find the image measure
$$\mathbf{P}_{X, Y} = \mathbf{P}_{R, \Phi}\circ g^{-1}\, .$$

We use the transformation formula for densities:
$$ f_{X, Y}(\tau_1, \tau_2) = \frac{f_{R, \Phi}(g^{-1}(\tau_1, \tau_2))}{|\det(g'(g^{-1}(\tau_1, \tau_2)))|}$$
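For completeness, the Jacobian determinant of $g$ is quick to compute:
$$\det g'(r, \phi) = \det \begin{pmatrix} \cos(\phi) & -r\sin(\phi) \\ \sin(\phi) & r\cos(\phi) \end{pmatrix} = r\cos^2(\phi) + r\sin^2(\phi) = r\, .$$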

$g$ is just the transformation for polar coordinates. With
$$ t_1 = \sqrt{\tau_1^2 + \tau_2^2} = |\det(g'(g^{-1}(\tau_1, \tau_2)))|$$
we finally get
$$f_{X, Y}(\tau_1, \tau_2) = \frac{1}{2 \pi} \exp\Bigl(-\frac{\tau_1^2 + \tau_2^2}{2}\Bigr) = \underbrace{\frac{1}{\sqrt{2 \pi}} \exp\Bigl(-\frac{\tau_1^2}{2}\Bigr)}_{=f_X(\tau_1)} \cdot \underbrace{\frac{1}{\sqrt{2 \pi}} \exp\Bigl(-\frac{\tau_2^2}{2}\Bigr)}_{=f_Y(\tau_2)}\, ,$$
that is: $X, Y$ are $\mathcal{N}_{0, 1}$-distributed and independent. $\square$
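The result is easy to sanity-check numerically. Here is a minimal sketch (assuming NumPy) that applies the Box–Muller transform and compares empirical moments and correlation with the $\mathcal{N}_{0,1}$ values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

u = 1.0 - rng.random(n)   # uniform on (0, 1], so log(u) is finite
v = rng.random(n)

# The Box-Muller transform.
r = np.sqrt(-2.0 * np.log(u))
x = r * np.cos(2.0 * np.pi * v)
y = r * np.sin(2.0 * np.pi * v)

print(x.mean(), x.std())          # approx 0 and 1
print(y.mean(), y.std())          # approx 0 and 1
print(np.corrcoef(x, y)[0, 1])    # approx 0, consistent with independence
```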


Could you please check my proof? I'm sorry that it's so long. It seems right to me, but I'm self-studying and really need to catch any possible mistakes…
Thank you!

Best Answer

I am a physicist, and I can prove it faster (as a physicist) with an intuitive derivation using inverse transform sampling: https://en.wikipedia.org/wiki/Inverse_transform_sampling

Imagine the $2D$ Gaussian density in polar coordinates $(\phi, r)$, as a bell-shaped surface (a cupola) over the plane, representing the $2D$ probability density. (The surface is symmetric around the $z$-axis.) Its height is $\rho(r) = A\exp(-\frac12 r^2)$, where $A$ is a scaling parameter such that $$\int_0^\infty 2\pi r\, \rho(r)\, dr = 1\, .$$ (You can easily calculate the integral; $2\pi r\, dr$ is the area between two close concentric circles in the $x, y$ plane.) You get this from the one-dimensional Gaussian density in Cartesian coordinates: $\rho(r) = \rho_1(x)\, \rho_1(y)$ where $\rho_1(x) = \sqrt{A}\exp(-\frac12 x^2)$, using $r^2 = x^2 + y^2$ (the Pythagorean theorem, i.e. the Euclidean metric).

Let's generate random $3D$ points uniformly distributed under this Gaussian surface. Projecting the uniformly distributed samples in the volume, with coordinates $(x, y, z)$ or $(r, \phi, z)$, onto the base plane $(x, y)$ or $(r, \phi)$ gives Gaussian-distributed samples in the $2D$ plane. Projecting those samples onto any direction (for example the $X$ or $Y$ axis) gives the desired one-dimensional Gaussian distribution.
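This projection idea can be illustrated with a small rejection-sampling sketch (my own illustration, assuming NumPy; the box $[-5, 5]^2$ is an arbitrary cutoff with negligible truncation error):

```python
import numpy as np

rng = np.random.default_rng(0)
A = 1.0 / (2.0 * np.pi)    # normalization constant, computed below

# Uniform 3D points in a box enclosing the density surface over [-5, 5]^2.
n = 2_000_000
x = rng.uniform(-5.0, 5.0, n)
y = rng.uniform(-5.0, 5.0, n)
z = rng.uniform(0.0, A, n)   # the surface's maximum height is rho(0) = A

# Keep only the points lying under the surface rho(r) = A * exp(-r^2 / 2).
under = z < A * np.exp(-(x**2 + y**2) / 2.0)

# Projecting the accepted points onto the X axis gives (approximately) N(0,1).
print(x[under].mean(), x[under].std())   # approx 0 and 1
```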

Think of the following step, or mathematical construction. Split the whole volume (of total measure $1$) under the Gaussian surface into small cells of the same volume $dV = dr \cdot (r\, d\phi) \cdot dh$ and attach an index label to them from $1$ to $N$, where $N$ is around $\frac{1}{dV}$; here $dr$ is the radial size, $r\, d\phi$ is the tangential size (perpendicular to $dr$), and $dh$ is the height along the $z$-axis. Note that to keep the volume constant, $r\, d\phi$ has to be constant: the further from the origin, the larger $r$ and the smaller $d\phi$, so that this tangential dimension stays fixed.

Now you can generate integers uniformly in the range $i = 1, \ldots, N$ (or real numbers from the interval $(0,1)$ multiplied by $N$) to pick a uniformly distributed index, representing the uniformly distributed location of the infinitesimal volume $dV_i$. This is random uniform volume sampling. (For the purists: to get true random point sampling, you can combine it with a random triplet $(r_1, \phi_1, h_1)$ that picks a random point within the random infinitesimal volume $dV_i$.) As argued above, uniform volume sampling leads to the Gaussian distribution of the projected samples. We may choose any indexing system for the infinitesimal volumes $dV$ with only one constraint: if $dV_i$ has radial coordinate $r_i$ and $dV_j$ has radial coordinate $r_j$, then $r_i < r_j$ implies $i < j$; the indexing grows with $r$. We leave the indexing within a cylindrical wall (volumes with the same $r$) unspecified on purpose.

We will generate a real number $P$ uniformly from $(0,1)$; multiplied by $N$, it picks a volume $dV_i$ somewhere inside a cylindrical wall (of thickness $dr$) with radius $R$ and height $\rho(R)$. The probability, or fraction, $P$ of uniformly distributed random points under the surface with $r$ closer than $R$ (inside the cylinder) is the integral from $0$ to $R$ of the height $\rho(r)$ of the surface at distance $r$ from the origin, times the thickness $dr$, times the circumference $2\pi r$ of the cylinder:

$$P(r<R) = \int_0^R 2\pi r\, \rho(r)\, dr\, .$$

We can easily get a closed formula for this integral (this is the reason why we work in polar coordinates) by the simple substitution $p = \frac12 r^2$, $dp = r\, dr$:

\begin{align} P&=2\pi A \int_0^{\frac12 R^2} \exp(-p) dp \\ &=2\pi A \left[-\exp(-p)\right]_0^{\frac12 R^2} \\ &=2\pi A [1-\exp(-\frac12 R^2)] \end{align}

Notice that as $R$ goes to $\infty$ the probability must be normalized to $1$; the integral then equals $2\pi A$, so we have $A = 1/(2\pi)$.
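A numeric cross-check of this normalization (a sketch, assuming NumPy; the integration range $[0, 10]$ is a cutoff, since the tail beyond it is negligible):

```python
import numpy as np

A = 1.0 / (2.0 * np.pi)
r = np.linspace(0.0, 10.0, 100_001)
dr = r[1] - r[0]

# Riemann sum of the integral of 2*pi*r*rho(r) over (0, infinity).
integrand = 2.0 * np.pi * r * A * np.exp(-r**2 / 2.0)
print(integrand.sum() * dr)   # approx 1.0
```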
We solve for $R$:

$$R(P)=\sqrt{-2\ln(1-P)} $$

That is: a uniformly generated $P$ produces a properly distributed $R$ in polar coordinates. This is the general principle of inverse transform sampling (https://en.wikipedia.org/wiki/Inverse_transform_sampling): you generate uniform samples in the generalized volume or area under a distribution function and read off the location (projection) of the sample. We should get a uniformly distributed $\phi$ coordinate as well. The index $i$ generated above, under a fully specified indexing system (within each cylindrical wall), would provide this uniformly distributed $\phi$, due to symmetry. However, it is easier to ignore the details of the indexing within the cylindrical wall and take advantage of the symmetry of the surface around the $z$-axis: thanks to the rotational symmetry we can simply generate a uniformly distributed $\phi \in (0, 2\pi)$. The pair

$$(R, \phi) = \bigl(R(P),\, 2\pi U\bigr)\, ,$$

where $P$ and $U$ are independent uniformly distributed real values on the interval $(0,1)$, defines sample points of the $2D$ Gaussian distribution in polar coordinates. At the end we do a simple projection/transformation of those samples into Cartesian coordinates, and we get the celebrated Gaussian distribution of random samples along the $X$ and $Y$ axes independently:

$$x = R\cos(\phi)\, , \qquad y = R\sin(\phi)\, .$$

In the formula for $R(P)$ above, notice that if $P$ is uniformly distributed on $(0,1)$, then so is $1-P$. You can therefore replace $1-P$ with $P$ in the method:

$$R(P)=\sqrt{-2\ln(P)} $$
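Putting the recipe together as code (a minimal sketch, assuming NumPy): draw independent uniforms $P$ and $U$, invert the radial CDF, rotate uniformly, and project onto the $X$ axis; a histogram of the samples then matches the standard normal density:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

p = 1.0 - rng.random(n)     # uniform on (0, 1], keeps log(p) finite
u = rng.random(n)

r = np.sqrt(-2.0 * np.log(p))   # inverse transform sampling of the radius
phi = 2.0 * np.pi * u           # uniform angle, by rotational symmetry

x = r * np.cos(phi)             # projection onto the X axis

# Compare a histogram of x with the standard normal density.
hist, edges = np.histogram(x, bins=50, range=(-4.0, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
gauss = np.exp(-centers**2 / 2.0) / np.sqrt(2.0 * np.pi)
print(np.max(np.abs(hist - gauss)))   # small deviation, close to 0
```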
