Box-Müller for generating random numbers

data-transformation random-generation

I am generating random numbers using the Box-Müller method in R. I successfully end up with two vectors, $Z_1$ and $Z_2$, whose elements are normally distributed.

But what is the relation between them? Why do I need them both? How can my random number generation be better if I use both of them?

Best Answer

Box-Müller$^\dagger$ is simply a way of transforming a pair of independent standard uniforms into a pair of independent standard normals. Because of the way the algorithm works, you get two variates from two uniforms; it can't yield one from one (you can do a one-to-one transformation in other ways though, for example if you have a fast inverse normal cdf$^{**}$). One nice thing about Box-Müller is that it's simple to understand and implement, whereas alternatives can be quite complex.
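As a rough illustration of the two-in, two-out transformation, here is a minimal sketch in Python (standard library only, not the R code from the question). The function names `box_muller` and `box_muller_pair` are made up for this example. It uses the usual form $Z_1 = \sqrt{-2\ln U_1}\cos(2\pi U_2)$, $Z_2 = \sqrt{-2\ln U_1}\sin(2\pi U_2)$.

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent Uniform(0,1) values to two N(0,1) values.

    u1 must be strictly positive so that log(u1) is defined.
    """
    r = math.sqrt(-2.0 * math.log(u1))  # radius from the first uniform
    theta = 2.0 * math.pi * u2          # angle from the second uniform
    return r * math.cos(theta), r * math.sin(theta)

def box_muller_pair(rng):
    """Draw two uniforms from rng (a random.Random) and transform them."""
    u1 = 1.0 - rng.random()  # random() lies in [0, 1); shift to (0, 1]
    u2 = rng.random()
    return box_muller(u1, u2)

rng = random.Random(42)  # seed chosen arbitrarily for reproducibility
z1, z2 = box_muller_pair(rng)
print(z1, z2)
```

Both outputs come from the same radius and angle, which is why the method naturally produces normals in pairs rather than one at a time.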

If you need an odd number of normals, it's often easier to generate one extra and ignore it (unless you have a very convenient way of saving the spare one and using it next time; this can often be done, but unless you're only consuming them one at a time it is often not worthwhile). Depending on which algorithm you're comparing it with, you can often get two normals this way more quickly than two generated one at a time in some other way.
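The two options for an odd count can be sketched as follows, again in Python and with hypothetical names (`normals`, `CachedNormal`). The first simply generates one pair too many and drops the extra value; the second saves the spare for the next call, which only pays off if values are consumed one at a time.

```python
import math
import random

def _pair(rng):
    """One Box-Muller step: two uniforms in, two normals out."""
    u1 = 1.0 - rng.random()  # keep u1 in (0, 1] so log(u1) is defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def normals(n, rng):
    """Return n normals; if n is odd, the last spare is simply discarded."""
    out = []
    for _ in range((n + 1) // 2):
        out.extend(_pair(rng))
    return out[:n]

class CachedNormal:
    """One-at-a-time source that keeps the second variate for the next call."""

    def __init__(self, rng):
        self.rng = rng
        self.spare = None

    def next(self):
        if self.spare is not None:
            z, self.spare = self.spare, None
            return z
        z, self.spare = _pair(self.rng)
        return z

print(len(normals(17, random.Random(1))))  # 17 values from 9 pairs
```

With the same seed, both approaches hand out the same values in the same order; the difference is only in the bookkeeping.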

Imagine you're going to buy black cotton socks. Maybe you can find one store that sells single socks for \$1 but in other places pairs of socks cost \$1.20. Now imagine you want 17 socks (say 8 pairs and one for a sockpuppet project). You could go to two stores and save 20 cents, or you could just buy 9 pairs and have a spare sock, saving yourself quite a deal of effort.

But what is the relation between them?

They're independent. (Well, notionally independent. With the old linear congruential RNGs there was a particular pattern in the result, a "spiral galaxy" kind of appearance, that could on occasion be an issue. Because no pseudo-RNG is perfectly random, the two normals are only going to be as independent as the original uniforms were. That is generally very, very close to independent, and good RNGs are tested every which way for the kinds of dependence you might care about.)
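A quick empirical sanity check is to look at sample correlations between the two outputs. The sketch below, in Python with a hand-written `correlation` helper (a name chosen for this example), computes the correlation between $Z_1$ and $Z_2$ and between their squares. Near-zero correlation is consistent with independence but does not prove it, so treat this as a rough check rather than a test of the generator.

```python
import math
import random

def box_muller_pair(rng):
    """One Box-Muller step: two uniforms in, two normals out."""
    u1 = 1.0 - rng.random()  # keep u1 in (0, 1] so log(u1) is defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def correlation(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(12345)  # arbitrary seed, Python's Mersenne Twister
pairs = [box_muller_pair(rng) for _ in range(20000)]
z1 = [p[0] for p in pairs]
z2 = [p[1] for p in pairs]

print(correlation(z1, z2))                                    # linear dependence
print(correlation([a * a for a in z1], [b * b for b in z2]))  # dependence in magnitude
```

With a good uniform generator both values should be close to zero; a poor generator can show structure that such simple summaries miss, which is why RNGs are put through much more extensive test batteries.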

Why do I need them both?

You don't need to use both. If you only require one random number but you have two, it's okay to ignore the other.

How can my random number generation be better if I use both of them?

Because if you need, say, 100 million random numbers, it's much quicker to keep them all than to generate 200 million and then throw half of them away. But aside from speed, using the second one is no different from ignoring it and generating another.
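The cost difference can be made concrete by counting how many uniforms each strategy consumes. The sketch below is in Python; `CountingUniform`, `keep_both`, and `keep_first_only` are hypothetical names used only for this illustration, and it counts draws rather than measuring wall-clock time.

```python
import math
import random

class CountingUniform:
    """Wrapper around random.Random that counts uniform draws."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.draws = 0

    def random(self):
        self.draws += 1
        return self._rng.random()

def pair(rng):
    """One Box-Muller step: two uniforms in, two normals out."""
    u1 = 1.0 - rng.random()  # keep u1 in (0, 1] so log(u1) is defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def keep_both(n, rng):
    """Use both variates from every pair."""
    out = []
    while len(out) < n:
        out.extend(pair(rng))
    return out[:n]

def keep_first_only(n, rng):
    """Use only the first variate and throw the second away."""
    return [pair(rng)[0] for _ in range(n)]

a = CountingUniform()
keep_both(1000, a)
b = CountingUniform()
keep_first_only(1000, b)
print(a.draws, b.draws)  # 1000 2000
```

Discarding the second variate doubles the number of uniforms, logarithms, and square roots needed for the same number of normals, without changing the distribution of what you keep.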

$\dagger$ the algorithm was actually invented by Marsaglia (just after WW2, I believe) but he couldn't publish it, as it was treated as a secret at the time

** these days it can be fairly quick, but 60 years ago it wasn't