[Math] Compressing random numbers

random

I've been thinking about ways to compress the output from a (supposedly) random number generator. Let's assume for a moment that my computer can produce high-quality random numbers. I'm certainly not an expert in this field, so please correct me wherever.

Let's say I need random numbers between zero and 199 inclusive, however I can only read a minimum of a byte at a time from the RNG, so I use some compression function to reduce the 256 possible values of the byte to 200 different values. I'm considering using the modulo operation as a compression function, in that the modulo operation will cause a wraparound of the value should it exceed 199.

I think I've immediately spotted a flaw with using modulo in that it's twice as likely for the values 0-55 to occur. Would you say this assessment of the modulo operation is correct, or is there something about the properties of random numbers (entropy or whatever) that means that this doesn't matter? Also, if not modulo, could you suggest a good method of reducing the number of possible values of a RNG which effectively preserves their 'randomness'?

Best Answer

So if I interpret your question correctly, you've got an uniformly distributed random natural number in the range $[0,256)$ and want to obtain an uniformly random number in the range $[0,200)$.

As you've observed, just taking the number modulo $200$ doesn't work, since the numbers below $56$ will be obtained twice as often that way.

The simplest way to do it is to simply throw away anything $\ge 200$ and try again. If you consider that too wasteful, you can also store the excess value and use it for the next try. So a less wasteful algorithm would be (where drawn numbers are always single-byte numbers):

  • First, you draw a random number $r_1$.
  • If $r_1 < 200$, you just return it.
  • Otherwise you draw a second number $r_2$ and calculate $s = 256*(r_1 - 200) + r_2$. Now $s$ is uniformly distributed in the integers of the range $[0,14336)$.
  • If $s < 14200$, you return $s \bmod 200$. Otherwise, you start over.

As you see, the probability of not succeeding after the second draw is extremely low. You could, of course, repeat this process instead of starting over (the remainder here is in the range $[0,136)$ so the chance in the third try is even larger).

Also, when returning a number from the second step, you're wasting the quotient part (in the range $[0,72)$; you might want to save that for later trials.

Related Question