[Math] Compressing random numbers

random

I've been thinking about ways to compress the output from a (supposedly) random number generator. Let's assume for a moment that my computer can produce high-quality random numbers. I'm certainly not an expert in this field, so please correct me wherever I'm wrong.

Let's say I need random numbers between 0 and 199 inclusive, but the smallest unit I can read from the RNG is a byte, so I use some compression function to reduce the 256 possible values of the byte to 200 different values. I'm considering using the modulo operation as the compression function, since it wraps the value around whenever it exceeds 199.

I think I've immediately spotted a flaw with using modulo in that it's twice as likely for the values 0-55 to occur. Would you say this assessment of the modulo operation is correct, or is there something about the properties of random numbers (entropy or whatever) that means that this doesn't matter? Also, if not modulo, could you suggest a good method of reducing the number of possible values of a RNG which effectively preserves their 'randomness'?

Best Answer

So if I interpret your question correctly, you've got a uniformly distributed random natural number in the range $[0,256)$ and want to obtain a uniformly distributed random number in the range $[0,200)$.

As you've observed, just taking the number modulo $200$ doesn't work, since the numbers below $56$ will be obtained twice as often that way.
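To make the bias concrete, here is a small Python sketch that enumerates all 256 byte values and counts how often each residue modulo 200 appears. It assumes an ideal uniform byte; the variable names are only for illustration.

```python
from collections import Counter

# Map every possible byte value to its residue modulo 200 and count the hits.
counts = Counter(b % 200 for b in range(256))

# Residues reached by two different byte values (b and b + 200).
twice = sorted(v for v, n in counts.items() if n == 2)
# Residues reached by exactly one byte value.
once = sorted(v for v, n in counts.items() if n == 1)

print(twice[0], twice[-1], len(twice))  # 0 55 56
print(once[0], once[-1], len(once))     # 56 199 144
```

So each of the values 0 to 55 has probability 2/256, while each of 56 to 199 has probability 1/256.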

The simplest way to do it is to simply throw away anything $\ge 200$ and try again. If you consider that too wasteful, you can also store the excess value and use it for the next try. So a less wasteful algorithm would be (where drawn numbers are always single-byte numbers):

1. First, you draw a random number $r_1$.
2. If $r_1 < 200$, you just return it.
3. Otherwise you draw a second number $r_2$ and calculate $s = 256 \cdot (r_1 - 200) + r_2$. Now $s$ is uniformly distributed in the integers of the range $[0,14336)$.
4. If $s < 14200$, you return $s \bmod 200$. Otherwise, you start over.
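The steps above can be sketched in Python roughly as follows. This is only an illustration: `draw_byte` is a hypothetical stand-in for the byte-at-a-time RNG from the question, implemented here with Python's `random` module, and `uniform_200` is a name chosen for this example.

```python
import random


def draw_byte(rng):
    # Hypothetical stand-in for the RNG in the question: one uniform byte.
    return rng.randrange(256)


def uniform_200(rng):
    """Return a uniform integer in [0, 200) using the two-draw scheme."""
    while True:
        r1 = draw_byte(rng)
        if r1 < 200:
            return r1
        # r1 - 200 is uniform on [0, 56); reuse it together with a second byte.
        r2 = draw_byte(rng)
        s = 256 * (r1 - 200) + r2  # uniform on [0, 14336)
        if s < 14200:              # 14200 = 71 * 200, largest multiple of 200 below 14336
            return s % 200
        # Otherwise start over with a fresh r1.


rng = random.Random()
sample = [uniform_200(rng) for _ in range(10)]
```

Dropping the second stage (returning to the top of the loop as soon as `r1 >= 200`) gives the plain throw-away-and-retry method mentioned first.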

As you see, the probability of not succeeding after the second draw is extremely low. You could, of course, repeat this process instead of starting over (the remainder $s - 14200$ is in the range $[0,136)$, so the chance of success on the third try is even larger).
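For reference, the numbers behind these claims can be worked out directly. The symbols $t$, $u$ and $r_3$ below are introduced only for this calculation. Since $14336 = 56 \cdot 256$, the probability of having to start over after two draws is

$$
P(\text{start over}) = \frac{56}{256} \cdot \frac{14336 - 14200}{14336} = \frac{56}{256} \cdot \frac{136}{14336} = \frac{136}{65536} \approx 0.0021 .
$$

If instead the remainder $t = s - 14200 \in [0,136)$ is combined with a third byte $r_3$, then $u = 256\,t + r_3$ is uniform on $[0,34816)$, and $34816 = 174 \cdot 200 + 16$, so

$$
P(\text{success on third draw}) = \frac{34800}{34816} \approx 0.9995, \qquad \text{compared with} \qquad \frac{14200}{14336} \approx 0.9905 \text{ on the second draw.}
$$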

Also, when returning a number from the second step, you're wasting the quotient part $\lfloor s/200 \rfloor$ (in the range $[0,71)$, since $s < 14200$); you might want to save that for later trials.

Related Solutions

[Math] Simplest way to produce an even distribution of random values

There is a way that would not waste so much entropy in the random source, and also has optimal expected time per call. We wish to construct a universal RNG from a given RNG rand that outputs a uniformly random value in the range [0..k-1]. In your question k=256.

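import random

k = 256  # the given RNG outputs values in [0..k-1]
p = 0    # unused random choice, uniform in [0..q-1]
q = 1

def rand():
    # stand-in for the given RNG (assumption: any uniform source on [0..k-1] works)
    return random.randrange(k)

def rnd(c):
    global p, q
    while True:
        if q >= c:
            if p < q - q % c:
                # p lies below the largest multiple of c that is at most q
                v = p % c
                p = p // c
                q = q // c
                return v
            else:
                # remove that multiple of c from both p and q
                p = p % c
                q = q % c
        # expand the range with one more draw from the source
        r = rand()
        p = p * k + r
        q = q * k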

How this works is that p,q represent an unused random choice p that is uniform over the integer range [0..q-1]. So we just call rand enough times to expand that range and our choice within it. If at any point our random choice is less than q-q%c (the largest multiple of c that is at most q), we can return p%c because it is equally likely to be any residue modulo c; the quotient p//c is then itself an unused random choice in the range [0..q//c-1], which we keep for later calls. Otherwise we remove that multiple of c from both p and q (in that case p%c and q%c compute exactly this). In the implementation above, note that the outer check if q>=c is redundant, but may increase efficiency if c is large compared to k.

I have tested it and it achieves about 95% (entropy) efficiency for c=3 and about 90% efficiency for c=150.
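One way such a measurement might be set up is sketched below. This is not the original test, only an assumed reconstruction: the `Recycler` class, the `efficiency` helper, the seed, and the sample size are all made up for illustration, and efficiency is taken to mean output bits (log2(c) per call) divided by input bits (log2(k) per draw from the source), ignoring whatever is still stored in p,q at the end.

```python
import math
import random


class Recycler:
    """Sketch of the p,q scheme above, with a counter on source draws."""

    def __init__(self, k, rand):
        self.k = k            # rand() is assumed uniform on [0..k-1]
        self.rand = rand      # hypothetical source RNG callable
        self.p, self.q = 0, 1
        self.draws = 0        # number of calls made to the source

    def rnd(self, c):
        while True:
            if self.q >= c:
                if self.p < self.q - self.q % c:
                    v = self.p % c
                    self.p //= c
                    self.q //= c
                    return v
                self.p %= c
                self.q %= c
            self.p = self.p * self.k + self.rand()
            self.q *= self.k
            self.draws += 1


def efficiency(c, n=20000, k=256, seed=0):
    """Return (output bits / input bits, histogram) over n calls to rnd(c)."""
    rng = random.Random(seed)
    gen = Recycler(k, lambda: rng.randrange(k))
    counts = [0] * c
    for _ in range(n):
        counts[gen.rnd(c)] += 1
    bits_out = n * math.log2(c)
    bits_in = gen.draws * math.log2(k)
    return bits_out / bits_in, counts


eff3, hist3 = efficiency(3)
eff150, _ = efficiency(150)
print(f"c=3: {eff3:.3f}, c=150: {eff150:.3f}")
```

The histogram is returned so that the uniformity of the outputs can be checked alongside the efficiency figure.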

After thinking a bit, I realized that I was wrong to claim that it is entropy optimal. The missing entropy goes into the choice between the two if cases. There is actually a way to fix this, but it is not simple to implement, and when I implemented just one level it only improved the efficiency slightly, so it is quite pointless.
