I'm a software engineer, working on a small randomizer library as part of a larger project.
We're using a cryptographic random number generator, which provides an array of random bytes.
We have to use these random bytes to produce an array of random integers fitting whatever requirements are specified.
For example, let's say someone requests $5$ random 8-bit unsigned integers between $50$ and $200$.
The value $50$ would be assigned to the variable $min$, and $200$ assigned to $max$.
Our generator then produces an array of $5$ random bytes, with values ranging from $0$ to $255$.
The most obvious method for converting each random byte $n$ into the desired range would be: $$min + mod(n, (max - min) + 1)$$ where $mod$ is the modulus, or modulo, operation.
This would convert each random byte $(n)$ into a random integer between $min$ and $max$.
The problem with this solution is:
It doesn't produce an even distribution, because each $n$ is a random integer between $0$ and $255$. Whenever $256$ is not an exact multiple of $(max-min)+1$, the bytes with $n>max-min$ wrap around onto only part of the output range, so the distribution overlaps unevenly with itself.
In the example above, the result is twice as likely to be between $50$ and $154$, as opposed to results between $155$ and $200$.
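The skew is easy to verify by running every possible byte value through the naive formula and counting the outputs (a minimal Python sketch; the variable names are illustrative):

```python
from collections import Counter

lo, hi = 50, 200
span = hi - lo + 1  # 151 possible output values

# Feed every possible byte value 0..255 through min + mod(n, (max-min)+1)
# and count how often each output occurs.
counts = Counter(lo + (n % span) for n in range(256))

# Outputs 50..154 are produced by two byte values each;
# outputs 155..200 by only one.
print(counts[100], counts[180])
```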
We need the random distribution to be even across the requested range ($50$ to $200$ in this example).
What's the simplest way to achieve this?
More complicated operations, such as logarithms, will cause a severe drain on performance.
So we'd like to stay within the realm of simple arithmetic, if at all possible.
For bytes where $n>max-min$, could we subtract $(max-min)$ from $n$, and then add the resulting difference to the next byte in the array? This is a possible solution I'm considering, but I'm confused about how it would work.
Are there any pitfalls or nuances that apply here?
How would this type of solution work?
Are there any other solutions that would provide a consistent, even random distribution without draining performance?
Best Answer
There is a way that does not waste so much entropy in the random source, and it also has optimal expected time per call. We wish to construct a universal RNG from a given RNG `rand` that outputs a uniformly random value in the range `[0..k-1]`. In your question, `k=256`.

How this works is that the pair `p,q` represents that we have an unused random choice `p` in the integer range `[0..q-1]`. So we just call `rand` enough times to expand that range and our choice within it. If at any point our random choice is less than `q-q%c` (the largest multiple of `c` that is at most `q`), we can return `p%c`, because it is equally likely to be any residue modulo `c`, and we then shrink the groups of size `c` each into an unused random choice. Otherwise we remove that multiple of `c` from both `p` and `q`. In the implementation, note that the extra `if q>=c` check is redundant, but may increase efficiency if `c` is large compared to `k`.
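The algorithm described above might be sketched as follows (a minimal Python sketch under my own naming; `rand`, `uniform`, and the module-level state are assumptions, not the answerer's original code):

```python
import secrets

K = 256  # the underlying generator yields uniform values in [0, K-1]

def rand():
    # stand-in for the cryptographic byte source (hypothetical)
    return secrets.randbelow(K)

# State (p, q): an unused, uniformly random choice p in [0, q-1].
p, q = 0, 1

def uniform(c):
    """Return a uniformly random integer in [0, c-1], recycling leftover entropy."""
    global p, q
    while True:
        if q >= c:                    # redundant check; skips useless work when c > q
            m = q - q % c             # largest multiple of c that is at most q
            if p < m:
                r = p % c             # uniform residue modulo c
                p, q = p // c, m // c # shrink groups of size c into a new unused choice
                return r
            p, q = p - m, q - m       # remove that multiple of c from both p and q
        # expand the stored range (and the choice within it) with one call to rand
        p, q = p * K + rand(), q * K
```

For the example in the question, `50 + uniform(151)` would yield a uniform integer between 50 and 200, with leftover entropy carried over between calls.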
I have tested it, and it achieves about 95% (entropy) efficiency for `c=3` and about 90% efficiency for `c=150`.

After thinking a bit, I realized that I was wrong to claim that it is entropy optimal. The missing entropy goes into the choice between the two `if` cases. There is actually a way to fix this, but it is not simple to implement, and when I implemented just one level it only improved the efficiency slightly, so it is quite pointless.