I'm a software engineer, working on a small randomizer library as part of a larger project.
We're using a cryptographic random number generator, which provides an array of random bytes.
We have to use these random bytes to produce an array of random integers fitting whatever requirements are specified.
For example, let's say someone requests $5$ random 8-bit unsigned integers between $50$ and $200$.
The value $50$ would be assigned to the variable $min$, and $200$ assigned to $max$.
Our generator then produces an array of $5$ random bytes, with values ranging from $0$ to $255$.
The most obvious method for converting each random byte $n$ into the desired range would be: $$min + mod(n, (max - min) + 1)$$ where $mod$ is the modulus, or modulo, operation.
This would convert each random byte $(n)$ into a random integer between $min$ and $max$.
The problem with this solution is:
It doesn't produce an even distribution, because each $n$ is a random integer between $0$ and $255$. Whenever $256$ is not an exact multiple of $(max-min)+1$, the bytes with $n>max-min$ wrap around onto only part of the output range, so the distribution overlaps unevenly with itself.
In the example above, the result is twice as likely to be between $50$ and $154$, as opposed to results between $155$ and $200$.
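The skew is easy to verify by running every possible byte value through the naive formula and counting the outputs (a minimal Python sketch; the variable names are illustrative):

```python
from collections import Counter

lo, hi = 50, 200
span = hi - lo + 1  # 151 possible output values

# Feed every possible byte value 0..255 through min + mod(n, (max-min)+1)
# and count how often each output occurs.
counts = Counter(lo + (n % span) for n in range(256))

# Outputs 50..154 are produced by two byte values each;
# outputs 155..200 by only one.
print(counts[100], counts[180])
```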
We need the random distribution to be even across the requested range ($50$ to $200$ in this example).
What's the simplest way to achieve this?
More complicated operations, such as logarithms, will cause a severe drain on performance.
So we'd like to stay within the realm of simple arithmetic, if at all possible.
For bytes where $n>max-min$, could we subtract $(max-min)$ from $n$, and then add the resulting difference to the next byte in the array? This is a possible solution I'm considering, but I'm confused about how it would work.
Are there any pitfalls or nuances that apply here?
How would this type of solution work?
Are there any other solutions that would provide a consistent, even random distribution without draining performance?
Best Answer
There is a way that does not waste so much entropy in the random source, and it also has optimal expected time per call. We wish to construct a universal RNG from a given RNG `rand` that outputs a uniformly random value in the range `[0..k-1]`. In your question, `k=256`.

How this works is that the pair `p,q` represents that we have an unused random choice `p` in the integer range `[0..q-1]`. So we just call `rand` enough times to expand that range and our choice within it. If at any point our random choice is less than `q-q%c` (the largest multiple of `c` that is at most `q`), we can return `p%c`, because it is equally likely to be any residue modulo `c`, and we then shrink the groups of size `c` each into an unused random choice. Otherwise we remove that multiple of `c` from both `p` and `q`. In the implementation, note that the extra `if q>=c` check is redundant, but may increase efficiency if `c` is large compared to `k`.
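The algorithm described above might be sketched as follows (a minimal Python sketch under my own naming; `rand`, `uniform`, and the module-level state are assumptions, not the answerer's original code):

```python
import secrets

K = 256  # the underlying generator yields uniform values in [0, K-1]

def rand():
    # stand-in for the cryptographic byte source (hypothetical)
    return secrets.randbelow(K)

# State (p, q): an unused, uniformly random choice p in [0, q-1].
p, q = 0, 1

def uniform(c):
    """Return a uniformly random integer in [0, c-1], recycling leftover entropy."""
    global p, q
    while True:
        if q >= c:                    # redundant check; skips useless work when c > q
            m = q - q % c             # largest multiple of c that is at most q
            if p < m:
                r = p % c             # uniform residue modulo c
                p, q = p // c, m // c # shrink groups of size c into a new unused choice
                return r
            p, q = p - m, q - m       # remove that multiple of c from both p and q
        # expand the stored range (and the choice within it) with one call to rand
        p, q = p * K + rand(), q * K
```

For the example in the question, `50 + uniform(151)` would yield a uniform integer between 50 and 200, with leftover entropy carried over between calls.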
I have tested it, and it achieves about 95% (entropy) efficiency for `c=3` and about 90% efficiency for `c=150`.

After thinking a bit, I realized that I was wrong to claim that it is entropy optimal. The missing entropy goes into the choice between the two `if` cases. There is actually a way to fix this, but it is not simple to implement, and when I implemented just one level it only improved the efficiency slightly, so it is quite pointless.