Solved – Testing the randomness (uniform distribution) of a 64-bit float random generator

chi-squared-test, random-generation, uniform-distribution

We have a random number generator that is supposed to generate 64-bit floats uniformly.

We want to test whether it is a good uniform random generator.

I am not asking about the general way to test it, as that was already asked here: https://stackoverflow.com/questions/22916519/test-the-randomness-of-a-black-box-that-outputs-random-64-bit-floats.

As I understand it, the first step of the test is to divide the whole range into equal-sized bins, then keep generating and count the number of "balls" falling into each bin.

My question is about this first step.

How should I split the whole range of 64-bit floats?

The range is between -1.79769313486231571e+308 and 1.79769313486231571e+308 and it is huge.

My idea is to utilise the 64 bits. Can I design the equal-sized bins like this:

  1. Every float has 64 bits, so we have 64 bins, one per bit position.
  2. For every float generated, we read those 64 bits; if a bit is 1, we increment the corresponding bin's count by 1.
  3. After N samples, the expected count in each bin should be N/2.
  4. Then we carry out Pearson's chi-squared test, etc. (a sketch is given after this list).
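
For concreteness, a minimal sketch of steps 1–4 might look like the following; `generate_random_double()` is just a placeholder for the black-box generator under test:

```cpp
// Sketch of the proposed bit-binning scheme; generate_random_double() is a
// placeholder for the black-box generator being tested.
#include <array>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <random>

double generate_random_double() {
    // Placeholder only: a standard-library generator over a modest range.
    static std::mt19937_64 engine{42};
    static std::uniform_real_distribution<double> dist(-1.0, 1.0);
    return dist(engine);
}

int main() {
    const std::size_t N = 1'000'000;
    std::array<std::uint64_t, 64> bins{};  // one bin per bit position

    for (std::size_t i = 0; i < N; ++i) {
        const double x = generate_random_double();
        std::uint64_t bits;
        std::memcpy(&bits, &x, sizeof bits);  // reinterpret the 64-bit pattern
        for (int b = 0; b < 64; ++b)
            if ((bits >> b) & 1u) ++bins[b];
    }

    // Per-bit Pearson chi-squared statistic (1 degree of freedom each),
    // comparing the observed 0/1 counts against the expected N/2.
    const double expected = N / 2.0;
    for (int b = 0; b < 64; ++b) {
        const double ones = static_cast<double>(bins[b]);
        const double zeros = static_cast<double>(N) - ones;
        const double chi2 = (ones - expected) * (ones - expected) / expected
                          + (zeros - expected) * (zeros - expected) / expected;
        std::cout << "bit " << b << ": ones=" << ones << " chi2=" << chi2 << "\n";
    }
}
```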

Best Answer

As it stands, this is not a good way to test whether floating point numbers are uniformly distributed. Like Aksakal, I wondered whether the bits of the exponent part of the floating point representation would be uniformly distributed. The answer is that they aren't, because there are many more numbers with large exponents than there are numbers with small exponents.

I wrote a small test program that confirms this. It generates $N = 1 \text{ million}$ uniformly distributed random floating-point numbers and, as a control, $N$ random integers. (There were various problems generating 64-bit floating-point numbers, see e.g. here, and 32 bits seems sufficient for demonstration purposes.)
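
The control part of such a program is only a few lines; a rough sketch (seed and output format are arbitrary, not the exact program behind the plots) might look like this:

```cpp
// Control case: bit frequencies of N uniformly distributed 32-bit integers.
#include <array>
#include <cstdint>
#include <iostream>
#include <random>

int main() {
    const int N = 1'000'000;
    std::mt19937 engine{1};
    std::uniform_int_distribution<std::uint32_t> dist;  // defaults to the full 32-bit range

    std::array<long, 32> bins{};
    for (int i = 0; i < N; ++i) {
        const std::uint32_t v = dist(engine);
        for (int b = 0; b < 32; ++b)
            if ((v >> b) & 1u) ++bins[b];
    }
    for (int b = 0; b < 32; ++b)
        std::cout << "bit " << b << ": " << bins[b]
                  << " (expected about " << N / 2 << ")\n";
}
```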

First, the control case. The plot of the bins of bits for integers is just as you suggested, with each bin $\approx N/2$.

Now for the floating point numbers. A plot of the sorted numbers is a straight line, indicating that they would pass the Kolmogorov–Smirnov test for uniformity.
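
The Kolmogorov–Smirnov statistic against the Uniform(0,1) CDF is straightforward to compute directly; here is a generic sketch (not the exact program used for the plots):

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const int n = 1'000'000;
    std::mt19937 engine{12345};
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    std::vector<double> x(n);
    for (double& v : x) v = dist(engine);
    std::sort(x.begin(), x.end());

    // One-sample Kolmogorov-Smirnov statistic against the Uniform(0,1) CDF.
    double d = 0.0;
    for (int i = 0; i < n; ++i) {
        d = std::max(d, (i + 1.0) / n - x[i]);               // D+
        d = std::max(d, x[i] - static_cast<double>(i) / n);  // D-
    }
    std::cout << "KS statistic D = " << d
              << " (5% critical value is about 1.36/sqrt(n) = "
              << 1.36 / std::sqrt(static_cast<double>(n)) << ")\n";
}
```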

But the bins are definitely not uniform.

If you plot only bins 1 to 23 together with bin 32, you do get bins $\approx N/2$, but bins 24 to 31 show a clear increasing pattern. These bits correspond precisely to the bits of the exponent in 32-bit floating point numbers. The IEEE single-precision floating-point definition stipulates

  • the least significant 23 bits are for the mantissa
  • the next 8 bits are for the exponent
  • the most significant bit is for the sign
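
To make that layout concrete, here is a small sketch that extracts the three fields from a float (the example value is arbitrary):

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    const float x = -6.25f;
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the IEEE-754 bit pattern

    const std::uint32_t mantissa = bits & 0x007FFFFFu;    // bits 0-22
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;  // bits 23-30 (biased by 127)
    const std::uint32_t sign     = bits >> 31;            // bit 31

    std::cout << "sign=" << sign
              << " exponent=" << exponent
              << " (unbiased " << static_cast<int>(exponent) - 127 << ")"
              << " mantissa=0x" << std::hex << mantissa << "\n";
}
```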

Another way to see this is to consider a simpler example. Think about generating numbers between 0 and $10^7$, stored with a base-10 exponent. Numbers between 0 and 1 would have an exponent of 0, numbers between 1 and 10 an exponent of 1, numbers between 10 and 100 an exponent of 2, ..., and numbers between $10^6$ and $10^7$ an exponent of 7. The numbers from $10^4$ to $10^7$ make up $(10^7-10^4)/10^7=99.9\%$ of the range, and their exponents (5 to 7) are 101 to 111 in binary, so you'd expect the most significant bit of the exponent to be 1 at least 99.9% of the time, not 50% of the time.
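
A throwaway sketch, using the exponent convention from the paragraph above, tabulates how much of the range each exponent value covers:

```cpp
#include <cmath>
#include <iostream>

int main() {
    // Fraction of the range [0, 10^7] whose exponent, in the convention above,
    // equals k: exponent 0 covers [0, 1), exponent k covers [10^(k-1), 10^k).
    const double total = 1e7;
    for (int k = 0; k <= 7; ++k) {
        const double lo = (k == 0) ? 0.0 : std::pow(10.0, k - 1);
        const double hi = std::pow(10.0, k);
        std::cout << "exponent " << k << ": " << 100.0 * (hi - lo) / total << "%\n";
    }
}
```

Adding up the rows whose exponent has its most significant bit set (4 through 7) gives about 99.99% of the range, which is why that bit is nowhere near 50/50.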

It would be possible, with some care, to use an approach like this to get the expected frequencies for each bin of the binary exponent of a floating point number, and use these in a $\chi^2$ test, but Kolmogorov–Smirnov is a better approach in theory and easy to implement. Nevertheless, a test like this could pick up distributional biases in the implementation of a random number generator that Kolmogorov–Smirnov might not. For example, when I first tried generating 64-bit double-precision floating-point random numbers in C++, I forgot to switch to a 64-bit Mersenne Twister engine. The sorted numbers still give a straight-line plot, but you can see from the plots of the bins of the bits that the 64-bit Mersenne Twister engine is superior to the 32-bit one (as you would expect).
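
For reference, the fix is a one-line change of engine; a sketch of the two setups (seed and range are arbitrary):

```cpp
#include <iostream>
#include <random>

int main() {
    // Same distribution, two different engines. How the 32-bit engine's output
    // is stretched to fill a double's 53-bit mantissa is left to the library,
    // which is what the bit-bin plots end up revealing.
    std::mt19937    engine32{2024};  // 32-bit Mersenne Twister
    std::mt19937_64 engine64{2024};  // 64-bit Mersenne Twister
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::cout << "32-bit engine: " << dist(engine32) << "\n";
    std::cout << "64-bit engine: " << dist(engine64) << "\n";
}
```

Feeding both into a bit-binning loop like the one above reproduces the kind of comparison shown in the plots below.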

[plots: bit-bin counts from the 32-bit and 64-bit Mersenne Twister engines]

(Note that in both cases the last bin, for the sign bit, is zero, due to the difficulties of generating random numbers across the whole range.)
