Could the simple iterative algorithm for generating random numbers be favouring digits of value 0

algorithmic-randomnessalgorithmscomputer sciencerandomrandom variables

I have been creating a random number generator, using Python.

My script starts by creating a one-dimensional $1 \times N$ matrix of integers ranging from 0 and 9. We can denote this matrix as $A$. I am using a novel method for this, which arguably involves pure random numbers. So far this seems to be successful and an analysis of 4000 of these integers shows that each integer has approximately ~10% representation in $A$, which indicates (although does not fully prove) that my random 0-9 integer generator is fair. So let us assume that the integers are randomly generated and distributed.

Next, I created an iterative algorithm that attempts to add an extra layer of chaos and generate random numbers between 0 to 1, using the elements of $A$. The algorithm is as follows:

  • Start with a pre-specified integer seed $n$.
  • Define $N_1 = A_{1,n}$ and use $N_1$ to create the first decimal place of a new number between 0 and 1 (so starting with "0.$N_1$").
  • Define $S_1 = A_{1,n+1}$ and use $S_1$ as a step size to the next desired element: $A_{1,n+1+S_1}$.
  • Define $N_2 = A_{1,n+1+S_1}$ and use $N_2$ to create the second decimal place.
  • Define $S_2 = A_{1,n+2+S_1}$ and use $S_2$ as the next step size.
  • Define $N_3 = A_{1,n+S_2}$ as the third decimal place and so on.
  • Finish building the number after 10 decimal places, then begin building a new number (resuming from the same element of $A$).
  • If the $j$ value of any desired $A_{1,j}$ element at any time exceeds $N$ (which would result in an error), then redefine that $j$ as: $j_{new} = j_{old} – N$ and continue.

In plain English, this can be described as: "Make the first decimal place the $n^{th}$ element (where $n$ is the seed), skip ahead to the next element, skip ahead by the value of that element, make the second decimal place the new element you reach, skip ahead to the next element from there and so on. Wrap around to the beginning of the matrix if the final element is surpassed. Start a new number after 10 decimal places have been generated."

The result of this is a set of supposedly random numbers, between 0 to 1, each having been constructed as: $0. N_1 N_2 N_3 … N_{10}$.

In order to gain some insight into how random the generated set is, I created a script that monitors the abundances of individual 0-9 digits in the decimal places. It turns out that consistently ~20% of the digits are now 0, which is a disproportionately high representation (and I made sure to leave out the "0." starts of the numbers!).

Given that there was a previously balanced representation of 0 digits in the integer matrix $A$, what about the algorithm could have favoured 0 digits ending up in the final random number set? How can such a question be mathematically approached? If there is no clear bias induced by the algorithm I have described, then does it indicate that the integers of $A$ were perhaps not randomly distributed, after all?

Best Answer

Yes, your algorithm is generating "extra" $0$s because it makes $S_i$ serve as the "skip distance" to the next digit. So, if $S_i=0$ there is no skip, forcing the next digit to be $0:$ $$S_1 = A_{1,n+1}=0\quad\implies\quad N_2 = A_{1,n+1+S_1}=A_{1,n+1}=S_1=0,$$ etc. Thus, $$\begin{align} P(N_{i+1}=0)&=\underbrace{P(N_{i+1}=0\mid S_i=0)}_{1}\underbrace{P(S_i=0)}_{0.1}+\underbrace{P(N_{i+1}=0\mid S_i\ne 0)}_{0.1}\underbrace{P(S_i\ne 0)}_{0.9}=0.19.\\ \end{align}$$

(I also verified this by generating $10^8$ uniformly distributed random digits and applying your algorithm.)

Note: I think there's a typo where you wrote $N_3 = A_{1,n+S_2}$, which presumably should be $N_3 = A_{1,(n+2+S_1)+S_2};$ i.e., the position of $N_3$ is meant to be $S_2$ places after the position of $S_2,$ etc., so that $S_i$ acts as a "skip distance".

The non-uniformity in the digit-distribution is easily removed by simply adding $1$ to the present "skip distance"; i.e., use $N_2 = A_{1,n+1+(S_1+1)},\ N_3 = A_{1,(n+2+S_1)+(S_2+1)},$ etc.

Related Question