[Math] Number of samples to predict the next number in a pseudorandom number generator

number theoryrandomsequences-and-series

Let:

$$R_{n+1} = (mR_n + b) \bmod{a} $$

Assume we know the values of $R_1, R_2, \ldots, R_L $. What is the minimum value of $L$ (if it exists) such that we can determine $R_0, m, b$ and $a$?

Best Answer

Strictly speaking in the worst case if $R_0=0,m=b=1$ then we cannot determine $a$ until we complete a cycle.

Practically speaking, though, $L=3$ is always too small but $L=4$ may be enough. For $L=3$ we have $$ \begin{align} \left[\begin{matrix} R_1 & 1 \\ R_2 & 1 \end{matrix}\right] \left[\begin{matrix}m\\ b\end{matrix}\right] & \equiv \left[\begin{matrix} R_2 \\ R_3\end{matrix}\right] \\ \left[\begin{matrix}m\\ b\end{matrix}\right] & \equiv \left[\begin{matrix} R_1 & 1 \\ R_2 & 1 \end{matrix}\right]^{-1}\left[\begin{matrix} R_2 \\ R_3\end{matrix}\right] \end{align} $$ and the matrix inverse can be computed modulo most $a$.

Here's essentially the same question on security.SE, with a solution giving this algorithm. Define $t_n=R_{n+1}-R_n$, then $$ \begin{align} t_n &\equiv (m-1)R_n+b \\ t_{n+1} &\equiv (m-1)mR_n+mb \\ t_{n+2} &\equiv (m-1)m^2R_n+m^2b\\ t_nt_{n+2}-t_{n+1}^2 &\equiv 0 \pmod{a} \end{align} $$ Define $u_n=|t_nt_{n+2}-t_{n+1}^2|$ then $a=\gcd(u_1,u_2,\ldots,u_k)$ with probability increasing in $k$.

For example, if we are given $$ 4722,6543,8671,1692,2457,7536 $$ then $u_1=7\cdot 11\cdot 23\cdot 9733$. We can compute $u_2,u_3$ and find $\gcd(u_1,u_2,u_3)=9733$ which is prime and must be $a$.

Or in this case we can determine the unique solution without using $R_6$ by computing $m,b$ for each factor of $u_1$ that is larger than our largest sample. This gives eight possibilities, e.g. $(a,m,b)=(9733,541,1987)$ or $(749441,175735,566501)$, but only the first also matches $R_4$ and $R_5$.

Related Question