[Math] Probability of an RNG

probabilityrandom

For fun (relatively speaking), I'd like to do a little research on probabilities related to a pseudo-random number generator.

Specifically, how would I get started figuring out "what is the probability that a prng will generate the same random number twice?"

This is part of an investigation into a larger issue, but I'm not sure where to begin (or if my question is even relevant).

Edited to add:

After reading the comments and answers I see that my question isn't clear. As an example: the rng outputs a 5 What is the probability that another 5 will be the next output? How many "rolls" of the RNG must I make before I see another 5?

If we assume a good rng has uniform distribution, then is it 1 out of (2^32) (assuming 32bit integers)?

For a larger example, lets say I use the RNG to generate numbers that are used as an index into an array of characters. The array is 25 numbers long, so the output of the RNG is mod 25 I will generate a sequence of 5 characters.

How would I determine the probability of generating the same 5 character sequence?

If we assume that we have a uniform distribution of output numbers from the RNG, then the RNG's output shouldn't factor into the probability of generating the same character sequence.

However, is my assumption valid?

Best Answer

All pseudorandom number generators are periodic; you can thus expect any number you've seen to appear again if you do Monte Carlo long enough. What you really want to ask, I think, is how long their periods are. To list the ones I'm most familiar with, linear congruential generators (often used in calculators) typically have (very!) short periods (though two or more of these can be combined to produce a generator with a slightly longer period), the Marsaglia-Zaman generator has a period of $\approx 5\times 10^{171}$, and the now-popular Mersenne twister by Matsumoto and Nishimura has a period of $2^{19937} - 1\approx4\times 10^{6001}$. One necessary condition for a quality PRNG is a long period length (but this isn't a sufficient condition, as it should also pass a bunch of stringent statistical tests). Have a look at these two articles for more information.

Related Question