Solved – References and Best practices for setting seeds in pseudo-Random Number Generation

random-generation

In this document, which concerns the "set seed" command, Stata people discuss issues related to setting seeds when generating pseudo-random numbers.

A notable "don't" is "don't use serially the sequence of natural numbers as seeds, because this has a pattern and endangers pseudo-randomness".

A notable "do", offered only one-quarter jokingly, is to set just one seed during your lifetime and then record the "state" of the generator at the end of each experiment, so that the next experiment continues from the point where the previous one stopped.
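To make the "record the state and continue" idea concrete, here is a minimal sketch using Python's standard `random` module (whose generator is a Mersenne Twister) rather than Stata. The seed value, the helper `run_experiment`, and the use of `pickle` to store the state are illustrative assumptions, not something the Stata document prescribes.

```python
import pickle
import random


def run_experiment(rng, n):
    """Draw n uniform variates from the given generator (hypothetical helper)."""
    return [rng.random() for _ in range(n)]


# Seed exactly once; 20240101 is an arbitrary illustrative value.
rng = random.Random(20240101)
first = run_experiment(rng, 5)

# Record the generator state at the end of the first experiment.
saved_state = pickle.dumps(rng.getstate())

# Later, possibly in a new session: restore the state and continue the stream.
resumed = random.Random()
resumed.setstate(pickle.loads(saved_state))
second = run_experiment(resumed, 5)

# Reference: one uninterrupted stream of 10 draws from the same seed.
reference = run_experiment(random.Random(20240101), 10)
continues_stream = (first + second == reference)
```

If the state is restored correctly, the two experiments together should reproduce the uninterrupted stream, which is what `continues_stream` checks.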

Obviously, this advice depends on how many pseudo-random numbers one expects to generate over a research lifetime. Perhaps a Mersenne Twister would cover the lifetime needs of many researchers…

Now, I am not greatly experienced with PRNGs in theory or in practice, so I cannot argue about these suggestions; they should be proven valid or invalid on theoretical grounds and hard mathematical statistics.

So, my questions are

1) Can you help explain or invalidate the advice given above, or point to a reference that deals with such issues?

2) Can you provide references that offer "best practices" in setting seeds?

3) How do you go about it in your own work, and why?

As an example for question 3), suppose that for a Monte Carlo study you want to generate $m$ samples, each of size $n$, and that your $\text{PRNG}$ has a period sufficiently larger than $mn$. Would you generate all $mn$ pseudo-random numbers from one seed, or do you habitually change seeds, say, per sample? (That is just an illustration; I believe more general answers are worthwhile here.)
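The two options in this example can be sketched side by side. The snippet below is only an illustration in Python's standard library: the function names, the seed, and the way per-sample seeds are derived from a master generator are assumptions made for the sketch, not a recommendation from any reference.

```python
import random
import statistics


def means_one_stream(m, n, seed):
    """Option A: seed once and draw all m*n numbers from a single stream."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
            for _ in range(m)]


def means_per_sample_seeds(m, n, master_seed):
    """Option B: draw one seed per sample from a master generator,
    then use a fresh generator for each sample."""
    master = random.Random(master_seed)
    sample_seeds = [master.getrandbits(64) for _ in range(m)]
    results = []
    for s in sample_seeds:
        rng = random.Random(s)
        results.append(statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n)))
    return results


# Illustrative sizes and seed (arbitrary choices).
m, n, seed = 200, 50, 2718
one_stream = means_one_stream(m, n, seed)
per_sample = means_per_sample_seeds(m, n, seed)
```

Both variants are reproducible from a single recorded number. Option B additionally lets any one sample be regenerated in isolation, at the cost of managing more seeds.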

A related thread (although much more focused) is
Set seed before each code block or once per project?

I have the feeling this perhaps should be a community wiki, the mods please decide on that.

Best Answer

For what it's worth, this is based on experience and not on mathematical analysis:

I think that unless you're doing cryptography, where subtle patterns can be very bad, which seed you set doesn't make a difference, as long as you use accepted good PRNGs like Mersenne Twister and not old ones like linear congruential generators. As far as I know, there is no way that you can tell what random number will come out from a given seed without actually running the PRNG (assuming it's a decent one), otherwise you would just take that new algorithm and use that as your random number generator.

Another perspective: do you think that any subtle patterns in your Monte Carlo simulation are likely to be of a larger magnitude than all the measurement error, confounding, and error introduced by other modeling assumptions?

I would just use one random seed at the beginning for reproducibility, and not set one before each call, unless I'm doing debugging, where I need to make sure two different algorithms produce the same result for the exact same input data.
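The debugging case can be sketched as follows: re-seeding before each run guarantees that two different algorithms see exactly the same pseudo-random input. The two variance routines and the seed are illustrative assumptions chosen for this sketch; any pair of algorithms under comparison would do.

```python
import math
import random


def make_data(seed, n):
    """Regenerate the same pseudo-random input every time it is called."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def variance_two_pass(xs):
    """Textbook two-pass sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def variance_welford(xs):
    """Welford's one-pass sample variance."""
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


SEED = 12345  # arbitrary illustrative seed
# Re-seeding (inside make_data) before each algorithm gives identical input.
a = variance_two_pass(make_data(SEED, 1000))
b = variance_welford(make_data(SEED, 1000))
same_input = make_data(SEED, 1000) == make_data(SEED, 1000)
agree = math.isclose(a, b, rel_tol=1e-9)
```

Outside of such comparisons, a single seed at the top of the script is enough to make the whole run reproducible.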

Disclaimer: if you are simulating nuclear reactors, missile control systems, or weather forecasts, it is best to consult domain experts; I take no responsibility in that case.
