Generating sorted pseudo-random numbers in Stata

random-generation, stata

Today I opened two Stata windows and ran the following commands in both:

set obs 100
gen x = rnormal()
sort x

(The only difference is that in the second window I named the variable y.) In short, I asked Stata for 100 pseudo-random numbers drawn from a standard normal distribution and then sorted them. To my surprise, the x and y vectors contain the same numbers! I did this at home and then at work, and my impression is that all of these vectors are identical. Is there an explanation for this behavior, which seems strange to me?

If this is a problem in Stata, does R have a better pseudo-random number generator?

A side question: I ran into this "problem" because I was trying to generate two pseudo-random columns in Stata (x and y, say) and then sort them separately. But the two sorting commands I know (sort and gsort) reorder the whole dataset, not individual columns. Do you know of a Stata command that sorts one column while keeping the other columns fixed?

Best Answer

The help for set_seed states

The sequences these functions produce are determined by the seed, which is just a number and which is set to 123456789 every time Stata is launched.

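Stata's philosophy emphasizes reproducibility, so this consistency is not surprising: every freshly launched Stata session starts from the same seed, so the same sequence of rnormal() calls returns the same numbers in each window and on each machine. Of course you can set the seed yourself. See the help page for more information.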
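
As a minimal sketch of setting the seed yourself, the lines below fix the seed before drawing, confirm that the same seed reproduces the same draws, and then switch to a different seed for a different sequence. The seed values 20240117 and 98765 and the variable name x_again are arbitrary choices for illustration, not anything prescribed by Stata.

```stata
clear
set obs 100
* hypothetical seed value; any number of your choosing works
set seed 20240117
gen x = rnormal()
* resetting the same seed reproduces exactly the same draws
set seed 20240117
gen x_again = rnormal()
assert x == x_again
* a different (again arbitrary) seed gives a different sequence
set seed 98765
gen y = rnormal()
```

Running the two windows with different seeds should therefore give different x and y vectors, while reusing a seed lets you recreate a given sample later.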

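One way to sort a column separately from all others is to preserve your data, keep only the column to sort, sort it, save the result in a temporary file, restore your data, drop the unsorted column, and merge the temporary file back by observation number. The unsorted column has to be dropped first because merge keeps the master data's values for any variable that exists in both datasets:

gen y = rnormal()
* set the full dataset aside
preserve
keep y
sort y
tempfile out
save `out'
* bring back the full dataset, which still holds the unsorted y
restore
* drop it so that the merge brings in the sorted copy
drop y
merge 1:1 _n using `out', nogen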
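
A possible alternative, not part of the answer above, is to sort the column in place through Mata, which avoids the temporary file. The sketch below assumes the data fit in memory as a Mata vector; the seed value and the helper variable x_before are hypothetical and exist only to check that x is left untouched.

```stata
clear
set obs 100
* hypothetical seed, only to make the example repeatable
set seed 20240117
gen x = rnormal()
gen y = rnormal()
* keep a copy of x to confirm it is not reordered
gen x_before = x
* read y into Mata, sort it ascending, and write it back in place
mata: st_store(., "y", sort(st_data(., "y"), 1))
* x is unchanged and y is now in ascending order
assert x == x_before
assert y >= y[_n-1] if _n > 1
```

Here st_data() copies the column into Mata, sort() orders it on its only column, and st_store() writes the sorted values back into the same observations, so the other variables keep their original order.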