Let
- $x$ be the top end of your range, $x=100$ in your case.
- $n$ be the total number of draws, $n=25$ in your case.
For any number $y\le x$, the number of sequences of $n$ numbers with every entry $\le y$ is $y^n$. Of these sequences, the number containing no $y$s is $(y-1)^n$, and the number containing exactly one $y$ is $n(y-1)^{n-1}$. Hence the number of sequences with two or more $y$s is
$$y^n - (y-1)^n - n(y-1)^{n-1}$$
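The count above can be checked by brute force for small values. This is only a sketch: the function names `count_by_enumeration` and `count_by_formula` are made up for illustration, and the ranges of $y$ and $n$ are kept small because the enumeration grows as $y^n$.

```python
import itertools


def count_by_enumeration(y, n):
    """Count length-n sequences over 1..y that contain the value y at least twice."""
    return sum(
        1
        for seq in itertools.product(range(1, y + 1), repeat=n)
        if seq.count(y) >= 2
    )


def count_by_formula(y, n):
    """Closed form from the text: y^n - (y-1)^n - n(y-1)^(n-1)."""
    return y**n - (y - 1)**n - n * (y - 1)**(n - 1)


# Compare the two counts on a small grid of (y, n) values.
for y in range(1, 6):
    for n in range(2, 6):
        assert count_by_enumeration(y, n) == count_by_formula(y, n)
```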
Summing over $y$ gives the total number of sequences of $n$ numbers in which the highest number appears at least twice:
\begin{align}
\sum_{y=1}^x \left(y^n - (y-1)^n - n(y-1)^{n-1}\right)
&= \sum_{y=1}^x y^n - \sum_{y=1}^x(y-1)^n - \sum_{y=1}^xn(y-1)^{n-1}\\
&= x^n - n\sum_{y=1}^x(y-1)^{n-1}\\
&= x^n - n\sum_{y=1}^{x-1}y^{n-1}
\end{align}
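In the second step the first two sums telescope to $x^n$; in the third, the remaining sum is re-indexed and its zero term dropped (it vanishes for $n>1$). The total number of sequences is simply $x^n$. All sequences are equally likely, so the probability is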
$$ \frac{x^n - n\sum_{y=1}^{x-1}y^{n-1}}{x^n}$$
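As a small sanity check (the values $x=3$, $n=2$ are chosen only for illustration), the formula gives

$$\frac{3^2 - 2\sum_{y=1}^{2}y^{1}}{3^2} = \frac{9 - 2(1+2)}{9} = \frac{3}{9} = \frac{1}{3}$$

which agrees with a direct count: of the 9 ordered pairs, the highest number appears twice exactly when both entries are equal, i.e. for $(1,1)$, $(2,2)$ and $(3,3)$.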
With $x=100,n=25$ I make the probability 0.120004212454.
I've tested this using the following Python program, which counts the matching sequences by brute force (for small $x,n$), runs a simulation, and evaluates the formula above.
```python
import itertools

import numpy as np


def countinlist(x, n):
    """Brute force: enumerate all x**n sequences and count those whose maximum repeats."""
    count = 0
    total = 0
    for perm in itertools.product(range(1, x + 1), repeat=n):
        total += 1
        if perm.count(max(perm)) > 1:
            count += 1
    print("Counting: x", x, "n", n, "total", total, "count", count)


def simulate(x, n, N):
    """Monte Carlo: draw N random sequences and count those whose maximum repeats."""
    count = 0
    for i in range(N):
        # Draws from 0..x-1 rather than 1..x; the shift does not affect the count.
        perm = np.random.randint(x, size=n)
        m = perm.max()
        if np.sum(perm == m) > 1:
            count += 1
    print("Simulation: x", x, "n", n, "total", N, "count", count, "prob", count / float(N))


x = 100
n = 25
N = 1000000  # number of trials in simulation
# countinlist(x, n)  # only call this for reasonably small x and n!
simulate(x, n, N)

# Exact count from the formula: x^n - n * sum_{y=1}^{x-1} y^(n-1)
formula = x**n - n * sum(i**(n - 1) for i in range(x))
print("Formula count", formula, "out of", x**n, "probability", round(formula / x**n, 12))
```
This program printed

```
Simulation: x 100 n 25 total 1000000 count 120071 prob 0.120071
Formula count 12000421245360277498241319178764675560017783666750 out of 100000000000000000000000000000000000000000000000000 probability 0.120004212454
```
For what it's worth, this is based on experience and not on mathematical analysis:
I think that unless you're doing cryptography, where subtle patterns can be very bad, the choice of seed makes no difference, as long as you use an accepted good PRNG like the Mersenne Twister and not an old one like a linear congruential generator. As far as I know, there is no way to tell what random number will come out of a given seed without actually running the PRNG (assuming it's a decent one); otherwise you could just take that shortcut and use it as your random number generator.
Another perspective: do you think that any subtle patterns in your Monte Carlo simulation are likely to be of a larger magnitude than all the measurement error, confounding, and error introduced by other modeling assumptions?
I would just use one random seed at the beginning for reproducibility, and not set one before each call, unless I'm doing debugging, where I need to make sure two different algorithms produce the same result for the exact same input data.
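A minimal sketch of the "seed once at the beginning" approach, using Python's standard `random` module (which is a Mersenne Twister). The function `run_simulation` and the seed value are hypothetical stand-ins for a real simulation, not part of any particular workflow.

```python
import random


def run_simulation(seed, trials=1000):
    """Hypothetical Monte Carlo run: the average of `trials` uniform draws."""
    rng = random.Random(seed)  # seed once, at the start of the run
    total = 0.0
    for _ in range(trials):
        total += rng.random()  # no reseeding between draws
    return total / trials


# Two runs with the same seed produce the same stream of draws,
# so the results are reproducible.
first = run_simulation(seed=12345)
second = run_simulation(seed=12345)
assert first == second
```

Keeping the generator in its own `random.Random` object, rather than reseeding the global state before each call, also makes it easy to feed two different algorithms identical draws when debugging.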
Disclaimer: if you are simulating nuclear reactors, missile control systems, or weather forecasts, it is best to consult domain experts; I take no responsibility in that case.
The help for `set_seed` states: […]

Stata's philosophy emphasizes reproducibility, so this consistency is not surprising. Of course you can set the seed yourself. See the help page for more information.
One way to sort a column separately from all others is to preserve your data, keep only the column to sort, sort it, save the results in a temporary file, restore your data, and merge the temporary file: