Solved – What’s the probability that from 25 random numbers between 1 and 100, the highest appears more than once

probabilityrandom-generation

In many online games, when players complete a difficult task, sometimes a special reward is given which everyone who completed the task can use. this is usually a mount (method of transportation) or another vanity item (items which don't improve the performance of the character and are mainly used for appearance customization).

When such a reward is given, the most common way of determining who gets the reward is through random numbers. The game usually has a special command which generates a random (likely pseudorandom, not crypto secure random) number between 1 and 100 (sometimes the player can choose another spread, but 100 is the most common). Each player uses this command, all the players can see who rolled what, and the item is awarded to the person who rolls highest. Most games even have a a built-in system where players just press a button and once everyone pressed their button, the game does the rest automatically.

Sometimes, some players generate the same high number and noone beats them. this is usually resolved by those players regenerating their numbers, until there is a unique highest number.

My question is the following: Assume a random number generator which can generate any number between 1 and 100 with the same probability. Assume that you have a group of 25 players who each generate 1 number with such a random number generator (each with their own seed). You'll have 25 numbers between 1 and 100, with no limitations on how many players roll a specific numbder and no relation between the numbers. What is the chance that the highest generated number is generated by more than 1 player? In other words, what is the likelihood of a tie?

Best Answer

Let

  • $x$ be the top end of your range, $x=100$ in your case.
  • $n$ be the total number of draws, $n=25$ in your case.

For any number $y\le x$, the number of sequences of $n$ numbers with each number in the sequence $\le y$ is $y^n$. Of these sequence, the number containing no $y$s is $(y-1)^n$, and the number containing one $y$ is $n(y-1)^{n-1}$. Hence the number of sequences with two or more $y$s is $$y^n - (y-1)^n - n(y-1)^{n-1}$$ The total number of sequences of $n$ numbers with highest number $y$ containing at least two $y$s is \begin{align} \sum_{y=1}^x \left(y^n - (y-1)^n - n(y-1)^{n-1}\right) &= \sum_{y=1}^x y^n - \sum_{y=1}^x(y-1)^n - \sum_{y=1}^xn(y-1)^{n-1}\\ &= x^n - n\sum_{y=1}^x(y-1)^{n-1}\\ &= x^n - n\sum_{y=1}^{x-1}y^{n-1}\\ \end{align}

The total number of sequences is simply $x^n$. All sequences are equally likely and so the probability is $$ \frac{x^n - n\sum_{y=1}^{y=x-1}y^{n-1}}{x^n}$$

With $x=100,n=25$ I make the probability 0.120004212454.

I've tested this using the following Python program, which counts the sequences that match manually (for low $x,n$), simulates and calculates using the above formula.

import itertools
import numpy.random as np

def countinlist(x, n):
    count = 0
    total = 0
    for perm in itertools.product(range(1, x+1), repeat=n):
        total += 1
        if perm.count(max(perm)) > 1:
            count += 1

    print "Counting: x", x, "n", n, "total", total, "count", count

def simulate(x,n,N):
    count = 0
    for i in range(N):
        perm = np.randint(x, size=n)
        m = max(perm)
        if sum(perm==m) > 1:
            count += 1
    print "Simulation: x", x, "n", n, "total", N, "count", count, "prob", count/float(N)

x=100
n=25
N = 1000000 # number of trials in simulation

#countinlist(x,n) # only call this for reasonably small x and n!!!!
simulate(x,n,N)
formula = x**n - n*sum([i**(n-1) for i in range(x)])
print "Formula count", formula, "out of", x**n, "probability", float(formula) / x**n

This program outputted

Simulation: x 100 n 25 total 1000000 count 120071 prob 0.120071
Formula count 12000421245360277498241319178764675560017783666750 out of 100000000000000000000000000000000000000000000000000 probability 0.120004212454