[Math] Probability of n random numbers under a threshold

probabilityrandom

If we choose N random numbers between 0 and 1 (uniformly distributed), what is the probability that at least n will be under some threshold t?

For example I want to know if I have a list of 1,000,000 numbers between 0 and 1, what's the probability that at least 50 of them will be under 0.75?

Best Answer

The idea is to regard each choice of uniformly distributed random number $U_i \sim \operatorname{Uniform}(0,1)$ where $i = 1, 2, \ldots, N$, as associated with a Bernoulli trial $B_i \sim \operatorname{Bernoulli}(p)$, where $$B_i = \begin{cases}1, & 0 \le U_i < t \\ 0, & t \le U_i \le 1. \end{cases}$$ That is to say, $B_i = 1$ if and only if the random number $U_i$ is less than the threshold $t$.

Therefore the success parameter $p$ of the Bernoulli trial is $$\Pr[B_i = 1] = \Pr[U_i < t] = \frac{t}{1-0} = t, \quad 0 \le t \le 1.$$ The total of all these (independent) Bernoulli trials is $$T = B_1 + B_2 + \cdots + B_N \sim \operatorname{Binomial}(N, t).$$

Now in order to get a probability that at least $n$ out of $N$ of the $U_i$s are less than $t$, this is simply equivalent to the probability that $T \le n$: $$\Pr[T \ge n] = \sum_{k=n}^N \Pr[T = k] = \sum_{k=n}^N \binom{N}{k} t^k (1-t)^{N-k}.$$ This sum furnishes the exact probability of the desired event for a given $N$, $n$, and $t$.

However, if $\min(n,N-n)$ is very large, then this sum (or its complementary probability) will have many terms and be unwieldy to compute. In such a case, the normal approximation (with continuity correction) to the binomial distribution is desirable: $$\Pr[T \ge n] \approx \Pr\left[ Z \ge \frac{n - Nt - \frac{1}{2}}{\sqrt{Nt(1-t)}} \right] = 1 - \Phi \left( \frac{n - Nt - \frac{1}{2}}{\sqrt{Nt(1-t)}}\right),$$ where $Z \sim \operatorname{Normal}(0,1)$ is a standard normal random variable, and $\Phi(z) = \Pr[Z \le z]$ is the cumulative distribution function of $Z$ whose values can be looked up in a standard normal table.

For your example of $N = 10^6$, $n = 50$, $t = 0.75$, the probability of such an event is about $1 - 4.00871338379431228438 \times 10^{-601806}$. The normal approximation gives $1 - 4.23635959294153500 \times 10^{-651360},$ which is also very close to $1$.