[Math] How to sample a binomial random variable

binomial distributionprobability

Rather than using mathematical libraries, how would you sample from a binomial random variable efficiently?

Given the binomial random variable X, where $k$ are the number of successes in $n$ trials with a success probability of $p$
$$
P(X=k|n, p) = \binom{n}{k} p^k(1-p)^{n-k},
$$
how could I obtain $N$ number of samples of $X$?

The naïve approach is to decompose $X$ into $X = Y_1 + Y_2 + … + Y_n$, where $Y$ are the bernoulli experiments
$$
Y_i = P(Y_i = 1) = p.
$$

Or, in other words, I can test $n$ times if some random value is above $p$ and count how many times this was a success. This is, however, terribly inefficient when $n$ is large.

It is possible to sample a continuous random variable by finding the inverse CDF ($F^{-1}(x)$), sampling from the uniform distribution $ u = U(0,1)$ and calculating the value of the sample in the inverse CDF $F^{-1}(u)$. Given that this is a discrete distribution, how could I apply this? Which other methods are available?

Best Answer

It makes no difference if the distribution from which you wish to sample is continuous or discrete. For example, suppose you wish to sample from $$X \sim \operatorname{Binomial}(n = 5, p = 0.7).$$ Then $$\begin{align*} \Pr[X < 0] &= 0 \\ \Pr[X \le 0] &= 0.00243 \\ \Pr[X \le 1] &= 0.03078 \\ \Pr[X \le 2] &= 0.16308 \\ \Pr[X \le 3] &= 0.47178 \\ \Pr[X \le 4] &= 0.83193 \\ \Pr[X \le 5] &= 1. \end{align*}$$ Then you draw realizations $u_i$ from a continuous uniform distribution on $[0,1]$, and you will select $X_i = x_i$ if $$\Pr[X < x_i] < u_i \le \Pr[X \le x_i].$$ This is the inverse CDF method. You'd create a lookup table for a given $n$ and $p$, and locate the index of the interval in which your random number lies between.

I can't speak to which method is faster, but I can say for certain that the inverse CDF method for large $n$ requires a lot more memory to create the lookup table, and is probably not as efficient. By the time your $n$ is so large that creating individual realizations is not computationally feasible, you are likely to be better off using a normal approximation to the binomial (unless $p$ is extremely close to $0$ or $1$).

Related Question