[Math] Simulating a Binomial distribution with $\mathscr{U}(0,1)$.

probabilityprobability distributionssimulation

Ex:Show the procedures to simulate an random variable that follows a binomial distribution with parameter $p$, using the $\mathscr{U}(0,1)$(Uniform distribution on the interval (0,1)).

I tried to solve this question by using the following theorem:

Theorem: Let $U\sim\mathscr{U}(0,1) $. Let $X$ be a random variable with distribution $F_X(x)$. Therefore the random variable Y=F^{-1}(U) has a distribution function equal to $F_X$, the distribution of the $X$ variable.

According to this theorem I would need to find a the inverse of the binomial c.d.f, define it as a function in python and generate random numbers.

However I have no idea on how to invert the Binomial distribution.

Questions:

1) Is this the simplest method to simulate a Binomial distribution with the Uniform(0,1)? Are there other methods?

2) How do I compute the inverse of the Binomial distribution?

Thanks in advance!

Best Answer

It may be helpful to see how this procedure for random sampling from $\mathsf{Binom}(n=5, p=.4)$ is implemented in R statistical software. First, some R notation: runif (without extra parameters) is a source of pseudorandom values from standard uniform; dbinom, pbinom and qbinom denote binomial PDF, CDF, and quantile funcions (invese CDF) respectively.

So in R we can generate $m = 100,000$ observations from $\mathsf{Binom}(n=5, p=.4)$ in a vector xas follows.

set.seed(4118)
m = 10^5;  u = runif(m)
x = qbinom(u, 5, .4)     # inverse CDF transformation

Then we can tally the results, make a histogram of them, and plot exact PDF values on the histogram for comparison:

table(x)/m
x
      0       1       2       3       4       5 
0.07790 0.25775 0.34608 0.23105 0.07744 0.00978 

hist(x, prob=T, br=(0:6)-.5, col="skyblue2", main="100,000 Realizations of BINOM(5,.5)")
k = 0:5; pdf = dbinom(k, 5, .4)
points(k, pdf, col="red") 

enter image description here

Within the accuracy of the graph, it is not possible to distinguish the simulated results (heights of histogram bars) from the theoretical ones (small red circles).

Finally, by using the same seed for the pseudoranom generator as above, we can access exactly the same values u as above. Thus, we can see that R implements this (inverse CDF) method to generate $m$ observations from $\mathsf{Binom}(n=5, p=.4)$ by using the function rbinom defined in R. The tallied results are exactly the same below as above.

set.seed(4118)
m = 10^5;  x = rbinom(m, 5, .4)
table(x)/m
x
      0       1       2       3       4       5 
0.07790 0.25775 0.34608 0.23105 0.07744 0.00978 

Note: By contrast, when $p >.5,$ R uses a slight modification of the inverse CDF method, so that the two approaches give slightly different results. (I used $m = 10,000$ so that differences would be more obvious.)

# Inverse CDF method 
set.seed(401);  m = 10^4;  u = runif(m);  x1 = qbinom(u, 5, .7)
table(x1)/m
x1
     0      1      2      3      4      5 
0.0022 0.0296 0.1323 0.3076 0.3651 0.1632 

# Built-in function 'rbinom'
set.seed(401);  x2 = rbinom(m, 5, .7)
table(x2)/m
x2
     0      1      2      3      4      5 
0.0023 0.0278 0.1288 0.3126 0.3599 0.1686 

Addendum 1: Graphs of CDF of $X \sim \mathsf{Binom}(5, .4)$ and its inverse function. The latter shows $F_X^{-1}(u) = 0,$ for $u < 0.6^5=0.07776,$ as in a Comment.

enter image description here

Addendum 2: Generating $X \sim \mathsf{Binom}(5, .4)$ as the sum of five independent Bernoulli random variables with $p=.4.$ [This is the method suggested in the Comment by @GNUSupporter.]

First let $U_1, U_2, \dots, U_5$ be a random sample from $\mathsf{Unif}(0,1).$ Then $B_i = 1,$ if $U_i \le .4$ and $0$ otherwise. This is essentially a trivial application of the quantile method to a variable that takes only values $0$ and $1$. Then $X = \sum_{i=1}^5 B_i \sim \mathsf{Binom}(5, .4).$ We generate four such binomial random variables below (results: 1, 2, 2, 4). Notice that five pseudorandom uniform values are required for each binomial.

set.seed(1234)
u = runif(5); b = (u < .4);  x = sum(b);  x # sum of logical vec b is nr of its TRUEs
[1] 1
u = runif(5); b = (u < .4);  x = sum(b);  x 
[1] 2
u = runif(5); b = (u < .4);  x = sum(b);  x 
[1] 2
u = runif(5); b = (u < .4);  x = sum(b);  x 
[1] 4

Next we use a for loop to iterate this procedure $m = 100,000$ times. Because we start with the same seed as above, the first four iterations repeat the realizations of $\mathsf{Binom}(5, .4)$ shown just above. A tally of all $m$ results shows results similar to those in the initial simulation of this Answer, closely matching the target distribution.

set.seed(1234);  m = 10^5;  x = numeric(m)
for (i in 1:m) {
  u = runif(5); b = (u < .4);  x[i] = sum(b)  }
mean(x);  x[1:4]
[1] 1.99806  # aprx E(X) = 5(.4) = 2
[1] 1 2 2 4  # same first four realizations of X as above
table(x)/m 
x
      0       1       2       3       4       5 
0.07766 0.25971 0.34683 0.22878 0.07675 0.01027 
Related Question