Ex: Show the procedure to simulate a random variable that follows a binomial distribution with parameter $p$, using the $\mathscr{U}(0,1)$ distribution (the uniform distribution on the interval $(0,1)$).
I tried to solve this question by using the following theorem:
Theorem: Let $U\sim\mathscr{U}(0,1)$. Let $X$ be a random variable with distribution function $F_X(x)$. Then the random variable $Y=F_X^{-1}(U)$ has distribution function $F_X$, the distribution function of $X$.
According to this theorem, I would need to find the inverse of the binomial c.d.f., define it as a function in Python, and generate random numbers.
However, I have no idea how to invert the binomial distribution.
Questions:
1) Is this the simplest method to simulate a Binomial distribution with the Uniform(0,1)? Are there other methods?
2) How do I compute the inverse of the Binomial distribution?
Thanks in advance!
Best Answer
It may be helpful to see how this procedure for random sampling from $\mathsf{Binom}(n=5, p=.4)$ is implemented in R statistical software. First, some R notation:
`runif` (without extra parameters) is a source of pseudorandom values from the standard uniform distribution; `dbinom`, `pbinom`, and `qbinom` denote the binomial PDF, CDF, and quantile function (inverse CDF), respectively.
So in R we can generate $m = 100,000$ observations from $\mathsf{Binom}(n=5, p=.4)$ in a vector `x` as follows.
Then we can tally the results, make a histogram of them, and plot exact PDF values on the histogram for comparison:
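As a rough Python analogue of these steps (a sketch only, not the R code of this Answer): the helper names `binom_pmf` and `binom_quantile` and the seed are my own choices, and Python's generator differs from R's, so the individual values will not match any R output. The quantile function is taken to be the generalized inverse $F_X^{-1}(u) = \min\{k : F_X(k) \ge u\}$, found by accumulating PDF values until the running sum reaches $u$.

```python
import random
from collections import Counter
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_quantile(u, n, p):
    """Generalized inverse CDF: smallest k with F(k) >= u."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if u <= cdf:
            return k
    return n  # guards against round-off when u is very close to 1

n, p, m = 5, 0.4, 100_000
rng = random.Random(2024)  # arbitrary seed, only for reproducibility

# One uniform value per binomial observation.
x = [binom_quantile(rng.random(), n, p) for _ in range(m)]

# Tally simulated proportions next to the exact PDF values.
tally = Counter(x)
for k in range(n + 1):
    print(k, tally[k] / m, round(binom_pmf(k, n, p), 5))
```

Each observation here costs exactly one uniform draw, which is the main practical attraction of the inverse CDF method for small $n$.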
Within the accuracy of the graph, it is not possible to distinguish the simulated results (heights of histogram bars) from the theoretical ones (small red circles).
Finally, by using the same seed for the pseudorandom generator as above, we can access exactly the same values `u` as above. Thus, we can see that R implements this (inverse CDF) method to generate $m$ observations from $\mathsf{Binom}(n=5, p=.4)$ by using the function `rbinom` defined in R. The tallied results are exactly the same below as above.
Note: By contrast, when $p > .5,$ R uses a slight modification of the inverse CDF method, so that the two approaches give slightly different results. (I used $m = 10,000$ so that differences would be more obvious.)
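The Note does not say what the modification is. One natural candidate, assumed here purely for illustration, is reflection: if $W \sim \mathsf{Binom}(n, 1-p)$, then $n - W \sim \mathsf{Binom}(n, p)$. The Python sketch below (hypothetical helper names) shows how the same uniform value can then map to a different binomial value even though the distribution is unchanged.

```python
import random
from math import comb

def binom_quantile(u, n, p):
    """Generalized inverse CDF: smallest k with F(k) >= u."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        if u <= cdf:
            return k
    return n  # guards against round-off when u is very close to 1

def binom_reflected(u, n, p):
    """Assumed variant: for p > 0.5, sample Binom(n, 1 - p) and reflect."""
    if p > 0.5:
        return n - binom_quantile(u, n, 1 - p)
    return binom_quantile(u, n, p)

n, p = 5, 0.6
u = 0.05
direct = binom_quantile(u, n, p)      # plain inverse CDF
reflected = binom_reflected(u, n, p)  # reflected version, same u
print(direct, reflected)

# Both versions target Binom(5, .6); compare sample means with n*p = 3.
rng = random.Random(2024)  # arbitrary seed
us = [rng.random() for _ in range(10_000)]
mean_direct = sum(binom_quantile(v, n, p) for v in us) / len(us)
mean_reflected = sum(binom_reflected(v, n, p) for v in us) / len(us)
print(mean_direct, mean_reflected)
```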
Addendum 1: Graphs of CDF of $X \sim \mathsf{Binom}(5, .4)$ and its inverse function. The latter shows $F_X^{-1}(u) = 0,$ for $u < 0.6^5=0.07776,$ as in a Comment.
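In place of the graphs, the CDF values and the step behaviour of the inverse can be tabulated. This is a small Python sketch with names of my own choosing; `quantile` is assumed to be the generalized inverse, i.e. the smallest $k$ with $F_X(k) \ge u$.

```python
from math import comb

n, p = 5, 0.4

# Exact PDF values and their running sums (the CDF at k = 0, ..., 5).
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
cdf = []
total = 0.0
for v in pmf:
    total += v
    cdf.append(total)

def quantile(u):
    """F_X^{-1}(u): smallest k with F_X(k) >= u."""
    for k, c in enumerate(cdf):
        if u <= c:
            return k
    return n  # guards against round-off when u is very close to 1

for k in range(n + 1):
    print(k, round(cdf[k], 5))

# The inverse is a step function that jumps at the CDF values.
print(quantile(0.07), quantile(0.08))
```

The first jump of the inverse sits at $F_X(0) = 0.6^5$, so any $u$ below that value maps to $0$.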
Addendum 2: Generating $X \sim \mathsf{Binom}(5, .4)$ as the sum of five independent Bernoulli random variables with $p=.4.$ [This is the method suggested in the Comment by @GNUSupporter.]
First let $U_1, U_2, \dots, U_5$ be a random sample from $\mathsf{Unif}(0,1).$ Then $B_i = 1,$ if $U_i \le .4$ and $0$ otherwise. This is essentially a trivial application of the quantile method to a variable that takes only values $0$ and $1$. Then $X = \sum_{i=1}^5 B_i \sim \mathsf{Binom}(5, .4).$ We generate four such binomial random variables below (results: 1, 2, 2, 4). Notice that five pseudorandom uniform values are required for each binomial.
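A minimal Python sketch of this construction follows. The function name `binom_from_bernoullis` and the seed are assumptions of mine, and because Python's generator differs from R's, the four values printed need not be 1, 2, 2, 4.

```python
import random

def binom_from_bernoullis(n, p, rng):
    """Sum of n Bernoulli(p) indicators, each built from one uniform draw."""
    return sum(1 for _ in range(n) if rng.random() <= p)

rng = random.Random(1234)  # arbitrary seed, only for reproducibility

# Four binomial observations, using 4 * 5 = 20 uniform values in total.
draws = [binom_from_bernoullis(5, 0.4, rng) for _ in range(4)]
print(draws)
```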
Next we use a `for` loop to iterate this procedure $m = 100,000$ times. Because we start with the same seed as above, the first four iterations repeat the realizations of $\mathsf{Binom}(5, .4)$ shown just above. A tally of all $m$ results shows results similar to those in the initial simulation of this Answer, closely matching the target distribution.
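The loop can be sketched in Python along the same lines (again an illustration with an arbitrary seed, not the R code of this Answer), with the tally printed beside the exact PDF values:

```python
import random
from collections import Counter
from math import comb

n, p, m = 5, 0.4, 100_000
rng = random.Random(1234)  # arbitrary seed, only for reproducibility

x = []
for _ in range(m):
    # Five Bernoulli(p) indicators from five uniform draws, then their sum.
    b = [1 if rng.random() <= p else 0 for _ in range(n)]
    x.append(sum(b))

tally = Counter(x)
for k in range(n + 1):
    exact = comb(n, k) * p**k * (1 - p)**(n - k)
    print(k, tally[k] / m, round(exact, 5))
```

This version consumes $5m$ uniform values rather than $m$, which is the price of avoiding the quantile function.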