I need to generate binomial random numbers:
For example, consider binomial random numbers. A binomial random
number is the number of heads in N tosses of a coin with probability p
of a heads on any single toss. If you generate N uniform random
numbers on the interval $(0,1)$ and count the number less than $p$, then
the count is a binomial random number with parameters $N$ and $p$.
In my case, my N could range from $1\times10^{3}$ to $1\times10^{10}$. My p is around $1\times10^{-7}$.
Often my $n\times p$ is around $1 \times 10^{-3}$.
There is a trivial implementation to generate such binomial random number through loops:
public static int getBinomial(int n, double p) {
int x = 0;
for(int i = 0; i < n; i++) {
if(Math.random() < p)
x++;
}
return x;
}
This native implementation is very slow. I tried the Acceptance Rejection/Inversion method [1] implemented in the Colt (http://acs.lbl.gov/software/colt/) lib. It is very fast, but the distribution of its generated number only agrees with the native implementation when $n \times p$ is not very small. In my case when $n\times p = 1\times 10^{-3}$, the native implementation can still generate the number $1$ after many runs, but the Acceptance Rejection/Inversion method can never generate the number $1$. (always returns $0$)
Does anyone know what is the problem here? Or can you suggest a better binomial random number generating algorithm that can solve my case.
[1] V. Kachitvichyanukul, B.W. Schmeiser (1988): Binomial random variate generation, Communications of the ACM 31, 216-222.
Best Answer
The following method is theoretically exact, provided we have a "good" random generator
urand()
$\in[0,1)$. It uses the geometric distribution, i.e. the number of trials until a probability $p$ event occurs for the first time:However, if $p$ is really small, you may want to replace
log(1-p)
with(-p)
. AlsogetGeometric
exhibits some rounding artefacts whenurand()
is very small.The following is roughly based on suggestions in Knuth, Seminumerical Algorithms, 3.4.1.F(2). Many improvements especially for
getBeta
andgetGamma
are possible:Yet another approach: Instead of $N$ trials with $p$, make $N/2$ trials with $1-(1-p)^2=(2-p)p$ to count how many pairs ($M$, say, with $M\approx Np$ if $p$ is small) of consecutive trials have at least one hit. Among these, the pairs with two hits are $(M,p/(2-p))$ binomially distributed
I didn't investigate that further, but it is definitely useful only if $Np$ is not big (as each step of the result ultimately comes from some
count++
)