[Math] A binomial random number generating algorithm that works when $ n \times p $ is very small

binomial theorembinomial-coefficientsrandom

I need to generate binomial random numbers:

For example, consider binomial random numbers. A binomial random
number is the number of heads in N tosses of a coin with probability p
of a heads on any single toss. If you generate N uniform random
numbers on the interval $(0,1)$ and count the number less than $p$, then
the count is a binomial random number with parameters $N$ and $p$.

In my case, my N could range from $1\times10^{3}$ to $1\times10^{10}$. My p is around $1\times10^{-7}$.

Often my $n\times p$ is around $1 \times 10^{-3}$.

There is a trivial implementation to generate such binomial random number through loops:

public static int getBinomial(int n, double p) {
  int x = 0;
  for(int i = 0; i < n; i++) {
    if(Math.random() < p)
      x++;
  }
  return x;
}

This native implementation is very slow. I tried the Acceptance Rejection/Inversion method [1] implemented in the Colt (http://acs.lbl.gov/software/colt/) lib. It is very fast, but the distribution of its generated number only agrees with the native implementation when $n \times p$ is not very small. In my case when $n\times p = 1\times 10^{-3}$, the native implementation can still generate the number $1$ after many runs, but the Acceptance Rejection/Inversion method can never generate the number $1$. (always returns $0$)

Does anyone know what is the problem here? Or can you suggest a better binomial random number generating algorithm that can solve my case.

[1] V. Kachitvichyanukul, B.W. Schmeiser (1988): Binomial random variate generation, Communications of the ACM 31, 216-222.

Best Answer

The following method is theoretically exact, provided we have a "good" random generator urand()$\in[0,1)$. It uses the geometric distribution, i.e. the number of trials until a probability $p$ event occurs for the first time:

int getGeometric(double p) {
    return ceil( log( urand() ) / log(1-p) );
}


int getBinomial(int N, double p) {
    int count = 0;
    for (;;) {
        int wait = getGeometric(p);
        if (wait > N) return count;
        count++;
        N -= wait;
    }
}

However, if $p$ is really small, you may want to replace log(1-p) with (-p). Also getGeometric exhibits some rounding artefacts when urand()is very small.


The following is roughly based on suggestions in Knuth, Seminumerical Algorithms, 3.4.1.F(2). Many improvements especially for getBeta and getGamma are possible:

#define NOTSOMANY 50  
int getBinomial(int N, double p) {
   if (N < NOTSOMANY) {
      int count = 0;
      for (int i=0; i<N; i++) if (urand() < p) count++;
      return count;
   } 
   int a = 1+N/2;
   int b = N+1-a;
   double X = getBeta(a,b);
   if (X>p) 
      return getBinomial(a-1,p/X);
   else
      return a + getBinomial(b-1,(p-X)(1-X));
}


#define SOMELIMIT  3.0  
// must be > 1.0, should be >= 3.0 to make getGamma fast,
// but should also be fairly small to make loop short
double getBeta(double a, double b) {  // a>0, b>0
   if (a < SOMELIMIT && b < SOMELIMIT) {
      for (;;) {
         double Y1 = exp(ln(urand())/a), Y2 =  exp(ln(urand())/b);
         if (Y1+Y2 <1) return Y1/(Y1+Y2);
      }
   } else {
      double X1 = getGamma(a), X2 = getGamma(b);
      return X1/(X1+X2);
   } 
}


double getGamma(double a) {  // a > 1
   for (;;) {
      double Y = tan(Pi*urand());
      double X = sqrt(2*a-1)*Y+a-1;
      if ( X>0 && urand() 
           <= (1+Y*Y)*exp((a-1)*ln(X/(a-1))-sqrt(2*a-1)*Y) )
         return X;
   }
}

Yet another approach: Instead of $N$ trials with $p$, make $N/2$ trials with $1-(1-p)^2=(2-p)p$ to count how many pairs ($M$, say, with $M\approx Np$ if $p$ is small) of consecutive trials have at least one hit. Among these, the pairs with two hits are $(M,p/(2-p))$ binomially distributed

int getBinomial(int N, double p) {
   if (N < NOTSOMANY) {
      int count = 0;
      for (int i=0; i<N; i++) if (urand() < p) count++;
      return count;
   }
   if ( p > 0.5 ) 
      return N-getBinomial(N,1-p);
   else {
      int M = getBinomial(N/2,p*(2-p));
      int result = M + getBinomial(M, p/(2-p));
      if (N&1) 
         if (urand() < p) result++;
      return result;
   }
}

I didn't investigate that further, but it is definitely useful only if $Np$ is not big (as each step of the result ultimately comes from some count++)

Related Question