First comes the theory, in fairly complete form. Then we look at your particular situation.
Theory: We need an expression for the variance of $X$. The variance is $E(X^2)-(E(X))^2$. For $E(X^2)$, we need to calculate
$$\int_a^b \frac{x^2}{b-a}\,dx,$$
which is $\frac{b^3-a^3}{3(b-a)}$. This simplifies to $\frac{b^2+ab+a^2}{3}$.
I imagine that you know that $E(X)=\frac{b+a}{2}$. One can do this by integration, but it is clear by symmetry that the mean is halfway between $a$ and $b$.
So we know that the variance is $\frac{b^2+ab+a^2}{3}-\frac{(b+a)^2}{4}$. Bring to a common denominator, simplify. We get that
$$\text{Var}(X)=\frac{(b-a)^2}{12} \tag{$\ast$}.$$
More simply, you can search under uniform distribution, say on Wikipedia. They will have the expression $(\ast)$ for the variance of $X$.
Your problem: We know that $\frac{b+a}{2}=3.5$. We also know that the standard deviation of $X$ is $1.3$, so the variance is $(1.3)^2=1.69$.
So, by $(\ast)$, $\frac{(b-a)^2}{12}=1.69$, and therefore $b-a=\sqrt{(12)(1.69)}\approx 4.5033$. We also know that $b+a=(2)(3.5)=7$. Now that we know $b-a$ and $b+a$, it is easy to find $a$ and $b$: $a=\frac{7-4.5033}{2}\approx 1.2483$ and $b=\frac{7+4.5033}{2}\approx 5.7517$.
For your simulation, presumably you are starting from a random number generator that generates numbers that are more or less uniformly distributed on $[0,1)$. If $U$ represents the output of such a generator, we simulate $X$ by using $a+(b-a)U$. And we now know $a$ and $b$.
Added If you want a general formula instead of a procedure, let $\mu=\frac{a+b}{2}$ be the mean, and $\sigma=\frac{b-a}{\sqrt{12}}=\frac{b-a}{2\sqrt{3}}$ be the standard deviation. Then $\frac{b-a}{2}=\sqrt{3}\,\sigma$.
We get $a=\frac{b+a}{2}-\frac{b-a}{2}=\mu-\sqrt{3}\,\sigma$. It follows that we can take
$$X=\mu-\sqrt{3}\,\sigma + (2\sqrt{3}\,\sigma) U=\mu+(\sqrt{3}\,\sigma)(2U-1).$$
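As a check on this formula, here is a minimal C sketch (the function name `uniform_from_mu_sigma` is mine, not from the answer above):

```c
#include <math.h>

/* Map a uniform deviate u in [0,1) to a uniform random variable with
   mean mu and standard deviation sigma: X = mu + sqrt(3)*sigma*(2u - 1). */
double uniform_from_mu_sigma(double u, double mu, double sigma) {
    return mu + sqrt(3.0) * sigma * (2.0 * u - 1.0);
}
```

With $\mu=3.5$ and $\sigma=1.3$, the values $u=0$ and $u=1$ give the endpoints $a\approx 1.248$ and $b\approx 5.752$, consistent with $b-a\approx 4.5033$ and $b+a=7$.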
The following method is theoretically exact, provided we have a "good" random generator urand() $\in[0,1)$. It uses the geometric distribution, i.e. the number of trials until a probability-$p$ event occurs for the first time:
int getGeometric(double p) {
    return ceil( log( urand() ) / log(1-p) );
}
int getBinomial(int N, double p) {
    int count = 0;
    for (;;) {
        int wait = getGeometric(p);
        if (wait > N) return count;
        count++;
        N -= wait;
    }
}
However, if $p$ is really small, you may want to replace log(1-p) with (-p). Also, getGeometric exhibits some rounding artefacts when urand() is very small.
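The code above assumes a generator urand() returning uniform deviates in $[0,1)$. As a sketch only, one possible stand-in built on the C library's rand() (a weak generator, adequate for experiments but not for serious work):

```c
#include <stdlib.h>

/* One possible definition of the assumed urand(): a uniform deviate in [0,1).
   rand() is a poor generator; substitute a better one for real simulations. */
double urand(void) {
    return rand() / ((double)RAND_MAX + 1.0);
}
```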
The following is roughly based on suggestions in Knuth, Seminumerical Algorithms, 3.4.1.F(2). Many improvements, especially for getBeta and getGamma, are possible:
#define NOTSOMANY 50
double getBeta(double a, double b); /* defined below */
int getBinomial(int N, double p) {
    if (N < NOTSOMANY) {
        int count = 0;
        for (int i=0; i<N; i++) if (urand() < p) count++;
        return count;
    }
    int a = 1+N/2;
    int b = N+1-a;
    double X = getBeta(a,b);
    if (X > p)
        return getBinomial(a-1, p/X);
    else
        return a + getBinomial(b-1, (p-X)/(1-X));
}
#define SOMELIMIT 3.0
// must be > 1.0, should be >= 3.0 to make getGamma fast,
// but should also be fairly small to make the loop short
double getGamma(double a); /* defined below */
double getBeta(double a, double b) { // a>0, b>0
    if (a < SOMELIMIT && b < SOMELIMIT) {
        for (;;) {
            double Y1 = exp(log(urand())/a), Y2 = exp(log(urand())/b);
            if (Y1+Y2 <= 1) return Y1/(Y1+Y2);
        }
    } else {
        double X1 = getGamma(a), X2 = getGamma(b);
        return X1/(X1+X2);
    }
}
double getGamma(double a) { // a > 1
    for (;;) {
        double Y = tan(M_PI*urand());
        double X = sqrt(2*a-1)*Y + a - 1;
        if ( X > 0 && urand() <= (1+Y*Y)*exp((a-1)*log(X/(a-1)) - sqrt(2*a-1)*Y) )
            return X;
    }
}
Yet another approach: instead of $N$ trials with $p$, make $N/2$ trials with $1-(1-p)^2=(2-p)p$ to count how many pairs of consecutive trials have at least one hit ($M$, say, with $M\approx Np$ if $p$ is small). Among these, the pairs with two hits are $(M,\,p/(2-p))$ binomially distributed:
int getBinomial(int N, double p) {
    if (N < NOTSOMANY) {
        int count = 0;
        for (int i=0; i<N; i++) if (urand() < p) count++;
        return count;
    }
    if ( p > 0.5 )
        return N - getBinomial(N, 1-p);
    else {
        int M = getBinomial(N/2, p*(2-p));
        int result = M + getBinomial(M, p/(2-p));
        if (N&1)
            if (urand() < p) result++;
        return result;
    }
}
I didn't investigate that further, but it is definitely useful only if $Np$ is not big (as each step of the result ultimately comes from some count++).
I'm assuming that $E$ is the expected value of the full geometric distribution, not of the truncated one. You've used $E$ for the expected value of both the geometric distribution and the exponential distribution; since they need to be slightly different, I'll use $E'$ for the expected value of the geometric distribution.
You can truncate the distribution by restricting $U$ to a range such that $T\in[0,n)$; that is, $U$ should be uniformly distributed in $(\mathrm e^{-n/E},1]$.
It does if you choose the right parameter. The area under each interval of length $1$ of the continuous distribution is $\mathrm e^{-1/E}$ times the area under the interval to its left, so the expected value of the geometric distribution you get will be $E'=\mathrm e^{-1/E}/(1-\mathrm e^{-1/E})=1/(\mathrm e^{1/E}-1)$. Thus, given $E'$, you can determine $E$ as $E=1/\log(1+1/E')$.
As you wrote, this is quite unlikely to be a problem in your lifetime or anyone else's.
Yes. An exponential distribution can only have expected value $0$ if it's degenerate, a delta distribution at the origin.