First comes the theory, in fairly complete form. Then we look at your particular situation.
Theory: We need an expression for the variance of $X$. The variance is $E(X^2)-(E(X))^2$. For $E(X^2)$, we need to calculate
$$\int_a^b \frac{x^2}{b-a}\,dx,$$
which is $\frac{b^3-a^3}{3(b-a)}$. This simplifies to $\frac{b^2+ab+a^2}{3}$.
I imagine that you know that $E(X)=\frac{b+a}{2}$. One can do this by integration, but it is clear by symmetry that the mean is halfway between $a$ and $b$.
So we know that the variance is $\frac{b^2+ab+a^2}{3}-\frac{(b+a)^2}{4}$. Bring to a common denominator, simplify. We get that
$$\text{Var}(X)=\frac{(b-a)^2}{12} \tag{$\ast$}.$$
More simply, you can search under uniform distribution, say on Wikipedia. They will have the expression $(\ast)$ for the variance of $X$.
Your problem: We know that $\frac{b+a}{2}=3.5$. We also know that the standard deviation of $X$ is $1.3$, so the variance is $(1.3)^2=1.69$.
So, by $(\ast)$, $\frac{(b-a)^2}{12}=1.69$, and therefore $b-a=\sqrt{(12)(1.69)}\approx 4.5033$. We also know that $b+a=(2)(3.5)=7$. Now that we know $b-a$ and $b+a$, it is easy to find $a$ and $b$: $a=\frac{7-4.5033}{2}\approx 1.2483$ and $b=\frac{7+4.5033}{2}\approx 5.7517$.
For your simulation, presumably you are starting from a random number generator that generates numbers that are more or less uniformly distributed on $[0,1)$. If $U$ represents the output of such a generator, we simulate $X$ by using $a+(b-a)U$. And we now know $a$ and $b$.
Added If you want a general formula instead of a procedure, let $\mu=\frac{a+b}{2}$ be the mean, and $\sigma=\frac{b-a}{\sqrt{12}}=\frac{b-a}{2\sqrt{3}}$ be the standard deviation. Then $\frac{b-a}{2}=\sqrt{3}\,\sigma$.
We get $a=\frac{b+a}{2}-\frac{b-a}{2}=\mu-\sqrt{3}\,\sigma$. It follows that we can take
$$X=\mu-\sqrt{3}\,\sigma + (2\sqrt{3}\,\sigma) U=\mu+(\sqrt{3}\,\sigma)(2U-1).$$
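As a check on this formula, here is a minimal C sketch (the function name `uniform_from_mu_sigma` is mine, not from the answer above):

```c
#include <math.h>

/* Map a uniform deviate u in [0,1) to a uniform random variable with
   mean mu and standard deviation sigma: X = mu + sqrt(3)*sigma*(2u - 1). */
double uniform_from_mu_sigma(double u, double mu, double sigma) {
    return mu + sqrt(3.0) * sigma * (2.0 * u - 1.0);
}
```

With $\mu=3.5$ and $\sigma=1.3$, the values $u=0$ and $u=1$ give the endpoints $a\approx 1.248$ and $b\approx 5.752$, consistent with $b-a\approx 4.5033$ and $b+a=7$.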
The following method is theoretically exact, provided we have a "good" random generator urand() $\in[0,1)$. It uses the geometric distribution, i.e. the number of trials until a probability-$p$ event occurs for the first time:
int getGeometric(double p) {
    return ceil( log( urand() ) / log(1-p) );
}
int getBinomial(int N, double p) {
    int count = 0;
    for (;;) {
        int wait = getGeometric(p);
        if (wait > N) return count;
        count++;
        N -= wait;
    }
}
However, if $p$ is really small, you may want to replace log(1-p) with (-p). Also, getGeometric exhibits some rounding artefacts when urand() is very small.
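The code above assumes a generator urand() returning uniform deviates in $[0,1)$. As a sketch only, one possible stand-in built on the C library's rand() (a weak generator, adequate for experiments but not for serious work):

```c
#include <stdlib.h>

/* One possible definition of the assumed urand(): a uniform deviate in [0,1).
   rand() is a poor generator; substitute a better one for real simulations. */
double urand(void) {
    return rand() / ((double)RAND_MAX + 1.0);
}
```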
The following is roughly based on suggestions in Knuth, Seminumerical Algorithms, 3.4.1.F(2). Many improvements, especially for getBeta and getGamma, are possible:
#define NOTSOMANY 50
double getBeta(double a, double b); /* defined below */
int getBinomial(int N, double p) {
    if (N < NOTSOMANY) {
        int count = 0;
        for (int i=0; i<N; i++) if (urand() < p) count++;
        return count;
    }
    int a = 1+N/2;
    int b = N+1-a;
    double X = getBeta(a,b);
    if (X > p)
        return getBinomial(a-1, p/X);
    else
        return a + getBinomial(b-1, (p-X)/(1-X));
}
#define SOMELIMIT 3.0
// must be > 1.0, should be >= 3.0 to make getGamma fast,
// but should also be fairly small to make the loop short
double getGamma(double a); /* defined below */
double getBeta(double a, double b) { // a>0, b>0
    if (a < SOMELIMIT && b < SOMELIMIT) {
        for (;;) {
            double Y1 = exp(log(urand())/a), Y2 = exp(log(urand())/b);
            if (Y1+Y2 <= 1) return Y1/(Y1+Y2);
        }
    } else {
        double X1 = getGamma(a), X2 = getGamma(b);
        return X1/(X1+X2);
    }
}
double getGamma(double a) { // a > 1
    for (;;) {
        double Y = tan(M_PI*urand());
        double X = sqrt(2*a-1)*Y + a - 1;
        if ( X > 0 && urand() <= (1+Y*Y)*exp((a-1)*log(X/(a-1)) - sqrt(2*a-1)*Y) )
            return X;
    }
}
Yet another approach: instead of $N$ trials with $p$, make $N/2$ trials with $1-(1-p)^2=(2-p)p$ to count how many pairs of consecutive trials have at least one hit ($M$, say, with $M\approx Np$ if $p$ is small). Among these, the pairs with two hits are $(M,\,p/(2-p))$ binomially distributed:
int getBinomial(int N, double p) {
    if (N < NOTSOMANY) {
        int count = 0;
        for (int i=0; i<N; i++) if (urand() < p) count++;
        return count;
    }
    if ( p > 0.5 )
        return N - getBinomial(N, 1-p);
    else {
        int M = getBinomial(N/2, p*(2-p));
        int result = M + getBinomial(M, p/(2-p));
        if (N&1)
            if (urand() < p) result++;
        return result;
    }
}
I didn't investigate that further, but it is definitely useful only if $Np$ is not big (as each step of the result ultimately comes from some count++).
I'm assuming that $E$ is the expected value of the full geometric distribution, not of the truncated one. You've used $E$ for the expected value of both the geometric distribution and the exponential distribution; since they need to be slightly different, I'll use $E'$ for the expected value of the geometric distribution.
You can truncate the distribution by restricting $U$ to a range such that $T\in[0,n)$; that is, $U$ should be uniformly distributed in $(\mathrm e^{-n/E},1]$.
It does if you choose the right parameter. The area under each interval of length $1$ of the continuous distribution is $\mathrm e^{-1/E}$ times the area under the interval to its left, so the expected value of the geometric distribution you get will be $E'=\mathrm e^{-1/E}/(1-\mathrm e^{-1/E})=1/(\mathrm e^{1/E}-1)$. Thus, given $E'$, you can determine $E$ as $E=1/\log(1+1/E')$.
As you wrote, this is quite unlikely to be a problem in your lifetime or anyone else's.
Yes. An exponential distribution can only have expected value $0$ if it's degenerate, a delta distribution at the origin.