Solved – Confidence Interval on the Geometric Distribution Expected Value

confidence-interval geometric-distribution

Suppose a random variable $X$ follows a geometric distribution with $\Pr(X = 1) = p$, and the sample has observed values between $1$ and $N$.

We know that $E(X) = 1/p$.

My question is: Can we construct a confidence interval for the mean?

Best Answer

If one knows the population parameter of a geometric, one of course knows the population mean exactly, so a confidence interval for that would be of zero width.

Assuming we only have sample information, we can construct a confidence interval for the population mean of a geometric random variable.

Since your lowest value is 1, I assume we're dealing with the "number of trials" form of the geometric.

Large sample: The population mean of a Geometric($p$) variate is $1/p$ and the variance is $(1-p)/p^2$, so the variance of the mean of a sample of size $n$ is $(1-p)/(np^2)$. The statistic $Q_A=\frac{\bar{X}-1/p}{\sqrt{(1-p)/(np^2)}}\:$ (*) is asymptotically standard normal. A large-sample interval for $p$ can be "backed out" from that: we make an interval for $Q_A$, and then find the values of $p$ for which $Q_A$ lies in that interval.

e.g. if $\bar{x}=3.14$ and $n=100$ I get an asymptotic 95% interval for $p$ to be $(0.265, 0.368)$ (just by seeing which values for $p$ make that expression for $Q_A$ above stay between -1.96 and 1.96). Hence an interval for $1/p$ (the population mean) would be $(1/0.368,1/0.265)$, or $(2.72,3.77)$. Note that this is not an interval that's symmetric about the usual point estimate.
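
The "seeing which values of $p$" step can be automated. Here is a minimal Python sketch, assuming only the standard library; the helper names `q_a` and `solve_increasing` are my own. It relies on $Q_A$ being increasing in $p$ on $(0,1)$ when $\bar{x}\ge 1$, so each end point can be found by bisection.

```python
from math import sqrt

def q_a(p, xbar, n):
    """The statistic Q_A from the text, viewed as a function of p."""
    return (xbar - 1 / p) / sqrt((1 - p) / (n * p**2))

def solve_increasing(f, target, lo=1e-9, hi=1 - 1e-9, iters=200):
    """Bisection for f(p) = target, assuming f is increasing on (lo, hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xbar, n, z = 3.14, 100, 1.96
p_lo = solve_increasing(lambda p: q_a(p, xbar, n), -z)  # Q_A = -z at the lower end
p_hi = solve_increasing(lambda p: q_a(p, xbar, n), +z)  # Q_A = +z at the upper end

print(round(p_lo, 3), round(p_hi, 3))          # 0.265 0.368
print(round(1 / p_hi, 2), round(1 / p_lo, 2))  # 2.72 3.77
```

The bisection assumes $z<\sqrt{n}$; otherwise $Q_A$ never gets down to $-z$ and there is no lower end point for $p$.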

A more sophisticated approach (in the sense of giving you the bounds more directly) would solve the expression for $p^{\:(\dagger)}\:$, using $Z_\frac{\alpha}{2}$ in place of 1.96. I think this just gives a quadratic in $p$, so it's probably not onerous, but it's hardly worthwhile if you only need to do it once.

$\dagger$ Or, essentially as easily, one could rewrite the expression in terms of $\frac{1}{p}$ and produce an interval for the population mean directly.

Edit: Here it is for completeness' sake. Define $\bar x$ to be the sample mean, $n$ the sample size (the number of geometric(p) values available), and $z$ to be the critical $Z_{\frac{\alpha}{2}}$ value. Further, define:

$A=2(1-\frac{z^2}{n})$

$B=2\bar{x}-\frac{z^2}{n}$

$m=\frac{B}{A}\quad$ (the midpoint)

$h=\sqrt{m^2-2\bar{x}^2/A}\quad$ (the half-width)

then an approximate $1-\alpha$ interval for $\mu=\frac{1}{p}$ is $(m-h,m+h)$.
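
As a check on the direct route, here is a small Python sketch. Writing $\mu=1/p$, the variance of the sample mean is $\mu(\mu-1)/n$, so $Q_A^2\le z^2$ rearranges to $(1-c)\mu^2-(2\bar{x}-c)\mu+\bar{x}^2\le 0$ with $c=z^2/n$, and the interval end points are the two roots of that quadratic. The function name is my own, and the sketch assumes $\bar{x}\ge 1$ and $z^2<n$ (so that the leading coefficient is positive and the interval is bounded).

```python
from math import sqrt

def quadratic_interval_for_mean(xbar, n, z=1.96):
    """Roots of (1 - c) mu^2 - (2 xbar - c) mu + xbar^2 = 0, where c = z^2 / n."""
    c = z**2 / n
    if c >= 1:
        raise ValueError("this sketch assumes z^2 < n")
    a = 1 - c                    # leading coefficient
    b = 2 * xbar - c             # minus the linear coefficient
    disc = b**2 - 4 * a * xbar**2
    mid = b / (2 * a)            # midpoint of the two roots
    half = sqrt(disc) / (2 * a)  # half the distance between them
    return mid - half, mid + half

lo, hi = quadratic_interval_for_mean(3.14, 100)
print(round(lo, 2), round(hi, 2))  # 2.72 3.77
```

On the example data this reproduces the interval found earlier by searching over $p$, and the midpoint of the roots is $B/A$.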


In very large samples, one might make a further approximation and substitute $1/\bar{X}$ for $p$ in the denominator of the formula for $Q_A$, which would yield a simpler - and now symmetric - interval for $1/p$. On the data used above I get $(2.63,3.65)$ for that interval.

The fact that there's a fairly big difference from the previous interval suggests that the sample size of n=100 probably wasn't quite large enough in this case to apply the faster-but-even-more-approximate approach. Indeed, the fairly strong lack of symmetry of the earlier interval about 3.14 suggests the same thing.
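
For comparison, the plug-in version is short enough to sketch in Python. Substituting $p=1/\bar{x}$ in the denominator of $Q_A$ gives a standard error of $\sqrt{\bar{x}(\bar{x}-1)/n}$, so the interval is $\bar{x}\pm z\sqrt{\bar{x}(\bar{x}-1)/n}$. The function name is my own, and the sketch assumes $\bar{x}\ge 1$.

```python
from math import sqrt

def plug_in_interval_for_mean(xbar, n, z=1.96):
    """Symmetric large-sample interval with p replaced by 1/xbar in the variance."""
    se = sqrt(xbar * (xbar - 1) / n)  # (1 - p) / (n p^2) evaluated at p = 1/xbar
    return xbar - z * se, xbar + z * se

lo, hi = plug_in_interval_for_mean(3.14, 100)
print(round(lo, 2), round(hi, 2))  # 2.63 3.65
```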


Small sample: You can probably also do something in the small-sample case. For example, one approach might use pivotal quantities, but I haven't yet checked whether a suitable pivot can be constructed in this case. There might not be one.

Pondering a little further, it seems to me that there may be an approach that is similar to the way the chi-squared distribution can be used to give an interval for a Poisson parameter. I believe there's a similar relationship between an incomplete beta integral and the negative binomial (of which the geometric is a special case), so it should be possible to get an interval that way. In particular it suggests that perhaps $F$ tables (or equivalent functions in some package) could then be used to get limits on an interval for $p$, and hence for $\mu=1/p$.
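
One way this idea might be made concrete, offered as a sketch under stated assumptions rather than a worked-out method: the sum $S$ of $n$ independent "number of trials" geometric values is negative binomial, and $S\le s$ exactly when the first $s$ Bernoulli trials contain at least $n$ successes. That turns both tail probabilities of $S$ into binomial tails (which are themselves regularized incomplete beta functions), and equal-tailed limits for $p$ can then be found numerically. The Python below uses only the standard library; all function names are my own, and plain bisection stands in for beta or $F$ quantile functions.

```python
from math import exp, lgamma, log, log1p

def binom_cdf(k, m, p):
    """P(Bin(m, p) <= k) for 0 < p < 1 and 0 <= k <= m, summing terms built on the log scale."""
    total = 0.0
    for j in range(k + 1):
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log(p) + (m - j) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)

def bisect_root(f, lo, hi, iters=200):
    """Bisection for a root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval_for_p(s, n, alpha=0.05):
    """Equal-tailed limits for p, given s = sum of n geometric observations (assumes s > n)."""
    if s <= n:
        raise ValueError("this sketch assumes s > n")
    eps = 1e-12
    # P(S <= s | p) = P(Bin(s, p) >= n) increases with p; the lower limit makes it alpha/2.
    p_lo = bisect_root(lambda p: 1 - binom_cdf(n - 1, s, p) - alpha / 2, eps, 1 - eps)
    # P(S >= s | p) = P(Bin(s - 1, p) <= n - 1) decreases with p; the upper limit makes it alpha/2.
    p_hi = bisect_root(lambda p: binom_cdf(n - 1, s - 1, p) - alpha / 2, eps, 1 - eps)
    return p_lo, p_hi

# Example data from above: n = 100 values with mean 3.14, so s = 314.
p_lo, p_hi = exact_interval_for_p(314, 100)
print(p_lo, p_hi)          # limits for p
print(1 / p_hi, 1 / p_lo)  # corresponding limits for the mean 1/p
```

Because $S$ is discrete, an interval built this way would be expected to be conservative (coverage at least $1-\alpha$) rather than exact.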