Solved – Confidence interval using Central Limit Theorem


I've been trying to find this information online, but have not had much success so far. I want to approximate the 95% confidence interval for the parameter $\theta$ of a geometric distribution, given the following:

  • maximum likelihood estimate: $\hat{\theta}=1/4.9\approx 0.204$

  • sample size: $n=100$

How does one approximate the confidence interval using the Central Limit Theorem? For instance, the Wikipedia article has many versions of this theorem, so I don't even know which one I should apply.

Best Answer

In this case it is possible to derive a confidence interval for $\theta$ that accounts for the fact that the parameter also affects the variance of the distribution. This is done by using the central limit theorem to obtain an approximate pivotal quantity, and then inverting that pivotal quantity into a confidence interval for $\theta$ by solving a quadratic inequality in $\theta$.


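Deriving the confidence interval: Let $X_1,X_2,\ldots,X_n \sim \text{IID Geom}(\theta)$, where each $X_i$ counts the number of trials up to and including the first success (so $X_i \in \{1,2,3,\ldots\}$), and note that the moments of this distribution are $\mathbb{E}(X_i) = 1/\theta$ and $\mathbb{V}(X_i) = (1-\theta)/\theta^2$. Since the observations are IID with finite variance, the classical (Lindeberg–Lévy) central limit theorem applies, and it gives the following distributional approximation for large $n$: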

$$\sqrt{n} \cdot \frac{\theta \bar{X}_n - 1}{\sqrt{1-\theta}} \overset{\text{Approx}}{\sim} \text{N}(0,1).$$

Squaring this quantity gives a more convenient pivotal quantity, since the resulting inequality is quadratic in $\theta$:

$$n \cdot \frac{(\theta \bar{X}_n - 1)^2}{1-\theta} \overset{\text{Approx}}{\sim} \text{ChiSq}(1).$$
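As a quick sanity check on these approximations, the Python sketch below (standard library only) simulates many samples, computes the standardised mean from the first display for each one, and reports its mean and variance together with the fraction of squared values at or below the upper 5% point of ChiSq(1). If the approximation is good, these should be near 0, 1 and 0.95. The sketch assumes the geometric distribution counts the trials up to and including the first success, which matches $\mathbb{E}(X_i) = 1/\theta$. The helper names `sample_geometric`, `standardised_mean` and `simulate`, the value $\theta = 0.2$, the 2,000 replications and the seed are illustrative choices, not part of the answer.

```python
import math
import random
import statistics

def sample_geometric(theta: float, rng: random.Random) -> int:
    """Draw from Geom(theta) on {1, 2, ...} (trials up to and including the first
    success) by CDF inversion, using P(X > k) = (1 - theta)**k."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return math.floor(math.log(u) / math.log(1.0 - theta)) + 1

def standardised_mean(xs: list[int], theta: float) -> float:
    """sqrt(n) * (theta * xbar - 1) / sqrt(1 - theta) for the sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * (theta * xbar - 1.0) / math.sqrt(1.0 - theta)

def simulate(theta: float = 0.2, n: int = 100, reps: int = 2000, seed: int = 1):
    """Mean and variance of the simulated Z values, plus the fraction with
    Z**2 at or below the upper 5% point of ChiSq(1)."""
    rng = random.Random(seed)
    zs = [standardised_mean([sample_geometric(theta, rng) for _ in range(n)], theta)
          for _ in range(reps)]
    # If Z ~ N(0, 1) then Z**2 ~ ChiSq(1), so its upper 5% point is z_{0.975}**2.
    crit = statistics.NormalDist().inv_cdf(0.975) ** 2
    inside = sum(z * z <= crit for z in zs) / reps
    return statistics.fmean(zs), statistics.variance(zs), inside

mean_z, var_z, frac = simulate()
print(f"mean = {mean_z:.3f}, variance = {var_z:.3f}, P(Z^2 <= crit) = {frac:.3f}")
```

The squared-normal identity used for `crit` gives $z_{0.975}^2 \approx 3.841459$, which is the value of $\chi_{1,0.05}^2$ used later in the answer.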

Let $\chi_{1,\alpha}^2$ denote the critical point of the chi-squared distribution with one degree of freedom that has upper-tail area $\alpha$, where $0<\alpha<1$. The pivotal quantity above can then be used to form a confidence interval via a quadratic function in $\theta$:

$$\begin{equation} \begin{aligned} 1-\alpha &\approx \mathbb{P} \Bigg( n \cdot \frac{(\theta \bar{X}_n - 1)^2}{1-\theta} \leqslant \chi_{1,\alpha}^2 \Bigg) \\[6pt] &= \mathbb{P} \Bigg( n (\theta \bar{X}_n - 1)^2 \leqslant (1-\theta) \chi_{1,\alpha}^2 \Bigg) \\[6pt] &= \mathbb{P} \Bigg( n \bar{X}_n^2 \theta^2 - (2n \bar{X}_n - \chi_{1,\alpha}^2) \theta + (n - \chi_{1,\alpha}^2) \leqslant 0 \Bigg). \\[6pt] \end{aligned} \end{equation}$$
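The expansion in the last step can be spot-checked numerically. This sketch, with hypothetical helper names, forms the coefficients $(a, b, c)$ of the quadratic and confirms that it agrees with $n(\theta \bar{x}_n - 1)^2 - (1-\theta)\chi_{1,\alpha}^2$ at a few values of $\theta$. It also compares $b^2 - 4ac$ with the closed-form discriminant $\chi_{1,\alpha}^4 + 4n\chi_{1,\alpha}^2 \bar{x}_n(\bar{x}_n - 1)$. Here `crit` stands in for $\chi_{1,\alpha}^2$, and the test triples are arbitrary. The last line evaluates the discriminant for the question's figures ($n = 100$, $\bar{x}_n = 1/\hat{\theta} = 4.9$).

```python
import math

def quadratic_coeffs(xbar: float, n: int, crit: float) -> tuple[float, float, float]:
    """Coefficients (a, b, c) of a*theta**2 + b*theta + c <= 0, obtained by expanding
    n*(theta*xbar - 1)**2 <= (1 - theta)*crit."""
    a = n * xbar**2
    b = -(2 * n * xbar - crit)
    c = n - crit
    return a, b, c

def closed_form_discriminant(xbar: float, n: int, crit: float) -> float:
    """crit**2 + 4*n*crit*xbar*(xbar - 1)."""
    return crit**2 + 4 * n * crit * xbar * (xbar - 1)

def lhs_minus_rhs(theta: float, xbar: float, n: int, crit: float) -> float:
    """n*(theta*xbar - 1)**2 - (1 - theta)*crit, i.e. the unexpanded inequality."""
    return n * (theta * xbar - 1) ** 2 - (1 - theta) * crit

# The expanded quadratic should match the unexpanded form for any theta,
# and b**2 - 4*a*c should match the closed-form discriminant.
for xbar, n, crit in [(4.9, 100, 3.841459), (1.0, 30, 6.634897), (12.5, 250, 2.705543)]:
    a, b, c = quadratic_coeffs(xbar, n, crit)
    for theta in (0.05, 0.2, 0.6):
        assert math.isclose(a * theta**2 + b * theta + c,
                            lhs_minus_rhs(theta, xbar, n, crit),
                            rel_tol=1e-9, abs_tol=1e-9)
    assert math.isclose(b * b - 4 * a * c, closed_form_discriminant(xbar, n, crit),
                        rel_tol=1e-9)

print(f"{closed_form_discriminant(4.9, 100, 3.841459):.2f}")  # → 29378.87
```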

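The quadratic function inside the last probability statement has discriminant $\Delta_n = \chi_{1,\alpha}^4 + 4n \chi_{1,\alpha}^2 \bar{X}_n (\bar{X}_n - 1)$. Because the geometric distribution here is supported on $\{1,2,3,\ldots\}$ (consistent with $\mathbb{E}(X_i) = 1/\theta$), we have $\bar{X}_n \geqslant 1$ and so $\Delta_n \geqslant \chi_{1,\alpha}^4 > 0$. The quadratic therefore has two real roots, and since its leading coefficient $n \bar{X}_n^2$ is positive, it is non-positive exactly between them. This gives: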

$$\begin{equation} \begin{aligned} 1-\alpha &\approx \mathbb{P} \Bigg( \Bigg( \theta - \frac{(2n \bar{X}_n - \chi_{1,\alpha}^2) - \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \Bigg) \Bigg( \theta - \frac{(2n \bar{X}_n - \chi_{1,\alpha}^2) + \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \Bigg) \leqslant 0 \Bigg) \\[6pt] &= \mathbb{P} \Bigg( \Bigg( \theta - \frac{1}{\bar{X}_n} + \frac{\chi_{1,\alpha}^2 + \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \Bigg) \Bigg( \theta - \frac{1}{\bar{X}_n} + \frac{\chi_{1,\alpha}^2 - \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \Bigg) \leqslant 0 \Bigg) \\[6pt] &= \mathbb{P} \Bigg( \frac{1}{\bar{X}_n} - \frac{\chi_{1,\alpha}^2 + \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \leqslant \theta \leqslant \frac{1}{\bar{X}_n} - \frac{\chi_{1,\alpha}^2 - \sqrt{\Delta_n}}{2 n \bar{X}_n^2} \Bigg). \\[6pt] \end{aligned} \end{equation}$$

Hence, we have the confidence interval:

$$\text{CI}_\theta(1-\alpha) \equiv \Bigg[ \frac{1}{\bar{x}_n} - \frac{\chi_{1,\alpha}^2 + \sqrt{\Delta_n}}{2 n \bar{x}_n^2}, \frac{1}{\bar{x}_n} - \frac{\chi_{1,\alpha}^2 - \sqrt{\Delta_n}}{2 n \bar{x}_n^2} \Bigg].$$


Application to your data: In your data you have $n=100$ and $\bar{x}_n = 4.9$. Setting $\alpha = 0.05$ for a 95% confidence interval gives $\chi_{1,0.05}^2 = 3.841459$, which then gives:

$$\begin{equation} \begin{aligned} \sqrt{\Delta_n} &= \sqrt{\chi_{1,0.05}^4 + 4 \cdot 100 \cdot \chi_{1,0.05}^2 \cdot 4.9 (4.9 - 1)} \\[6pt] &= \sqrt{3.841459^2 + 400 \cdot 3.841459 \cdot 4.9 \cdot 3.9} \\[6pt] &= \sqrt{29378.87} \approx 171.40265. \\[6pt] \end{aligned} \end{equation}$$

Hence, your 95% confidence interval (using the above form) is:

$$\begin{equation} \begin{aligned} \text{CI}_\theta(0.95) &= \Bigg[ \frac{1}{4.9} - \frac{3.841459 + 171.40265}{200 \cdot 4.9^2}, \frac{1}{4.9} - \frac{3.841459 - 171.40265}{200 \cdot 4.9^2} \Bigg] \\[6pt] &\approx \Bigg[ 0.2040816 - 0.03649398, 0.2040816 + 0.03489404 \Bigg] \\[6pt] &\approx \Bigg[ 0.167588, 0.238976 \Bigg]. \\[6pt] \end{aligned} \end{equation}$$
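The interval can also be computed and checked in a few lines of Python. This sketch uses only the standard library. The helper names, the seed, the 2,000 replications and the choice of true $\theta = 1/4.9$ for the simulation are illustrative assumptions. It does three things:

- It finds the interval as the two roots of the quadratic in $\theta$, rather than through the closed form.
- It confirms that the pivotal quantity equals $\chi_{1,0.05}^2$ at both endpoints for $n = 100$ and $\bar{x}_n = 4.9$.
- It estimates how often intervals built this way cover the true parameter.

```python
import math
import random

def geometric_ci(xbar: float, n: int, crit: float) -> tuple[float, float]:
    """Approximate CI for theta: the theta values satisfying
    n*(theta*xbar - 1)**2 / (1 - theta) <= crit, i.e. the interval between the roots
    of n*xbar**2*theta**2 - (2*n*xbar - crit)*theta + (n - crit) = 0."""
    a = n * xbar**2
    b = -(2 * n * xbar - crit)
    c = n - crit
    sqrt_disc = math.sqrt(b * b - 4 * a * c)  # real, since xbar >= 1 for this support
    return (-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)

def pivot(theta: float, xbar: float, n: int) -> float:
    """The chi-squared pivotal quantity n*(theta*xbar - 1)**2 / (1 - theta)."""
    return n * (theta * xbar - 1) ** 2 / (1 - theta)

def sample_geometric(theta: float, rng: random.Random) -> int:
    """Geom(theta) on {1, 2, ...} by CDF inversion: P(X > k) = (1 - theta)**k."""
    u = 1.0 - rng.random()  # u lies in (0, 1]
    return math.floor(math.log(u) / math.log(1.0 - theta)) + 1

def coverage(theta: float, n: int, crit: float, reps: int = 2000, seed: int = 7) -> float:
    """Fraction of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(sample_geometric(theta, rng) for _ in range(n)) / n
        lo, hi = geometric_ci(xbar, n, crit)
        hits += lo <= theta <= hi
    return hits / reps

crit = 3.841459  # chi^2_{1,0.05}
lo, hi = geometric_ci(4.9, 100, crit)
# Both endpoints should sit on the boundary where the pivot equals the critical value.
print(round(pivot(lo, 4.9, 100), 4), round(pivot(hi, 4.9, 100), 4))  # → 3.8415 3.8415
print(f"simulated coverage: {coverage(1 / 4.9, 100, crit):.3f}")
```

The interval is obtained by inverting the pivotal quantity. Its coverage in repeated sampling is therefore the probability that the squared standardised mean lies below the critical value, so the normal approximation predicts a simulated coverage near 0.95. For much smaller $n$, the approximation, and hence the coverage, may be noticeably worse.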
