Confidence Intervals using Pivotal Quantities

statistics

I am stuck on this question:

In a study of the amount of contaminants in drinking water, six samples were run through a lab. The six readings, in parts per million, were $9.8, 9.43, 8.97, 9.33, 9.14,$ and $9.55$. Estimate the population variance $\sigma^2$ for readings by deriving a $90$% confidence interval using a pivotal quantity.

I'm not sure how to approach this question. Finding the sample variance should be straightforward using the six samples, but I'm not sure if I can assume the sample follows a Normal Distribution, as it doesn't state that in the question. Therefore I'm not sure if I can use a pivotal quantity of the Z-Score formula, or $\frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$.

It would be great if I could get some pointers on how to determine the pivotal quantity for similar types of questions. Thanks!

Best Answer

For most practical purposes it is not feasible to get a useful CI for $\sigma^2$ from only $n = 6$ observations. My guess is this is a drill problem and you're intended to assume normality of the population and to assume you can get useful information from such a small sample.

Suppose the data are normal and both the population mean $\mu$ and the population variance $\sigma^2$ are unknown, with $\mu$ estimated by $\bar X$ and $\sigma^2$ estimated by $S^2.$ Then a useful pivotal quantity is

$$\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1).$$
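If you want to convince yourself that this quantity really is pivotal (its distribution does not depend on $\mu$ or $\sigma$), a quick simulation can help. The sketch below is only an illustration: the values of `mu`, `sigma`, the seed, and the number of replications `m` are arbitrary choices of mine, not part of the question.

```r
# Simulation sketch: does (n-1)*S^2/sigma^2 behave like Chisq(n-1)?
# mu and sigma are arbitrary illustrative values (assumptions).
set.seed(2024)
n = 6;  mu = 9.4;  sigma = 0.3
m = 100000   # number of simulated samples of size n

# For each simulated normal sample, compute the pivotal quantity
q = replicate(m, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)

mean(q)                          # should be near n - 1 = 5, the mean of Chisq(5)
mean(q <= qchisq(.05, n - 1))    # should be near 0.05
mean(q <= qchisq(.95, n - 1))    # should be near 0.95
```

Changing `mu` or `sigma` and rerunning should give essentially the same three numbers, which is what makes the quantity usable for building an interval.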

Suppose you want a 90% confidence interval for $\sigma^2$ based on your $n = 6$ observations. Then the values $L = 1.145$ and $U=11.070$ cut 5% of the probability from the lower and upper tails of $\mathsf{Chisq}(5),$ respectively. These cut-off values can be found from tables of the chi-squared distribution or by using software. In R statistical software, we have the following computation:

qchisq(c(.05,.95), 5)
[1]  1.145476 11.070498

Thus $$0.90 = P\left(L \le \frac{(n-1)S^2}{\sigma^2} \le U\right) = P\left(\frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L}\right),$$ so that the desired 90% confidence interval is of the form $$\left( \frac{(n-1)S^2}{U},\; \frac{(n-1)S^2}{L}\right) = (0.0393, 0.3801).$$
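To see where the two endpoints come from by hand, here is the arithmetic, using the sample variance $S^2 = 0.08708$ of the six readings and the cut-off values $L = 1.1455,\, U = 11.0705$ (rounded):

$$(n-1)S^2 = 5(0.08708) = 0.4354, \qquad \frac{0.4354}{11.0705} \approx 0.0393, \qquad \frac{0.4354}{1.1455} \approx 0.3801.$$

Note that the larger cut-off $U$ produces the lower endpoint, because $\sigma^2$ sits in the denominator of the pivotal quantity and inverting the inequalities reverses their order.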

For your data, the computation in R amounts to the following:

x = c(9.8, 9.43, 8.97, 9.33, 9.14, 9.55)
df = length(x) - 1
v = var(x);  v
[1] 0.08708
df*v/qchisq(c(.95,.05), df)
[1]  0.03932976 0.38010392

Notice that the point estimate $S^2 = 0.0871$ of $\sigma^2$ is included in this confidence interval. However, owing to the skewness of the distribution $\mathsf{Chisq}(5),$ the point estimate is not at the center of the interval. [That is, unlike a CI for $\mu$ based on the symmetrical t distribution and $\bar X,$ this CI for $\sigma^2$ is not of the form 'point estimate plus or minus a margin of error'.]
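A short check of that asymmetry, as a sketch that just recomputes the interval from the six readings and measures how far each endpoint lies from $S^2$ (the names `ci`, `below`, and `above` are mine):

```r
x = c(9.8, 9.43, 8.97, 9.33, 9.14, 9.55)
df = length(x) - 1
v = var(x)                                # point estimate S^2
ci = df * v / qchisq(c(.95, .05), df)     # 90% CI for sigma^2

below = v - ci[1]    # distance from lower endpoint up to S^2, about 0.048
above = ci[2] - v    # distance from S^2 up to upper endpoint, about 0.293
c(below, above)
```

The upper endpoint is roughly six times farther from $S^2$ than the lower one, so no single 'margin of error' describes this interval.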

If you want a 90% CI for the population standard deviation $\sigma,$ then take square roots of both endpoints of the CI for $\sigma^2.$
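In R the square-root step might look like the sketch below (the names `ci.var` and `ci.sd` are mine). Because the square root is an increasing function, the resulting interval for $\sigma$ has the same confidence level as the interval for $\sigma^2$ it is built from, here 90%.

```r
x = c(9.8, 9.43, 8.97, 9.33, 9.14, 9.55)
df = length(x) - 1

# 90% CI for sigma^2 from the chi-squared pivotal quantity
ci.var = df * var(x) / qchisq(c(.95, .05), df)

# Square roots of both endpoints give the CI for sigma
ci.sd = sqrt(ci.var)
ci.sd    # roughly (0.198, 0.617)
```

The sample standard deviation `sd(x)`, about 0.295, lies inside this interval but again not at its center.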