I am stuck on this question:
In a study of the amount of contaminants in drinking water, six samples were run through a lab. The six readings, in parts per million, were $9.8, 9.43, 8.97, 9.33, 9.14,$ and $9.55$. Estimate the population variance $\sigma^2$ for readings by deriving a $90$% confidence interval using a pivotal quantity.
I'm not sure how to approach this question. Finding the sample variance from the six readings should be straightforward, but I'm not sure whether I can assume the data come from a normal distribution, since the question doesn't say so. So I'm not sure whether I can use the Z-score as a pivotal quantity, i.e. $\frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$.
It would be great if I could get some pointers on how to find the pivotal quantity for similar types of questions. Thanks!
Best Answer
For most practical purposes it is not feasible to get a useful CI for $\sigma^2$ from only $n = 6$ observations. My guess is this is a drill problem and you're intended to assume normality of the population and to assume you can get useful information from such a small sample.
Suppose the data are normal, both the population mean $\mu$ and the population variance $\sigma^2$ are unknown, $\mu$ is estimated by $\bar X$, and $\sigma^2$ is estimated by $S^2.$ Then a useful pivotal quantity is
$$\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1).$$
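A quick Monte Carlo sketch (Python standard library only) illustrates why this is a pivot: for normal samples of size $n = 6$, the simulated values of $(n-1)S^2/\sigma^2$ have mean near 5 and variance near 10, the mean and variance of $\mathsf{Chisq}(5)$, regardless of $\mu$ and $\sigma$. The helper `simulate_pivot` and the $(\mu, \sigma)$ pairs are hypothetical choices for illustration only.

```python
import random
import statistics

def simulate_pivot(mu, sigma, n=6, reps=20_000, seed=1):
    """Return (n-1)S^2/sigma^2 for `reps` simulated normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # equals (n-1) * S^2
        out.append(ss / sigma ** 2)
    return out

# Hypothetical (mu, sigma) pairs: the pivot's distribution should not change.
for mu, sigma in [(9.4, 0.3), (0.0, 5.0)]:
    q = simulate_pivot(mu, sigma)
    print(mu, sigma, round(statistics.mean(q), 2), round(statistics.variance(q), 2))
```

Because both runs reuse the same seed, the two printed lines agree up to rounding: the pivot depends only on the standardized draws $(X_i - \mu)/\sigma$, which is exactly what makes it usable when $\mu$ and $\sigma$ are unknown.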
Suppose you want a 90% confidence interval for $\sigma^2$ based on your $n = 6$ observations. Then the values $L = 1.145$ and $U = 11.070$ cut 5% of the probability from the lower and upper tails of $\mathsf{Chisq}(5),$ respectively. These cut-off values can be found from tables of the chi-squared distribution or with software; in R, `qchisq(c(.05, .95), 5)` returns them.
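Without R or SciPy (`scipy.stats.chi2.ppf`) at hand, the same cut-offs can be computed from scratch. The sketch below assumes nothing beyond the Python standard library: the hypothetical helper `chisq_cdf` evaluates the chi-squared CDF through the series for the regularized lower incomplete gamma function $P(\nu/2, x/2)$, and `chisq_quantile` inverts it by bisection.

```python
import math

def chisq_cdf(x, df):
    """P(X <= x) for X ~ Chisq(df), via the series for P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # gamma_lower(a, z) / Gamma(a) = z^a e^{-z} / Gamma(a) * series
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chisq_quantile(p, df):
    """Invert chisq_cdf by bisection: slow but simple."""
    lo, hi = 0.0, max(1.0, float(df))
    while chisq_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chisq_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

L = chisq_quantile(0.05, 5)
U = chisq_quantile(0.95, 5)
print(round(L, 3), round(U, 3))  # prints 1.145 11.07
```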
Thus $$0.90 = P\left(L \le \frac{(n-1)S^2}{\sigma^2} \le U\right) = P\left(\frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L}\right),$$ so that the desired 90% confidence interval is of the form $$\left( \frac{(n-1)S^2}{U},\; \frac{(n-1)S^2}{L}\right) = (0.0393, 0.3801).$$
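The 90% statement is about repeated sampling, so it can be checked by simulation. In this sketch the true values $\sigma^2 = 0.09$ and $\mu = 9.4$ are hypothetical, and $L$ and $U$ are the quantiles above to more decimal places (valid only for $n = 6$); the fraction of simulated intervals that contain $\sigma^2$ should come out close to 0.90.

```python
import random

# 5% and 95% points of Chisq(5): L and U above to more decimals (n = 6 only)
L, U = 1.145476, 11.070498

def coverage(sigma2=0.09, mu=9.4, n=6, reps=20_000, seed=2):
    """Fraction of simulated intervals ((n-1)S^2/U, (n-1)S^2/L) that contain sigma2."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # (n-1) * S^2
        if ss / U <= sigma2 <= ss / L:
            hits += 1
    return hits / reps

print(coverage())  # should be close to 0.90
```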
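For your data, $\bar X = 9.37$ and $(n-1)S^2 = \sum_{i=1}^{6} (X_i - \bar X)^2 = 0.4354,$ so with $L = 1.1455$ and $U = 11.0705$ (to four decimal places) the endpoints are $0.4354/11.0705 = 0.0393$ and $0.4354/1.1455 = 0.3801.$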
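The same arithmetic in plain Python, as a sketch that hard-codes the six readings and the two $\mathsf{Chisq}(5)$ quantiles (to more decimal places than quoted above):

```python
data = [9.8, 9.43, 8.97, 9.33, 9.14, 9.55]  # readings in ppm
L, U = 1.145476, 11.070498                  # 5% and 95% points of Chisq(5)

n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)     # (n-1) * S^2
s2 = ss / (n - 1)                           # point estimate of sigma^2
ci = (ss / U, ss / L)                       # 90% CI for sigma^2
print(round(xbar, 2), round(s2, 4), tuple(round(e, 4) for e in ci))
# prints 9.37 0.0871 (0.0393, 0.3801)
```

Note that $S^2$ sits about 0.048 above the lower endpoint but about 0.293 below the upper one.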
Notice that the point estimate $S^2 = 0.0871$ of $\sigma^2$ is included in this confidence interval. However, owing to the skewness of the distribution $\mathsf{Chisq}(5),$ the point estimate is not at the center of the interval. [That is, unlike a CI for $\mu$ based on the symmetrical t distribution and $\bar X,$ this CI for $\sigma^2$ is not of the form 'point estimate plus or minus a margin of error'.]
If you want a 90% CI for the population standard deviation $\sigma,$ then take square roots of both endpoints of the CI for $\sigma^2.$
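For these data, taking square roots of the rounded endpoints $(0.0393, 0.3801)$ gives roughly $(0.198, 0.617)$ ppm for $\sigma$, which contains $S = \sqrt{0.0871} \approx 0.295$. A minimal sketch of that step, assuming the rounded interval as input:

```python
import math

ci_var = (0.0393, 0.3801)                    # 90% CI for sigma^2 (rounded)
ci_sd = tuple(math.sqrt(e) for e in ci_var)  # 90% CI for sigma
print(tuple(round(e, 3) for e in ci_sd))     # prints (0.198, 0.617)
```

Since the square root is increasing, the event $\{a \le \sigma^2 \le b\}$ is the same as $\{\sqrt a \le \sigma \le \sqrt b\}$, so the confidence level carries over unchanged.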