The first thing is that with a constant intensity Poisson-process with rate $\lambda$ per unit time, the time between events has an exponential distribution with mean $\mu=1/\lambda$.
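As a quick sanity check of that relationship (a sketch in Python/numpy rather than R; the rate $\lambda = 2$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                      # assumed rate: 2 events per unit time
gaps = rng.exponential(1 / lam, size=100_000)  # simulated inter-event times
# the sample mean of the gaps should be close to mu = 1/lam = 0.5
```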
Second, you actually ask two separate questions, so let's deal with them one at a time.
> If I calculate the average time to occurrence based on 10 such events, how do I calculate the upper and lower error bound (in other words, the confidence interval) of the estimated average?
If you average ten such events, the mean is distributed as $\text{gamma}(10,\mu/10)$ (this is the gamma in its shape-scale parameterization, not its shape-rate parameterization).
If you want a $1-\alpha$ interval for the population mean, then one approach is to construct a pivotal quantity for the mean, which is to say a quantity $Q$ which is a function of the data and the parameter whose distribution doesn't depend on the parameter.
Note that $\hat{\mu}/\mu$ also has a gamma distribution: it is distributed as $\text{gamma}(10,1/10)$, so it is a pivotal quantity (and works just fine for this purpose). It may be easier, though, to work with its inverse $\mu/\hat{\mu}$, which has an inverse-gamma distribution, specifically $\text{inverse-gamma}(10,1/10)$.
Similarly, for a sample of size $n$, the distribution of $\mu/\hat{\mu}$ will be $\text{inverse-gamma}(n,1/n)$.
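That pivotal distribution is easy to verify by simulation; here is a sketch in Python/numpy (the choices $n=10$, $\mu=3$ and the replication count are arbitrary). Since $\text{gamma}(n,1/n)$ has mean $1$ and variance $1/n$, the simulated pivot should match those moments:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, reps = 10, 3.0, 200_000
# each row is a sample of n exponential waiting times with mean mu
samples = rng.exponential(mu, size=(reps, n))
pivot = samples.mean(axis=1) / mu   # muhat/mu, which should be gamma(n, 1/n)
# gamma(n, scale=1/n) has mean 1 and variance 1/n
```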
(The inverse chi-square is a particular case of this distribution, and it, too, can be used for this purpose with a suitable rescaling. So the advice you were given was correct, if not immediately helpful to you.)
(I'll assume you want a symmetric (in probability) interval.)
Let $l$ be the $\alpha/2$ quantile of an $\text{inverse-gamma}(n,1/n)$ distribution, and let $u$ be the $1-\alpha/2$ quantile - that is, the values that cut off a lower and upper tail area of $\alpha/2$.
Then $P(l<Q<u) = 1-\alpha$, so a $1-\alpha$ interval for $Q$ is $(l,u)$. Hence we can work out the interval for $\mu$:
$$l<\mu/\hat{\mu}<u$$
$$l\,\hat{\mu}<\mu<u\,\hat{\mu}$$
where $\hat{\mu}=\bar x$, which is to say a $1-\alpha$ interval for $\mu$ is $(l\,\bar x,u\, \bar x)$
Imagine the sample size was 10, and that the mean time was 3.21 seconds. Further, imagine you wanted a 95% interval.
I'm going to work in R, but anything that will give you gamma, inverse gamma, chi-square or inverse-chi-square quantiles will work.
So here are the 0.025 and 0.975 quantiles of a $\text{gamma}(10,1/10)$:
> qgamma(.025,10,scale=.1)
[1] 0.4795389
> qgamma(.975,10,scale=.1)
[1] 1.70848
The corresponding cutoffs for an inverse gamma will be the inverses of these, flipped around:
> 1/qgamma(.975,10,scale=.1)
[1] 0.5853155
> 1/qgamma(.025,10,scale=.1)
[1] 2.085337
So that's our $l$ and $u$ respectively.
Hence a 95% interval for $\mu$ in this example is $(3.21\times 0.5853, 3.21\times 2.0853)$, or $(1.879, 6.694)$
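The same interval can be reproduced with any package that provides gamma quantiles; here is a sketch in Python with scipy (the helper name `exp_mean_ci` is made up for illustration):

```python
from scipy import stats

def exp_mean_ci(xbar, n, conf=0.95):
    """Interval for an exponential mean via the gamma pivot (illustrative helper)."""
    a = (1 - conf) / 2
    l = 1 / stats.gamma.ppf(1 - a, n, scale=1 / n)  # lower multiplier
    u = 1 / stats.gamma.ppf(a, n, scale=1 / n)      # upper multiplier
    return l * xbar, u * xbar

lo, hi = exp_mean_ci(3.21, 10)  # should reproduce roughly (1.879, 6.694)
```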
That interval is far bigger than $\pm 20\%$, of course.
If you want to do it in terms of an inverse-chi-square instead, note that a $\chi^2(\nu)$ is a $\text{gamma}(\nu/2, 2)$ in the shape-scale form used here; from there it's reasonably easy to convert the above calculations.
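Concretely, a $\text{gamma}(n, 1/n)$ in shape-scale form is a $\chi^2(2n)$ divided by $2n$, so the `qgamma` calls above can be swapped for chi-square quantiles. A quick numerical check of the identity (Python/scipy used here purely for the cross-check):

```python
from scipy.stats import gamma, chi2

n, p = 10, 0.975
q_gamma = gamma.ppf(p, n, scale=1 / n)   # as used in the text
q_chi2 = chi2.ppf(p, 2 * n) / (2 * n)    # chi-square(2n) quantile, rescaled
# the two agree (both are about 1.70848 for these inputs)
```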
> If I want the estimated average time to occurrence to be within ±20% of the actual mean based on a confidence level, how many samples do I need to record?
This is basically the inverse of the above problem: we specify information about the interval and solve for $n$.
Edit: the approach I took here before didn't quite solve the problem as stated, but a related one (I solved for the true mean being within 20% of the estimate). This version should be right:
An interval like this: $0.8 \mu <\hat \mu < 1.2 \mu$
is equivalent to (after dividing by $\mu\,\hat\mu$ and inverting):
$0.833 \hat\mu < \mu < 1.25 \hat\mu$
which is in the right form to work with.
So our interval is $l_n \hat{\mu}<\mu<u_n \hat{\mu}$.
You want to choose $n$ so that $u_n\le 1.25$ and $l_n\ge 0.833$, so that the $1-\alpha$ interval sits inside the $\pm 20\%$ bounds.
Unfortunately, the distribution of a $\text{gamma}(n,1/n)$ (and, correspondingly, an inverse gamma) is different for every $n$.
One could of course proceed by trial and error, and this works perfectly well, if it is sometimes a little tedious.
Half a dozen guesses for the upper limit get us to:
> n=86;1/qgamma(.025,n,scale=1/n)
[1] 1.250202
> n=87;1/qgamma(.025,n,scale=1/n)
[1] 1.248503
While for the other end:
> n=105;1/qgamma(.975,n,scale=1/n)
[1] 0.8332447
> n=106;1/qgamma(.975,n,scale=1/n)
[1] 0.8339306
Which means we need 106 observations to satisfy the desired limit at both ends and get the desired coverage.
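The trial and error can also be scripted; here is a sketch in Python with scipy that scans $n$ upward until both multipliers sit inside the $\pm 20\%$ bounds (equivalent to the R `qgamma` calls above):

```python
from scipy import stats

n = 2
while True:
    l = 1 / stats.gamma.ppf(0.975, n, scale=1 / n)  # lower multiplier
    u = 1 / stats.gamma.ppf(0.025, n, scale=1 / n)  # upper multiplier
    if l >= 1 / 1.2 and u <= 1 / 0.8:               # i.e. 0.8333... and 1.25
        break
    n += 1
# the loop stops at n = 106, matching the hand search
```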
Note that the lower end needed the larger sample size. [If you are willing to use an interval that is asymmetric in probability, in order to get a symmetric-in-percentage-error interval -- i.e. to be substantially more likely to be wrong at one end of the interval than the other -- then you can achieve a smaller sample size while still getting $1-\alpha$ coverage.]
Computing an asymmetric interval
If we try different values of $n$, we can find how much tail area is cut off on either side of the $\pm 20\%$ bounds, and then find the smallest $n$ for which the two tails total no more than $\alpha$. In our example, the first place to try is to 'split the difference' and try $n=96$:
> n=96;pgamma(.8,n,scale=1/n);pgamma(1.2,n,scale=1/n,lower.tail=FALSE)
[1] 0.01907025
[1] 0.03033813
Those two tail areas total just under 0.05. Trying $n=95$ gives more than 0.05 in the two tails combined, so $n=96$ is as small as you can go.
(Note that the 0.03033 value corresponds to the lower tail of the inverse gamma, but it is the probability of the estimate being at least 20% too high.)
So if you are happy with more probability of being too high than too low, you can do n=96 for a 95% interval. [A higher coverage ($1-\alpha$) would give a larger $n$, and a lower coverage would give a smaller $n$.]
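The tail-area bookkeeping can be scripted the same way; a sketch in Python mirroring the `pgamma` calls above (the helper name `tail_area` is made up):

```python
from scipy import stats

def tail_area(n, lo=0.8, hi=1.2):
    """Total probability that muhat/mu falls outside (lo, hi)."""
    return (stats.gamma.cdf(lo, n, scale=1 / n)
            + stats.gamma.sf(hi, n, scale=1 / n))

# tail_area(96) is just under 0.05, while tail_area(95) exceeds it
```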
Here is a simulation example in R to show that this approach does what it should. We calculate the limits ($l$ and $u$) for a symmetric-in-probability interval (2.5% in each tail) based on the $n=106$ just computed:
n=106;l=1/qgamma(.975,n,scale=1/n);u=1/qgamma(.025,n,scale=1/n)
l;u
[1] 0.8339306
[1] 1.221422
Since $l$ and $u$ lie within $(0.8333, 1.25)$, whenever the generated interval contains the true mean the estimate cannot be more than 20% from that mean; and since, by construction, 95% of generated intervals contain the true mean, the estimate must be within 20% of the true mean at least 95% of the time.
Now we simulate ten thousand intervals to show it does have the required coverage:
sum(replicate(10000,
isTRUE(all.equal(c(l,u)*mean(rexp(106,rate=1/3.21))<3.21,c(TRUE,FALSE)))))
[1] 9507
- looks like it's right.
To avoid (or at least greatly reduce) the trial and error (not that it takes more than a few moments), we can make use of an approximation that gets us very close to the right answer: the Wilson-Hilferty transformation.
If $X \sim \chi^2(k)$ then $\sqrt[3]{X/k}$ is approximately normally distributed with mean $1-2/(9k)$ and variance $2/(9k)$.
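Combining this with the chi-square connection to the gamma lets us solve for $n$ almost directly. The binding constraint is the lower end: $l_n \ge 1/1.2$ is the same as requiring the $0.975$ quantile of $\chi^2(k)/k$ to be at most $1.2$, with $k = 2n$, and under Wilson-Hilferty that becomes a quadratic in $s = \sqrt{2/(9k)}$. A sketch in Python (the algebra, not the code, is the point here):

```python
from math import ceil, sqrt
from scipy import stats

# Need chi2.ppf(0.975, k)/k <= 1.2 with k = 2n. By Wilson-Hilferty,
# the 0.975 quantile of (X/k)**(1/3) is about 1 - s**2 + z*s with
# s = sqrt(2/(9k)), so we need 1 - s**2 + z*s <= 1.2**(1/3).
z = stats.norm.ppf(0.975)
c = 1.2 ** (1 / 3) - 1
s = (z - sqrt(z * z - 4 * c)) / 2  # smaller root of s**2 - z*s + c = 0
k = 2 / (9 * s * s)                # approximate degrees of freedom (~210)
n_wh = ceil(k / 2)                 # rounds up to 106, agreeing with the search
```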
Best Answer
A $c$% confidence interval for data $x_1,\ldots,x_n$ is defined as $$ (\bar{x} - q_c\frac{s}{\sqrt{n}}, \bar{x} + q_c\frac{s}{\sqrt{n}})$$ where

- $n$ is the number of samples,
- $s$ is the sample standard deviation, and
- $q_c$ is the appropriate quantile: the $(100-(100-c)/2)$th percentile of an appropriate distribution (usually the normal or t distribution; see below). In the case of $c=95\%$, $q_c$ is the $100-(5/2)=97.5$th percentile.
Assuming your data is normally distributed, you can use the t distribution with $n-1$ degrees of freedom for the quantiles: you can look the quantiles up in tables or use a software package.
Defining the standard error as $\sigma_M=s/\sqrt{n}$, the confidence interval can be written $$ \;(\bar{x} - q_c\sigma_M, \bar{x} + q_c\sigma_M)$$
For further background, search for "t confidence intervals"; Khan Academy is good and Wikipedia is worth a look too.
That's the theory. To solve your problem, what you need to do is:

1. Figure out what $q_c$ is for the given confidence interval with $c=95\%$ and $n=12$, looking it up using the appropriate t distribution.
2. Given the width of the confidence interval, and the distance on either side of the mean to the boundaries, solve for $\sigma_M$ and $\bar{x}$.
3. Figure out what $q_c$ is for the required confidence interval with $c=99\%$, in the same way as in step 1.
4. Use your new $q_c$ with $\sigma_M$ and $\bar{x}$ to calculate the new confidence interval.
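Those steps can be sketched in code (Python with `scipy.stats.t`; the 95% interval $(8.0, 12.0)$ below is a made-up example, not taken from the question):

```python
from scipy import stats

lo95, hi95, n = 8.0, 12.0, 12        # hypothetical reported 95% interval
q95 = stats.t.ppf(0.975, n - 1)      # step 1: quantile for c = 95%
xbar = (lo95 + hi95) / 2             # step 2: centre of the interval...
sigma_M = (hi95 - lo95) / (2 * q95)  # ...and half-width over the quantile
q99 = stats.t.ppf(0.995, n - 1)      # step 3: quantile for c = 99%
lo99 = xbar - q99 * sigma_M          # step 4: the wider 99% interval
hi99 = xbar + q99 * sigma_M
```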