The best way to get a good confidence interval is to use everything you know
about the population distribution. In the first three examples below, we
pretend to have successively less information about the population, so that
we can see the effect of knowledge and assumptions on confidence intervals.
Suppose I have $n = 50$ observations from a population safely assumed to be normal, sorted in increasing order, as displayed and summarized below:
sort(x)
[1] 71 72 73 74 80 82 82 83 84 86 86 87 88
[14] 90 91 91 92 93 94 94 95 95 96 96 97 97
[27] 98 99 100 101 101 101 102 103 103 105 105 106 108
[40] 108 110 110 111 111 112 112 118 121 129 131
summary(x); sd(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.00 88.50 97.00 97.48 105.75 131.00
[1] 13.62656
Estimating mean, knowing data are normal and knowing population SD:
If I know these data are from a normal distribution with standard deviation
$\sigma = 15,$ then a 95% confidence interval for the population mean $\mu$
is $\bar X \pm 1.96\frac{\sigma}{\sqrt{n}},$ as in your Question. This
computes to $(93.32, 101.64).$
pm = c(-1,1); mean(x) + pm*1.96*15/sqrt(50)
[1]  93.32232 101.63768
Estimating mean, assuming normality, but not knowing SD: If I know the data are normal but don't know $\sigma,$ then I can estimate it
by the sample standard deviation $S = 13.63,$ and a confidence interval based
on Student's t distribution with $\nu = 49$ degrees of freedom is
$\bar X \pm t^*\frac{S}{\sqrt{n}}.$ For our data this is the interval
$(93.61, 101.35).$
mean(x) + pm*qt(.975,49)*sd(x)/sqrt(50)
[1] 93.60737 101.35263
Estimating mean, assuming randomly sampled data: If I am not willing to assume the data are normal, I can use a nonparametric
bootstrap procedure. If I knew the distribution of $D = \bar X - \mu,$ then
I could find cut-off points $L$ and $U$ such that
$$.95 = P(L \le D = \bar X - \mu \le U) = P(\bar X - U \le \mu \le \bar X - L),$$
so that a 95% CI for $\mu$ would be of the form $(\bar X - U, \bar X - L).$
Not knowing the distribution of $D$ I can use a bootstrap procedure to get
approximate values $L^*$ and $U^*$ of $L$ and $U,$ respectively. Temporarily,
I use $\mu^*= \bar X_{\text{obs}}= 97.48$ as a proxy for $\mu.$ Also, I take
a large number $B$ of re-samples of size $n = 50$ with replacement from the
data, find $\bar X^*$ for each re-sample, and take percentiles .025 and .975
of the $B$ values $D^* = \bar X^* - \mu^*$ as $L^*$ and $U^*,$
respectively. Then, returning $\bar X_{obs}$ to its role as the mean of the data,
a 95% nonparametric bootstrap CI for $\mu$ is of the form
$(\bar X_{obs} - U^*, \bar X_{obs} - L^*).$
In the R code below, bootstrap quantities (denoted by $*$'s above) are indicated by the suffix .re.
The bootstrap CI from the run shown is $(93.78,\, 101.18).$ Because this is
a random process, subsequent runs (with different seeds) will give slightly
different CIs.
set.seed(819); B = 10000; n = 50; a.obs = mean(x)
d.re = replicate( B, mean(sample(x, n, repl=T))-a.obs )
UL = quantile(d.re, c(.975,.025))
a.obs - UL
97.5% 2.5%
93.78 101.18
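The same basic (pivot) bootstrap can be sketched in Python with NumPy, as a cross-check of the R loop above. The seed and the NumPy sampler are assumptions, so the interval will not match the R run digit-for-digit, only approximately:

```python
import numpy as np

rng = np.random.default_rng(819)  # arbitrary seed; results vary slightly by run
# The 50 observations listed above
x = np.array([71, 72, 73, 74, 80, 82, 82, 83, 84, 86, 86, 87, 88,
              90, 91, 91, 92, 93, 94, 94, 95, 95, 96, 96, 97, 97,
              98, 99, 100, 101, 101, 101, 102, 103, 103, 105, 105, 106, 108,
              108, 110, 110, 111, 111, 112, 112, 118, 121, 129, 131])
B, n = 10_000, len(x)
a_obs = x.mean()  # 97.48

# Re-sample with replacement; D* = xbar* - xbar_obs for each of B re-samples
d_re = rng.choice(x, size=(B, n), replace=True).mean(axis=1) - a_obs
U_star, L_star = np.quantile(d_re, [0.975, 0.025])

ci = (a_obs - U_star, a_obs - L_star)  # roughly (93.8, 101.2)
```

The interval is formed by subtracting the upper quantile of $D^*$ from $\bar X_{obs}$ for the lower limit, mirroring the R code's a.obs - UL.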
It is typical for nonparametric bootstrap confidence intervals to be slightly shorter than
t confidence intervals. The bootstrap CIs are based only on the data, which
do not necessarily capture information about the tails of the normal distribution; those tails are not 'heavy,' but they do extend to $\pm\infty.$ [If the data really are normal, we should use a t CI, not a nonparametric bootstrap CI.]
The data were simulated using set.seed(818); x = round(rnorm(50, 100, 15)),
so we know $\mu = 100,$ and thus that all three of the CIs above happen to
cover the true value of $\mu.$ [Of course, $\mu$ would not be known in a real-world situation.]
Estimating mean of exponential data: Suppose we have $n = 50$ observations
from an exponential population with unknown mean $\mu.$
sort(y)
[1] 1 2 7 7 8 9 10 13 14 21 21 22 23
[14] 24 28 36 45 46 49 52 54 55 60 61 64 65
[27] 66 71 89 91 96 100 128 132 146 152 152 153 159
[40] 162 167 169 174 191 203 236 286 301 456 480
summary(y); sd(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 23.25 64.50 103.14 152.75 480.00
[1] 106.5341
For exponential data, it is known that
$\bar Y/\mu \sim \mathsf{Gamma}(\text{shape}=n,\,\text{rate}=n).$ [This result
can be proved using moment generating functions.]
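A quick sketch of that MGF argument (not spelled out in the original, but standard):

```latex
% Y_1, \dots, Y_n iid Exponential with mean \mu, so each has MGF
M_{Y_i}(t) = (1 - \mu t)^{-1}, \quad t < 1/\mu,
\;\Longrightarrow\;
M_{\sum_i Y_i}(t) = (1 - \mu t)^{-n},
% which is the MGF of Gamma(shape = n, scale = \mu).
% Multiplying a Gamma variate by 1/(n\mu) divides its scale by n\mu:
\bar Y/\mu = \tfrac{1}{n\mu} \textstyle\sum_i Y_i
\sim \mathsf{Gamma}\!\left(\text{shape} = n,\ \text{scale} = \tfrac{1}{n}\right)
= \mathsf{Gamma}(\text{shape} = n,\ \text{rate} = n).
```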
Thus
$$.95 = P\left(L = 0.742 \le \frac{\bar Y}{\mu} \le U = 1.296\right)
= P\left(\frac{\bar Y}{U} < \mu < \frac{\bar Y}{L}\right),$$
so that for our data a 95% CI for $\mu$ is $(\bar Y/U,\, \bar Y/L) = (79.61, 138.96).$ [The vector y
was simulated according to $\mathsf{Exp}(\mu = 100).$]
qgamma(c(.025,.975), 50, 50)
[1] 0.7422193 1.2956120
mean(y)/qgamma(c(.975,.025), 50, 50)
[1] 79.60717 138.96163
By contrast, if we had taken the data to be normally distributed and used
a t confidence interval, the incorrect result would have been $(72.86, 133.42).$
With a sample as large as $n = 50,$ this t CI is perhaps not catastrophically
wrong -- but wrong nevertheless.
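As a cross-check (a Python/SciPy sketch, since the answer's own code is in R), both the exact gamma-pivot interval and the naive t interval can be reproduced from the listed data:

```python
import numpy as np
from scipy import stats

# The 50 exponential observations listed above
y = np.array([1, 2, 7, 7, 8, 9, 10, 13, 14, 21, 21, 22, 23,
              24, 28, 36, 45, 46, 49, 52, 54, 55, 60, 61, 64, 65,
              66, 71, 89, 91, 96, 100, 128, 132, 146, 152, 152, 153, 159,
              162, 167, 169, 174, 191, 203, 236, 286, 301, 456, 480])
n = len(y)

# Exact gamma-pivot CI: ybar/mu ~ Gamma(shape = n, rate = n)
U, L = stats.gamma.ppf([0.975, 0.025], a=n, scale=1/n)
gamma_ci = (y.mean()/U, y.mean()/L)   # about (79.61, 138.96)

# Naive t CI, the wrong model for exponential data
half = stats.t.ppf(0.975, n - 1) * y.std(ddof=1) / np.sqrt(n)
t_ci = (y.mean() - half, y.mean() + half)  # about (72.86, 133.42)
```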
Addendum per Comment: The following simulation shows that a
"95%" t CI for $\mu$ based on $n = 50$ exponential observations
covers less than 93% of the time. (With $n = 25,$ coverage is
only about 91%.) It is in that long-run sense that such t CIs are 'incorrect'.
set.seed(819); m = 10^5; n = 50; LCL = UCL = numeric(m)
for (i in 1:m) {
x = rexp(n, .01); a = mean(x); s = sd(x)
LCL[i] = a - 1.96*s/sqrt(n); UCL[i] = a + 1.96*s/sqrt(n) }
mean((LCL < 100) & (UCL > 100))
[1] 0.92803
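The same coverage experiment can be sketched in vectorized Python with NumPy (seed and sampler are assumptions, so the estimate will differ slightly from the R run):

```python
import numpy as np

rng = np.random.default_rng(819)  # arbitrary seed
m, n, mu = 100_000, 50, 100.0

# m samples of n exponential observations with mean mu, one per row
samples = rng.exponential(scale=mu, size=(m, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)
half = 1.96 * sds / np.sqrt(n)  # half-width of the nominal "95%" CI

coverage = np.mean((means - half < mu) & (mu < means + half))
# coverage is roughly 0.93, noticeably below the nominal 0.95
```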
Best Answer
According to the central limit theorem, $\sqrt{n}(\bar X - \mu)/\sigma$ tends to the standard normal distribution as $n$ tends to infinity. By the same token, since $s$ tends to $\sigma$ almost surely, the variable $\sqrt{n}(\bar X - \mu)/s$ tends to the standard normal distribution as well (by Slutsky's theorem).

Since the $t$-distribution is a sequence of distributions (one for each degrees-of-freedom value), not a single distribution, it is hard to define what we mean by "tends to." If we mean it in the sense of the Kolmogorov-Smirnov distance, or something along those lines, then we can indeed say that it approaches the $t$-distribution. But that is only a consequence of the fact that the $t$-distribution itself approaches the standard normal; we could say the same of any other sequence of distributions converging to the normal.

It may happen that the $t$-distribution is a better finite-sample approximation of $\sqrt{n}(\bar X - \mu)/s$ than the standard normal, and it probably is in most cases. However, both can be used in the asymptotic case.
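To illustrate this numerically (a Python/SciPy sketch, not part of the original answer): the 97.5% quantile of the $t$-distribution approaches the normal quantile $z_{0.975} \approx 1.96$ as the degrees of freedom grow, so for large $n$ the two critical values, and hence the two intervals, nearly coincide.

```python
from scipy import stats

z = stats.norm.ppf(0.975)  # about 1.95996
for df in (5, 10, 50, 200, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(df, round(t_crit, 4), round(t_crit - z, 4))
# t.ppf(0.975, 5) is about 2.571; by df = 1000 it is about 1.962
```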