Solved – Relationship Between Percentile and Confidence Interval (On a Mean)

Tags: confidence interval, mean, quantiles

This question came up at work when someone asked me what the relationship was between a percentile and a confidence interval, and I had a very hard time articulating my thoughts. The context was a very simple question regarding estimating a 95% confidence interval on a sample mean.

I understand that the central limit theorem states that the sampling distribution of the mean of independent, identically distributed random variables with finite variance will be normal or nearly normal if the sample size is large enough. Thus, the sample mean is approximately normally distributed, with mean $\mu_{\bar{x}}$ and a standard deviation (the standard error) estimated by $s/\sqrt{n}$, where $s$ is the sample standard deviation.

Now, let's say that the null hypothesis $H_0: \mu_{\bar{x}} = \mu$ is true. Then, under the null hypothesis, the 95% confidence interval around the sample mean is $\mu_{\bar{x}} \pm 1.96 \cdot s/\sqrt{n}$.

The question from my coworker was specifically the following: the standard error is just the standard deviation of the sampling distribution of the mean. Thus, would $\mu_{\bar{x}} + 1.96 \cdot s/\sqrt{n}$ be equivalent to the 97.5th percentile of a distribution created by calculating the sample means of many samples of size $n$?

The question struck me as odd because I think of percentiles and confidence intervals as two separate concepts, yet my coworker was asking how the two relate. I got confused and couldn't articulate my points.

Any help would be greatly appreciated!

Best Answer

Your coworker is correct: confidence intervals are based on the percentiles of the sampling distribution of the statistic of interest. In this case, the statistic is $\hat{\mu}=\frac{1}{n}\sum X_i$. The percentiles of the individual observations $X$ are a different thing.
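To make the link explicit, here is a short sketch of the usual normal-theory argument. The symbols $\mu$ and $\sigma$ (the population mean and standard deviation) are assumptions introduced for illustration, and the approximation relies on $n$ being large enough for the central limit theorem to apply. The 2.5th and 97.5th percentiles of the sampling distribution of $\hat{\mu}$ are approximately $\mu \mp 1.96\,\sigma/\sqrt{n}$, and rearranging the same event turns a statement about percentiles of $\hat{\mu}$ into an interval for $\mu$:

$$
0.95 \approx P\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \hat{\mu} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
= P\left(\hat{\mu} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \hat{\mu} + 1.96\,\frac{\sigma}{\sqrt{n}}\right).
$$

Replacing the unknown $\sigma$ with the sample standard deviation $s$ gives the familiar interval $\hat{\mu} \pm 1.96\, s/\sqrt{n}$.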

You can run the experiment yourself: draw many samples of size $n$, compute $\hat{\mu}_i$ for each, and calculate the percentiles of those means. You will find good agreement with the normal-theory formula, provided the $n$ behind each $\hat{\mu}_i$ is large enough. And if you keep thinking about it, you may end up reinventing the bootstrap, which resamples from the observed values of $X$ (their empirical distribution) to generate many $\hat{\mu}_i$ and then uses the percentiles of this generated sample to create a confidence interval.
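The experiment described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the exponential population with mean 2, the sample size `n = 100`, the replication counts, and all variable names are hypothetical choices, not anything from the original question. The first part compares the simulated 97.5th percentile of many sample means with the normal-theory value; the second part builds a percentile bootstrap interval from a single sample.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumed population for illustration: exponential with mean 2,
# so its standard deviation is also 2.
mu, sigma = 2.0, 2.0
n = 100           # size of each sample
n_reps = 20_000   # number of repeated samples

# Experiment 1: draw many samples of size n and record each sample mean.
sample_means = rng.exponential(scale=mu, size=(n_reps, n)).mean(axis=1)
simulated_p975 = np.percentile(sample_means, 97.5)
normal_theory_p975 = mu + 1.96 * sigma / np.sqrt(n)

# Experiment 2: percentile bootstrap from one observed sample.
x = rng.exponential(scale=mu, size=n)
boot_idx = rng.integers(0, n, size=(5_000, n))  # indices drawn with replacement
boot_means = x[boot_idx].mean(axis=1)
boot_lo, boot_hi = np.percentile(boot_means, [2.5, 97.5])

# Normal-theory interval from the same single sample, for comparison.
se = x.std(ddof=1) / np.sqrt(n)
normal_lo, normal_hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se

print(f"97.5th percentile of sample means: {simulated_p975:.3f}")
print(f"mu + 1.96 * sigma / sqrt(n):       {normal_theory_p975:.3f}")
print(f"bootstrap 95% interval:            ({boot_lo:.3f}, {boot_hi:.3f})")
print(f"normal-theory 95% interval:        ({normal_lo:.3f}, {normal_hi:.3f})")
```

Because the exponential distribution is skewed, the simulated percentile should sit slightly above the normal-theory value at this sample size; the gap shrinks as `n` grows.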