Solved – Confidence interval for quantiles: distribution-free, asymptotic and assuming a normal distribution

asymptoticsconfidence intervalexact-testquantiles

An acquaintance of mine has been using this wrong inference formula for years: given

  • a i.i.d. sample $\mathbf{X}={X_1,\dots,X_N}$ for a continuous RV $X$,
  • sample mean $\bar{X}=\frac{\sum X_i}{N}$ and sample standard deviation $\bar{\sigma}=\frac{\sum \left(X_i-\bar{X}\right)^2}{N-1}$

estimate the 0.95-quantile $q_{0.95}$ as

$$q_{0.95} = \bar{X} + 2 \bar{\sigma}$$

(which is not even a decent point estimate – you should at the very least use $q_{0.95} = \bar{X} + 1.645\bar{\sigma}$).

What are the correct confidence intervals for a generic $q$, in the three cases:

  • we know nothing on $X$ (apart from the fact that it's continuous), and we look for an exact (non-asymptotic) answer. I think this answer for the median could be modified for a generic quantile
  • as before, but an asymptotic solution is fine. I guess there should be at least a couple answers here…one for quantiles which aren't close to 0 or 1, and one for quantiles which are. Maybe one based on normal approximation and one based on Poisson?
  • finally, we assume $X$ to have a Gaussian distribution with unknown mean and variance.

Best Answer

  • The first case was answered in detail in this question.
  • One example of the second case is shown here, where the authors apply a normal approximation to the binomial distribution used in calculations of the first case.

The third case is given by Hahn and Meeker in their handbook Statistical Intervals (2nd ed., Wiley 2017):

A two-sided $100(1-\alpha)\%$ confidence interval for $x_q$, the $q$ quantile of the normal distribution, is

$$ \left[\bar{x}-t_{(1-\alpha/2;\,n-1,\,\delta)}\frac{s}{\sqrt{n}},\;\bar{x}-t_{(\alpha/2;\,n-1,\,\delta)}\frac{s}{\sqrt{n}}\right] $$ where $t_{(\gamma;\,n-1,\,\delta)}$ is the $\gamma$ quantile of a noncentral $t$-distribution with $n-1$ degrees of freedom and noncentrality parameter $\delta = -\sqrt{n}z_{(q)}=\sqrt{n}z_{(1-p)}$.

Here, $z_{(q)}$ denotes $\Phi^{-1}(q)$, the $q$ quantile of the standard normal distribution.

For example, let's assume we drew $n=20$ samples from a normal distribution with unknown mean and standard deviation. The sample mean was $\bar{x}=10.5$ and the sample standard deviation was $s=3.19$. Then, the two-sided $95\%$ confidence interval for the $q=0.25$ quantile $x_{0.25}$ would be given by $(6.42; 9.76)$.

Here is some R code and a small simulation to check the coverage. By changing the parameters, you can run your own simulations:

normquantCI <- function(x, conf_level = 0.95, q = 0.5) {
  
  x <- na.omit(x)
  n <- length(x)
  xbar <- mean(x)
  s <- sd(x)
  ncp <- -sqrt(n)*qnorm(q)
  tval <- qt(c((1 + conf_level)/2, (1 - conf_level)/2), n - 1, ncp)
  se <- s/sqrt(n)
  
  xbar - tval*se
  
}

# Simulate the coverage

set.seed(142857)

q <- 0.25 # Quantile to calculate the CI for
conf_level <- 0.95 # Confidence level
true_mean <- 100 # The true mean of the normal distribution
true_sd <- 15 # True sd of the normal distribution
sampsi <- 20 # The sample size

trueq <- qnorm(q, true_mean, true_sd) # The true quantile

res <- replicate(1e5, {
  citmp <- normquantCI(rnorm(sampsi, true_mean, true_sd), conf_level = conf_level, q = q)
  ifelse(citmp[1] < trueq & citmp[2] > trueq, 1, 0)
})

sum(res)/length(res)
[1] 0.95043