Solved – Error on interquartile range

descriptive statistics, error, interquartile, quantiles, standard error

How can I compute the error on the interquartile range of a sample? By error I mean its standard deviation (e.g. error on the mean = $\frac{RMS}{\sqrt{N}}$).

The sample is from a unimodal distribution, similar to normal distribution, but with asymmetric tails.

Best Answer

Bootstrapping would probably be a feasible and convenient option (I'll give a short example in R at the end of this answer). In general, the sample IQR is asymptotically normal (see page 327 of DasGupta (2011): "Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics"). Let $f$ be the density and $F$ the CDF of the random variable, and let $\xi_{p} = F^{-1}(p)$ denote its population quantile function. Then, with $\mathrm{IQR}$ denoting the sample interquartile range of $n$ iid observations, the following holds asymptotically: $$ \sqrt{n}\left(\mathrm{IQR} - \left(\xi_{\frac{3}{4}}-\xi_{\frac{1}{4}}\right)\right)\xrightarrow{d} \mathrm{N}\left(0, \frac{1}{16}\left[\frac{3}{f^{2}(\xi_{\frac{3}{4}})}+\frac{3}{f^{2}(\xi_{\frac{1}{4}})}-\frac{2}{f(\xi_{\frac{1}{4}})f(\xi_{\frac{3}{4}})}\right]\right) $$
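As a rough sketch of how the asymptotic formula above could be evaluated numerically, the helper below plugs a known quantile function and density into the variance expression. The name `iqr_asymptotic_se` and its arguments are hypothetical (they are not part of base R or any package), and the sketch assumes the population density is known, which is usually not the case for real data.

```r
# Asymptotic standard error of the sample IQR for n iid observations.
# qfun: population quantile function, dfun: population density.
# Hypothetical helper for illustration; assumes the density is known
# and positive at both quartiles.
iqr_asymptotic_se <- function(n, qfun, dfun, ...) {
  xi <- qfun(c(0.25, 0.75), ...)  # population quartiles xi_{1/4}, xi_{3/4}
  f  <- dfun(xi, ...)             # density evaluated at the quartiles
  avar <- (3 / f[1]^2 + 3 / f[2]^2 - 2 / (f[1] * f[2])) / 16
  sqrt(avar / n)                  # standard error at sample size n
}

# Standard normal population, n = 100
se_norm <- iqr_asymptotic_se(100, qnorm, dnorm)

# Exponential population with rate 1.5, n = 100
se_exp <- iqr_asymptotic_se(100, qexp, dexp, rate = 1.5)
```

For a standard normal population with $n = 100$ this should give roughly $0.157$. When the density is unknown, it would have to be replaced by an estimate (for example a kernel density estimate at the sample quartiles), which is one reason the bootstrap is often the more practical route.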

For iid observations of a normal distribution $\mathrm{N}(\mu, \sigma^{2})$, this result simplifies to: $$ \sqrt{n}\left(\mathrm{IQR} - 1.349\sigma\right)\xrightarrow{d} \mathrm{N}\left(0, 2.476\sigma^{2}\right). $$ So asymptotically, the standard deviation of the sample IQR is approximately $1.573\sqrt{\frac{\sigma^{2}}{n}}$.
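The constants in the normal case can be traced back to the general formula as follows (a worked sketch, writing $z = \Phi^{-1}(3/4) \approx 0.6745$ for the upper quartile of the standard normal and $\varphi$ for its density; these two symbols are introduced here only for the derivation). By symmetry, $\xi_{\frac{3}{4}} - \xi_{\frac{1}{4}} = 2z\sigma \approx 1.349\sigma$ and $f(\xi_{\frac{1}{4}}) = f(\xi_{\frac{3}{4}}) = \varphi(z)/\sigma \approx 0.3178/\sigma$, so the asymptotic variance becomes $$ \frac{1}{16}\left[3 + 3 - 2\right]\frac{\sigma^{2}}{\varphi^{2}(z)} = \frac{\sigma^{2}}{4\,\varphi^{2}(z)} \approx \frac{\sigma^{2}}{4 \cdot 0.10098} \approx 2.476\,\sigma^{2}, $$ and its square root is approximately $1.573\,\sigma$.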


Bootstrap

Let's illustrate the bootstrap with an example where the population has an exponential distribution.

#-----------------------------------------------------------------------------
# Load packages
#-----------------------------------------------------------------------------

library(boot)

#-----------------------------------------------------------------------------
# Function used for the bootstrap
#-----------------------------------------------------------------------------

iqr.fun <- function(data, indices) {

  d <- data[indices]  
  iqr <- IQR(d)  
  return(iqr)

}

#-----------------------------------------------------------------------------
# Do the bootstrap with 100000 replications
#-----------------------------------------------------------------------------

set.seed(612) # for reproducibility

mysamp <- rexp(100, 1.5) # exponential with rate 1.5

res <- boot(data = mysamp, statistic = iqr.fun, R = 100000)

res

Bootstrap Statistics :
     original      bias    std. error
t1* 0.9985767 -0.01563602   0.1519901

# Confidence intervals

boot.ci(res)

Intervals : 
Level      Normal              Basic         
95%   ( 0.7163,  1.3121 )   ( 0.8015,  1.4591 )  

Level     Percentile            BCa          
95%   ( 0.5381,  1.1956 )   ( 0.5301,  1.1940 )  
Calculations and Intervals on Original Scale

The bootstrap standard error of the IQR is estimated to be $0.152$ and the 95% bias-corrected and accelerated (BCa) confidence interval is $\left(0.5301,\;1.1940\right)$. The theoretical IQR of an exponential distribution with $\lambda = 1.5$ is $\frac{\log{(3)}}{\lambda}\approx 0.7324$, which is well within the calculated confidence interval. The theoretical (asymptotic) standard error in the exponential case with $n = 100$ is $2\sqrt{\frac{2}{3}}\sqrt{\frac{1}{n\lambda^{2}}}\approx 0.1089$.
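For reference, the exponential values can be derived from the general asymptotic result like this (a worked sketch under the assumption of iid exponential data with rate $\lambda$). The quantile function is $\xi_{p} = -\log(1-p)/\lambda$, so $\xi_{\frac{3}{4}} - \xi_{\frac{1}{4}} = \left(\log 4 - \log\tfrac{4}{3}\right)/\lambda = \log(3)/\lambda$. The density at a quantile is $f(\xi_{p}) = \lambda e^{-\lambda \xi_{p}} = \lambda(1-p)$, giving $f(\xi_{\frac{1}{4}}) = \tfrac{3}{4}\lambda$ and $f(\xi_{\frac{3}{4}}) = \tfrac{1}{4}\lambda$. Plugging these into the variance expression: $$ \frac{1}{16}\left[\frac{3 \cdot 16}{\lambda^{2}} + \frac{3 \cdot 16}{9\lambda^{2}} - \frac{2 \cdot 16}{3\lambda^{2}}\right] = \frac{1}{\lambda^{2}}\left[3 + \frac{1}{3} - \frac{2}{3}\right] = \frac{8}{3\lambda^{2}}, $$ so the standard error is $\sqrt{\frac{8}{3 n \lambda^{2}}} = 2\sqrt{\frac{2}{3}}\sqrt{\frac{1}{n\lambda^{2}}}$, which for $n = 100$ and $\lambda = 1.5$ is about $1.633 / 15 \approx 0.1089$.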
