Normal Distribution – Understanding the Variance of Normal Order Statistics

Tags: approximation, distributions, normal distribution, order-statistics, variance

Suppose we have $X_1, \cdots, X_n \overset{\textrm{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ with $n > 50$, and let $X_{(1)}, \cdots, X_{(n)}$ be the associated order statistics.

Are there any references pointing to formulas that specify (or estimate) the variance of the $k^{\textrm{th}}$ order statistic, i.e. $\textrm{Var}(X_{(k)})$, for $1 \leq k \leq n$?

I am looking for formulas that can be evaluated explicitly, without resorting to numerical integration.


I have seen the question "Approximate order statistics for normal random variables", which attracted a number of answers giving approximate formulas for the mean of the $k^{\textrm{th}}$ order statistic, e.g. Blom (1958) (and modifications based on work by Harter (1961) and Elfving (1947)).

Searching on Google also turns up a number of works on the variance / standard deviation of the $k^\textrm{th}$ order statistic for small $n$, all of which take a computation/tabulation route: Godwin (1949), Teichroew (1956), and Parrish (1992).

I have also attempted to derive something myself following the method sketched by @probabilityislogic in this question, along the lines of:

$$\textrm{Var}(X_{(k)}) = \int_{-\infty}^{\infty} \left(x-\mathbb{E}(X_{(k)})\right)^2 \frac{n!}{(k-1)!(n-k)!} f_X(x)[1-F_X(x)]^{n-k}[F_X(x)]^{k-1}\;\textrm{d}x,$$
with lower-case $f_X$ denoting the PDF of $X$ and upper-case $F_X$ its CDF.
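Although the goal is a formula that avoids numerical integration, the exact variance is still handy as a reference when checking any approximation for a particular $k$ and $n$. A minimal sketch using only the Python standard library, assuming a truncated range $[-12, 12]$ and composite Simpson's rule with 4000 intervals (both arbitrary choices); the helper names `order_stat_log_density` and `exact_variance` are hypothetical:

```python
import math


def order_stat_log_density(x: float, k: int, n: int) -> float:
    """Log-density of the k-th order statistic of n i.i.d. N(0, 1) draws at x."""
    # log of n! / ((k-1)! (n-k)!), via lgamma so large n does not overflow
    log_coef = math.lgamma(n + 1) - math.lgamma(k) - math.lgamma(n - k + 1)
    log_phi = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    # erfc keeps both tails accurate, so neither log() below sees an exact 0 on [-12, 12]
    cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
    sf = 0.5 * math.erfc(x / math.sqrt(2.0))
    return log_coef + log_phi + (k - 1) * math.log(cdf) + (n - k) * math.log(sf)


def exact_variance(k: int, n: int, lo: float = -12.0, hi: float = 12.0, m: int = 4000) -> float:
    """Var(X_(k)) by composite Simpson's rule on the truncated range [lo, hi]; m must be even."""
    h = (hi - lo) / m
    s0 = s1 = s2 = 0.0  # weighted sums for the integrals of density, x*density, x^2*density
    for i in range(m + 1):
        x = lo + i * h
        w = 1 if i in (0, m) else (4 if i % 2 else 2)  # Simpson weights 1, 4, 2, ..., 4, 1
        d = w * math.exp(order_stat_log_density(x, k, n))
        s0 += d
        s1 += x * d
        s2 += x * x * d
    # the common factor h / 3 cancels in these ratios; dividing by s0 also renormalises
    mean = s1 / s0
    return s2 / s0 - mean ** 2
```

For $n = 2$, `exact_variance(1, 2)` should be close to $1 - 1/\pi \approx 0.6817$, the known variance of the minimum of two standard normals, which makes a convenient sanity check.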

Noting $x = F_X^{-1}(F_X(x))$ and using the substitution $u = F_X(x)$ (writing $U$ for the corresponding random variable), we have:
$$\textrm{Var}(X_{(k)}) = \int_{0}^{1} \left(F_X^{-1}(u) - \mathbb{E}\left(F_X^{-1}(U)\right)\right)^2\,\mathcal{B}(u | k, n-k+1) \; \textrm{d}u,$$
where $\mathcal{B}$ denotes the PDF of a beta distribution (not the beta function). In other words, this is the variance of $F_X^{-1}(U)$ when $U$ follows that beta distribution:

$$\textrm{Var}(X_{(k)}) = \textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right).$$
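The beta representation can be sanity-checked by simulation: sorting $n$ standard normal draws and keeping the $k^{\textrm{th}}$ smallest should give, up to Monte Carlo noise, the same variance as transforming $\textrm{Beta}(k, n-k+1)$ draws with $\Phi^{-1}$. A rough sketch, assuming an arbitrary seed, $k = 10$, $n = 60$ and 20000 replications; the function names are hypothetical:

```python
import random
import statistics


def mc_var_sorted(k: int, n: int, reps: int, rng: random.Random) -> float:
    """Monte Carlo Var(X_(k)): sort n standard normal draws and keep the k-th smallest."""
    draws = [sorted(rng.gauss(0.0, 1.0) for _ in range(n))[k - 1] for _ in range(reps)]
    return statistics.variance(draws)


def mc_var_beta(k: int, n: int, reps: int, rng: random.Random) -> float:
    """Monte Carlo Var(Phi^{-1}(U)) with U ~ Beta(k, n - k + 1)."""
    inv_cdf = statistics.NormalDist().inv_cdf
    draws = [inv_cdf(rng.betavariate(k, n - k + 1)) for _ in range(reps)]
    return statistics.variance(draws)


rng = random.Random(2024)  # arbitrary seed for reproducibility
k, n, reps = 10, 60, 20000
v_sorted = mc_var_sorted(k, n, reps, rng)
v_beta = mc_var_beta(k, n, reps, rng)
print(v_sorted, v_beta)
```

The two estimates should agree to within a few percent; both are noisy, so this only confirms the identity rather than any particular digit.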

Approximating the right-hand side with the (variance part of the) first-order delta method, we get:
$$\textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right) \approx \textrm{Var}_{\mathcal{B}(u | k, n-k+1)}(U) \cdot \left[(F_X^{-1})'\left(\mathbb{E}_{\mathcal{B}(u | k, n-k+1)}(U)\right)\right]^2,$$
where the prime denotes the derivative.

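The first factor of the product is a standard result: $U \sim \textrm{Beta}(k, n-k+1)$ has mean $\frac{k}{n+1}$ and variance $\frac{k(n-k+1)}{(n+1)^2(n+2)}$. For the second factor, the inverse function rule gives $(F^{-1})'(u)=\frac{1}{f(F^{-1}(u))}$, so evaluating it at $u = \frac{k}{n+1}$ we have: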
$$\textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right) \approx \frac{k(n-k+1)}{(n+1)^2(n+2)} \frac{1}{\left(f\left(F^{-1}\left(\frac{k}{n+1}\right)\right)\right)^2}.$$
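This approximation only needs the PDF and the quantile function of $X$, so it can be written once for any continuous distribution. A minimal sketch (the function names are hypothetical): for the uniform distribution on $(0, 1)$ the quantile function is the identity, so the delta method is exact there and returns the beta variance itself, which isolates the first factor.

```python
import math
from statistics import NormalDist
from typing import Callable, Tuple


def beta_moments(k: int, n: int) -> Tuple[float, float]:
    """Mean and variance of U ~ Beta(k, n - k + 1), the law of F_X(X_(k))."""
    mean = k / (n + 1)
    var = k * (n - k + 1) / ((n + 1) ** 2 * (n + 2))
    return mean, var


def delta_variance(k: int, n: int,
                   pdf: Callable[[float], float],
                   quantile: Callable[[float], float]) -> float:
    """First-order delta-method approximation of Var(X_(k)) for a continuous X."""
    mean_u, var_u = beta_moments(k, n)
    # (F^{-1})'(u) = 1 / f(F^{-1}(u)), evaluated at the beta mean k / (n + 1)
    deriv = 1.0 / pdf(quantile(mean_u))
    return var_u * deriv ** 2


# Uniform(0, 1): f = 1 and F^{-1}(u) = u, so the approximation equals the beta variance
uniform_approx = delta_variance(7, 60, pdf=lambda x: 1.0, quantile=lambda u: u)

# Standard normal: f = phi and F^{-1} = Phi^{-1}
std_normal = NormalDist()
normal_approx = delta_variance(51, 101, pdf=std_normal.pdf, quantile=std_normal.inv_cdf)
print(uniform_approx, normal_approx)
```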

Substituting in the standard normal distribution ($\phi$ for the PDF, $\Phi$ for the CDF), we arrive at:
$$\textrm{Var}(X_{(k)}) \approx \frac{k(n-k+1)}{(n+1)^2(n+2)} \frac{1}{\left(\phi\left(\Phi^{-1}\left(\frac{k}{n+1}\right)\right)\right)^2}.$$
If $X_i \sim \mathcal{N}(\mu, \sigma^2)$ rather than standard normal, multiply the right-hand side by $\sigma^2$.
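For the normal case specifically, a sketch of the final formula with the optional $\sigma$ factor, using `statistics.NormalDist` for $\phi$ and $\Phi^{-1}$ (the function name is hypothetical):

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1): .pdf is phi, .inv_cdf is Phi^{-1}


def approx_var_normal_order_stat(k: int, n: int, sigma: float = 1.0) -> float:
    """Delta-method approximation of Var(X_(k)) for n i.i.d. N(mu, sigma^2) draws."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    p = k / (n + 1)
    beta_var = k * (n - k + 1) / ((n + 1) ** 2 * (n + 2))
    density = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p))  # phi(Phi^{-1}(k / (n + 1)))
    return sigma ** 2 * beta_var / density ** 2


median_101 = approx_var_normal_order_stat(51, 101)
print(median_101, math.pi / 206)
```

When $n$ is odd and $k = (n+1)/2$ (the sample median), $p = 1/2$, $\Phi^{-1}(1/2) = 0$ and $\phi(0)^2 = 1/(2\pi)$, so the formula reduces to $\frac{\pi}{2(n+2)}$, e.g. $\pi/206$ for $n = 101$.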

But surely someone must have derived that somewhere (again I am looking for some references), and there is a high chance that I have gone off track somewhere above.

Best Answer

I found that someone has indeed provided the approximation above: it appears on page 120 of Baglivo's book [1] and on page 12 of the accompanying course material [2].

I believe the result was first presented systematically by David and Johnson [3], who also included higher-order terms. In my opinion, Section 4.6 of David and Nagaraja's book [4] gives a more accessible explanation of David and Johnson's results.

The author of [1, 2] states that the variance of the $k^{\textrm{th}}$ order statistic can be estimated as:

$$\textrm{Var}(X_{(k)}) \approx \frac{p(1-p)}{(n+2)(f(\theta))^2},$$

where $f(\cdot)$ is the PDF of $X$, $p = \frac{k}{n+1}$, and $\theta$ is the $p^\textrm{th}$ quantile of the distribution.

Applying this to the normal case gives $f = \phi$ and $\theta = \Phi^{-1}(\frac{k}{n+1})$. Since $p(1-p) = \frac{k}{n+1}\cdot\frac{n-k+1}{n+1} = \frac{k(n-k+1)}{(n+1)^2}$, the referenced estimate is identical to the one derived in the question.
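The rearrangement can also be checked exactly with rational arithmetic: both forms share the factor $1/f(\theta)^2$ with $\theta = \Phi^{-1}(\frac{k}{n+1})$, so it suffices to compare the remaining prefactors. A small sketch using Python's `fractions` module, with $n = 60$ chosen arbitrarily (the function names are hypothetical):

```python
from fractions import Fraction


def prefactor_question(k: int, n: int) -> Fraction:
    """k (n - k + 1) / ((n + 1)^2 (n + 2)), as in the question's formula."""
    return Fraction(k * (n - k + 1), (n + 1) ** 2 * (n + 2))


def prefactor_referenced(k: int, n: int) -> Fraction:
    """p (1 - p) / (n + 2) with p = k / (n + 1), as in the referenced estimate."""
    p = Fraction(k, n + 1)
    return p * (1 - p) / (n + 2)


n = 60
all_equal = all(prefactor_question(k, n) == prefactor_referenced(k, n) for k in range(1, n + 1))
print(all_equal)
```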

[1] Jenny A. Baglivo (2005) Mathematica laboratories for mathematical statistics: Emphasizing simulation and computer intensive methods.
[2] Jenny A. Baglivo (2018) MATH4427 Notebook 4 - Fall Semester 2017/2018 - Boston College. URL: https://www2.bc.edu/jenny-baglivo/MT427/notebook04.pdf
[3] F. N. David and N. L. Johnson (1954) Statistical treatment of censored data: Part I. fundamental formulae. Biometrika, vol. 41, pp. 228–240.
[4] H. A. David and H. N. Nagaraja (2004) Order statistics. Encyclopedia of Statistical Sciences.