Normal Distribution – Understanding the Variance of Normal Order Statistics

Tags: approximation, distributions, normal distribution, order-statistics, variance

Suppose we have $X_1, \cdots, X_n \overset{\textrm{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ with $n > 50$, and let $X_{(1)}, \cdots, X_{(n)}$ be the associated order statistics.

Are there any references pointing to formulas that specify (or estimate) the variance of the $k^{\textrm{th}}$ order statistic, i.e. $\textrm{Var}(X_{(k)})$, for $1 \leq k \leq n$?

I am looking for formulas that can be evaluated explicitly, without resorting to numerical integration.

I have seen this question: Approximate order statistics for normal random variables, which attracted a number of answers on approximate formulas for the mean of the $k^{\textrm{th}}$ order statistic, e.g. Blom (1958) (and modifications based on work by Harter (1961) and Elfving (1947)).

Searching on Google also yields a number of works on the variance / standard deviation of the $k^\textrm{th}$ order statistic for small $n$, all of them opting for a computation/tabulation route. They are: Godwin (1949), Teichroew (1956), and Parrish (1992).

I have also attempted to derive something myself following the method sketched by @probabilityislogic in this question, along the lines of:

$$\textrm{Var}(X_{(k)}) = \int_{-\infty}^{\infty} \left(x-\mathbb{E}(X_{(k)})\right)^2 \frac{n!}{(k-1)!(n-k)!} f_X(x)[1-F_X(x)]^{n-k}[F_X(x)]^{k-1}\;\textrm{d}x,$$
with lower case $f$ denoting the PDF of a r.v. $X$, and upper case $F$ denoting the CDF.
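
For checking any approximation against this integral, a minimal numerical sketch may be useful. It assumes a standard normal parent and uses a plain trapezoidal rule on $[-9, 9]$; the function name `order_stat_variance` and the grid parameters are illustrative choices, not taken from any of the references, and the grid is only adequate for moderate $n$.

```python
import math

def order_stat_variance(k, n, lo=-9.0, hi=9.0, steps=20000):
    """Var(X_(k)) for n i.i.d. N(0,1) draws, by trapezoidal quadrature."""
    # log of n! / ((k-1)! (n-k)!), kept in log space to avoid overflow
    log_c = math.lgamma(n + 1) - math.lgamma(k) - math.lgamma(n - k + 1)
    h = (hi - lo) / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))  # F(x)
        sf = 0.5 * math.erfc(x / math.sqrt(2.0))    # 1 - F(x)
        log_pdf = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
        # density of the k-th order statistic at x
        dens = math.exp(log_c + log_pdf
                        + (k - 1) * math.log(cdf)
                        + (n - k) * math.log(sf))
        w = 0.5 if i in (0, steps) else 1.0         # trapezoid end weights
        m1 += w * dens * x
        m2 += w * dens * x * x
    m1 *= h
    m2 *= h
    return m2 - m1 ** 2

# For n = 2, k = 1 the exact value is 1 - 1/pi.
print(order_stat_variance(1, 2), 1.0 - 1.0 / math.pi)
```

This is of course the numerical-integration route the question wants to avoid; it is only meant as a reference value for judging closed-form approximations.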

Noting $x = F_X^{-1}(F_X(x))$ and using the substitution $u = F_X(x)$ (with corresponding r.v. in upper case $U$), we have:
$$\textrm{Var}(X_{(k)}) = \int_{0}^{1} \left(F_X^{-1}(u) - \mathbb{E}\left(F_X^{-1}(U)\right)\right)^2\,\mathcal{B}(u | k, n-k+1) \; \textrm{d}u,$$
where $\mathcal{B}$ denotes the PDF of a beta distribution (not the beta function). This can be rewritten as the variance based on the beta distribution:

$$\textrm{Var}(X_{(k)}) = \textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right).$$

Approximating the RHS using (the variance bit of the) delta method, we get:
$$\textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right) \approx \textrm{Var}_{\mathcal{B}(u | k, n-k+1)}(U) \cdot \left[(F_X^{-1})'\left(\mathbb{E}_{\mathcal{B}(u | k, n-k+1)}(U)\right)\right]^2,$$
where the prime denotes the derivative.

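The first term of the product is a standard result (the variance of a beta distribution), and $\mathbb{E}_{\mathcal{B}(u | k, n-k+1)}(U) = \frac{k}{n+1}$. For the second term, given $(F^{-1})'(\cdot)=\frac{1}{f(F^{-1}(\cdot))}$ (by the inverse function theorem), we then have: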
$$\textrm{Var}_{\mathcal{B}(u | k, n-k+1)}\left(F_X^{-1}(u)\right) \approx \frac{k(n-k+1)}{(n+1)^2(n+2)} \frac{1}{\left(f\left(F^{-1}\left(\frac{k}{n+1}\right)\right)\right)^2}.$$

Substituting in the standard normal distribution ($\phi$ for PDF, $\Phi$ for CDF), we arrive at:
$$\textrm{Var}(X_{(k)}) \approx \frac{k(n-k+1)}{(n+1)^2(n+2)} \frac{1}{\left(\phi\left(\Phi^{-1}\left(\frac{k}{n+1}\right)\right)\right)^2},$$
to which we can add a factor of $\sigma^2$ in front if the parent distribution is $\mathcal{N}(\mu, \sigma^2)$ rather than standard normal.
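
The approximation above needs only the normal PDF and quantile function, so it can be evaluated directly. The following is a small sketch using Python's standard-library `statistics.NormalDist`; the function name `approx_order_stat_variance` is a hypothetical helper introduced here for illustration.

```python
import math
from statistics import NormalDist

def approx_order_stat_variance(k, n, sigma=1.0):
    """First-order (delta-method) approximation to Var(X_(k)), normal parent."""
    std_normal = NormalDist()                  # N(0, 1)
    p = k / (n + 1)                            # mean of Beta(k, n-k+1)
    beta_var = k * (n - k + 1) / ((n + 1) ** 2 * (n + 2))  # variance of that beta
    density_at_quantile = std_normal.pdf(std_normal.inv_cdf(p))
    return sigma ** 2 * beta_var / density_at_quantile ** 2

# Sample median of n = 101 draws: p = 1/2, so the formula reduces to pi / (2 (n + 2)).
print(approx_order_stat_variance(51, 101), math.pi / 206)
```

Because `inv_cdf` requires $0 < p < 1$, the choice $p = k/(n+1)$ keeps the formula well defined for every $1 \leq k \leq n$.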

But surely someone must have derived that somewhere (again I am looking for some references), and there is a high chance that I have gone off track somewhere above.

Best Answer

I found that someone had indeed provided the approximation above: it appears on page 120 of Baglivo's book [1] and on page 12 of the accompanying course material [2].

I believe the result was first presented systematically by David and Johnson [3], who also included higher-order terms. In my opinion, Section 4.6 of David and Nagaraja's book [4] gives a more accessible explanation of David and Johnson's results.

The author of [1, 2] states that the variance of the $k^{\textrm{th}}$ order statistic can be approximated as:

$$\textrm{Var}(X_{(k)}) \approx \frac{p(1-p)}{(n+2)(f(\theta))^2},$$

where $f(\cdot)$ is the PDF of $X$, $p = \frac{k}{n+1}$, and $\theta$ is the $p^\textrm{th}$ quantile of the distribution.

Applying this to the normal case, we have $\theta = \Phi^{-1} (\frac{k}{n+1})$. Since $p(1-p) = \frac{k(n-k+1)}{(n+1)^2}$, the referenced variance estimate is identical to the one derived in the original question after rearranging terms.
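
As a rough sanity check, the approximation can be compared with a simulation. The sketch below is an illustration only: the helper names `baglivo_variance` and `simulated_variance`, the seed, and the choice $n = 51$, $k = 26$ (the sample median) are all assumptions made here, not part of the references.

```python
import random
from statistics import NormalDist, variance

def baglivo_variance(k, n):
    """p(1-p) / ((n+2) f(theta)^2) for a standard normal parent."""
    nd = NormalDist()
    p = k / (n + 1)
    theta = nd.inv_cdf(p)                      # p-th quantile
    return p * (1 - p) / ((n + 2) * nd.pdf(theta) ** 2)

def simulated_variance(k, n, reps=4000, seed=12345):
    """Sample variance of the k-th order statistic over `reps` simulated samples."""
    rng = random.Random(seed)
    draws = [sorted(rng.gauss(0.0, 1.0) for _ in range(n))[k - 1]
             for _ in range(reps)]
    return variance(draws)

n, k = 51, 26
approx = baglivo_variance(k, n)
sim = simulated_variance(k, n)
print(f"approximation: {approx:.5f}, simulation: {sim:.5f}")
```

With only a few thousand replications the simulated value carries a relative error of a few percent itself, so this can only confirm that the approximation is in the right range, not measure its accuracy precisely. As a first-order result, the formula should be expected to do worst for extreme $k$ (near $1$ or $n$).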

[1] Jenny A. Baglivo (2005) Mathematica laboratories for mathematical statistics: Emphasizing simulation and computer intensive methods.
[2] Jenny A. Baglivo (2018) MATH4427 Notebook 4 - Fall Semester 2017/2018 - Boston College. URL: https://www2.bc.edu/jenny-baglivo/MT427/notebook04.pdf
[3] F. N. David and N. L. Johnson (1954) Statistical treatment of censored data: Part I. fundamental formulae. Biometrika, vol. 41, pp. 228–240.
[4] H. A. David and H. N. Nagaraja (2004) Order statistics. Encyclopedia of Statistical Sciences.

Related Solutions

Approximate Order Statistics for Normal Random Variables

The classic reference is Royston (1982) [1], which gives algorithms that go beyond explicit formulas. It also quotes a well-known formula by Blom (1958): $E(r:n) \approx \mu + \Phi^{-1}(\frac{r-\alpha}{n-2\alpha+1})\sigma$ with $\alpha=0.375$. This formula gives a multiplier of $-2.73$ for $n=200, r=1$.
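
Blom's formula is a one-liner given a normal quantile function. Here is a minimal sketch using the standard-library `statistics.NormalDist`; the function name `blom_expected_order_stat` is made up for this illustration.

```python
from statistics import NormalDist

def blom_expected_order_stat(r, n, mu=0.0, sigma=1.0, alpha=0.375):
    """Blom's approximation to the expected r-th order statistic of n normal draws."""
    p = (r - alpha) / (n - 2 * alpha + 1)
    return mu + sigma * NormalDist().inv_cdf(p)

# Smallest of n = 200 standard normal draws; should be close to the -2.73 quoted above.
print(blom_expected_order_stat(1, 200))
```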

[1] J. P. Royston (1982) Algorithm AS 177: Expected normal order statistics (exact and approximate). Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 31, no. 2, pp. 161–165.

Expected Value of Minimum Order Statistic – Expected Value of Minimum Order Statistic from a Normal Sample

Your results do not appear correct. This is easy to see, without any calculation, because in your table, your $E[X_{(1)}]$ increases with sample size $n$; plainly, the expected value of the sample minimum must get smaller (i.e. become more negative) as the sample size $n$ gets larger.

The problem is conceptually quite easy.

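In brief: if $X \sim N(0,1)$ with pdf $f(x)$:

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

then the pdf of the 1st order statistic (in a sample of size $n$) is:

$$g(x) = n\, f(x)\,[1-F(x)]^{n-1}, \qquad F(x) = \tfrac{1}{2}\left[1 + \textrm{erf}\left(\tfrac{x}{\sqrt{2}}\right)\right],$$

obtained here using the OrderStat function in mathStatica, with domain of support $-\infty < x < \infty$.

Then, $E[X_{(1)}]$, for $n = 1,2,3$ can be easily obtained exactly as $0$, $-\frac{1}{\sqrt{\pi}}$ and $-\frac{3}{2\sqrt{\pi}}$ respectively.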

The exact $n = 3$ case is approximately $-0.846284$, which is obviously different to your workings of -1.06 (line 1 of your Table), so it seems clear something is wrong with your workings (or perhaps my understanding of what you are seeking).

For $n \ge 4$, obtaining closed-form solutions is more tricky, but even if symbolic integration proves difficult, we can always use numerical integration (to arbitrary precision if desired). This is really very easy ... here, for instance, is $E[X_{(1)}]$, for sample size $n = 1$ to 14, using Mathematica:

 sol = Table[NIntegrate[x g, {x, -Infinity, Infinity}], {n, 1, 14}]

{0., -0.56419, -0.846284, -1.02938, -1.16296, -1.26721, -1.35218, -1.4236, -1.48501, -1.53875, -1.58644, -1.62923, -1.66799, -1.70338}

All done. These values are obviously very different to those in your table (right hand column).

To consider the more general case of a $N(\mu, \sigma^2)$ parent, proceed exactly as above, starting with the general Normal pdf.
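
For readers without Mathematica, the same numerical integration can be sketched in plain Python. This is an illustrative stand-in for `NIntegrate`, not the method used above: it applies a simple trapezoidal rule on $[-10, 10]$, and the function name `expected_minimum` and its grid parameters are assumptions. The $N(\mu, \sigma^2)$ case is handled through the location-scale relation $X_{(1)} = \mu + \sigma Z_{(1)}$.

```python
import math

def expected_minimum(n, mu=0.0, sigma=1.0, lo=-10.0, hi=10.0, steps=20000):
    """E[X_(1)] for n i.i.d. N(mu, sigma^2) draws, by trapezoidal quadrature."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        sf = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
        g = n * pdf * sf ** (n - 1)                # density of the sample minimum
        w = 0.5 if i in (0, steps) else 1.0        # trapezoid end weights
        total += w * z * g
    return mu + sigma * total * h

# Should agree with the Mathematica values listed above for n = 1, ..., 14.
print([round(expected_minimum(n), 5) for n in range(1, 15)])
```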
