My question is similar to this one:
Approximate order statistics for normal random variables
I am looking to find a formula for the variability of an arbitrary percentile of a normal distribution. The question cited does not quite solve it:
– that is concerned with min and max only
– I don't understand what n is
In particular, I am looking for the variability of the 99.5th percentile of a sample from a normal distribution.
Please, could you point me to a formula (with a reference) or a refereed paper?
Edits per Glen:
a) How I am collecting the samples seems irrelevant: the standard deviation of the sample mean from a normal distribution is $\sigma / \sqrt{n}$ … irrespective of anything. Shouldn't the variance of a percentile also be independent of the method of collection?
b) A direct derivation might be acceptable – given how quickly/frequently URLs change, I don't know whether it is sensible to cite stats.stackexchange as a source.
Best Answer
I will try to answer without many formulas, because when we talk about percentiles (you can google ORDER STATISTICS) the pdfs become pretty messy. I just want to give you the main concepts.
We are talking about estimators of a quantile, in your case the 99.5th percentile.
Estimators are Random Variables and hence have moments.
You want to find the variance of the sample 99.5th percentile of a sample from a Normal RV.
The most rigorous approach, I think, is to evaluate an integral.
Suppose we call $T$ the estimator of the 99.5th percentile from a Normal RV, with pdf $f_T(t)$. Its variance is
$\sigma^2(T) = {\displaystyle \int_{-\infty}^{+\infty} } \big(t-E(T)\big)^2 f_T(t)\,dt$
where $E(T) = \mu(T) = {\displaystyle \int_{-\infty}^{+\infty} } t\,f_T(t)\,dt$
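As I said before, $f_T(t)$ is pretty messy and you won't be able to find a closed-form expression for the integral. Consequently you will have to evaluate the integral numerically. Just to give you an idea, here is the generic pdf of the $k$th ORDER STATISTIC $X_{(k)}$ (the $k$th smallest value) of a sample of size $n$, where $F_X$ and $f_X$ are the cdf and pdf of the parent (here Normal) distribution. The sample percentile $T$ is such an order statistic, or an interpolation between two adjacent ones:
$f_{X_{(k)}}(t) =\frac{n!}{(k-1)!(n-k)!}[F_X(t)]^{k-1}[1-F_X(t)]^{n-k} f_X(t)$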
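To make the "evaluate the integral numerically" step concrete, here is a minimal sketch using only the Python standard library. It assumes the percentile estimator is taken to be a single order statistic, for example $k = \lceil 0.995\,n \rceil$ (one common convention, not the only one), and that the parent distribution is Normal with known mean and standard deviation. The function names `order_stat_pdf` and `order_stat_moments`, the integration range of ±10 standard deviations, and the plain trapezoidal rule are all illustrative choices, not part of the original answer.

```python
import math


def order_stat_pdf(t, n, k, mu=0.0, sigma=1.0):
    """pdf of the k-th order statistic of n iid Normal(mu, sigma) draws."""
    z = (t - mu) / sigma
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))   # F(t) of the parent
    surv = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - F(t), accurate in the tail
    log_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    # log of n! / ((k-1)! (n-k)!), computed via lgamma to avoid overflow
    log_coef = math.lgamma(n + 1) - math.lgamma(k) - math.lgamma(n - k + 1)
    return math.exp(
        log_coef + (k - 1) * math.log(cdf) + (n - k) * math.log(surv) + log_pdf
    )


def order_stat_moments(n, k, mu=0.0, sigma=1.0, width=10.0, steps=20000):
    """Trapezoidal estimates of E(T), Var(T) and the total probability mass.

    The integration range mu +/- width*sigma is an assumption; the returned
    mass should be close to 1 if the range and step size are adequate.
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    ts = [lo + i * h for i in range(steps + 1)]
    ws = [order_stat_pdf(t, n, k, mu, sigma) * h for t in ts]
    ws[0] *= 0.5   # trapezoidal end-point weights
    ws[-1] *= 0.5
    mass = sum(ws)
    mean = sum(t * w for t, w in zip(ts, ws))
    var = sum((t - mean) ** 2 * w for t, w in zip(ts, ws))
    return mean, var, mass


if __name__ == "__main__":
    n = 1000
    k = math.ceil(0.995 * n)  # assumed convention for the 99.5th percentile
    mean, var, mass = order_stat_moments(n, k)
    print(f"E(T) = {mean:.4f}, Var(T) = {var:.6f}, mass check = {mass:.6f}")
```

Working in logs keeps the factorial coefficient from overflowing for large $n$, and the returned mass gives a quick sanity check on the integration grid.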
So what should you do? In your question you talk about approximation.
The easiest way to go about this is the bootstrap (here a parametric one, since we resample from a fitted Normal). The steps are simple if we want an unsophisticated way to get some results:
1. From your ORIGINAL SAMPLE of size n calculate $\hat{\mu}$, the sample mean, and $\hat{\sigma}^2$, the sample variance.
2. Calculate the 99.5th percentile from the original sample.
3. Draw, as many times as you want, a new sample of size n from a Normal distribution with mean $\hat{\mu}$ and variance $\hat{\sigma}^2$.
4. For each resample calculate the sample 99.5th percentile and store it in a vector.
5. Calculate the sample variance of this vector.
This is your approximate variance for the 99.5th percentile.
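The steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the sample percentile is defined as the order statistic with $k = \lceil p\,n \rceil$ (other definitions interpolate between neighbours), and the names `sample_percentile` and `parametric_bootstrap_variance`, the default of 2000 resamples, and the fixed seed are hypothetical choices, not something the answer prescribes.

```python
import math
import random
import statistics


def sample_percentile(xs, p):
    """Order-statistic percentile: the k-th smallest value, k = ceil(p * n)."""
    s = sorted(xs)
    k = min(len(s), max(1, math.ceil(p * len(s))))
    return s[k - 1]


def parametric_bootstrap_variance(sample, p=0.995, n_boot=2000, seed=0):
    """Approximate the variance of the sample p-th percentile.

    Resamples come from a Normal fitted to `sample`, so the result is only
    as good as the normality assumption.
    """
    rng = random.Random(seed)
    n = len(sample)
    mu_hat = statistics.fmean(sample)       # step 1: sample mean
    sigma_hat = statistics.stdev(sample)    # step 1: sqrt of sample variance
    estimate = sample_percentile(sample, p)  # step 2: percentile of the data
    replicates = []
    for _ in range(n_boot):                 # step 3: resample from the fit
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        replicates.append(sample_percentile(resample, p))  # step 4
    return estimate, statistics.variance(replicates)       # step 5


if __name__ == "__main__":
    demo_rng = random.Random(42)
    data = [demo_rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in data
    est, var = parametric_bootstrap_variance(data)
    print(f"99.5th percentile = {est:.4f}, bootstrap variance = {var:.6f}")
```

The result depends on the number of resamples and on the percentile definition used, so it is worth rerunning with a larger `n_boot` to see whether the variance estimate has stabilised.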