[Math] Order statistics and their CDF

order-statistics, statistics

Here's a problem that seems rather peculiar to me. This time I have no initial idea about how to solve it.

Let $X_1$, …, $X_n$ be independent, real valued random variables with density $f$ and CDF $F$. Let $F_i$ denote the CDF of $X_{(i)}$.

a) What is the distribution of $F(X_{(i)})$ and $F_i(X_{(i)})$?

b) What is the variance of $F(X_{(i)})$?

[edit: I have realised by now that my approach was flawed and furthermore that my question needs clarification. So here is my second try.]

As I understand the question, we have real valued, i.i.d. random variables $X_1$, …, $X_n$ with density $f$ and CDF $F$.

$X_{(1)}< … < X_{(n)}$ are the corresponding order statistics with CDF $F_i:=F_{X_{(i)}}$.

I know the general formula for the CDF of order statistics. It is given by $$F_i(t)=\sum_{k=i}^n \binom{n}{k}F(t)^k(1-F(t))^{n-k}.$$
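As a sanity check on this formula, here is a minimal Python sketch. The function name `order_stat_cdf` is made up for illustration, and its argument `p` stands for the value $F(t)$, so no particular $F$ is assumed.

```python
from math import comb

def order_stat_cdf(p, i, n):
    """P(X_(i) <= t) given p = F(t): at least i of the n draws are <= t."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

# Two extreme cases where the sum should collapse to a familiar expression,
# here with p = F(t) = 0.3 and n = 5:
p, n = 0.3, 5
print(order_stat_cdf(p, n, n), p**n)            # maximum: all n draws <= t
print(order_stat_cdf(p, 1, n), 1 - (1 - p)**n)  # minimum: at least one draw <= t
```

In each printed pair the two numbers should agree up to floating-point rounding.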

Now, the CDF of a random variable is a measurable function. Thus $F(X_{(i)})$ and $F_i(X_{(i)})$ are real valued random variables again. And a) asks for their distribution.

By definition, we have $$F(t)=\mathbb{P}(X_i\leq t).$$ Hence $$F(X_{(i)})=\mathbb{P}(X_i\leq X_{(i)}).$$ This is the probability of the event that any $X_i$ is less than or equal to $X_{(i)}$. By definition of order statistics, we have $$F(X_{(n)})=\mathbb{P}(X_i\leq X_{(n)})=1,$$ as $X_{(n)}$ is the maximum of the $X_i$. But how do I derive the distribution of $F(X_{(i)})$ for $i \in \{1, …, n-1\}$?

I think, if they were uniformly distributed, the answer would simply be $$F(X_{(i)})=i/n.$$ But their distribution is unknown, so I'm stuck and have no idea how to proceed from here.

Best Answer

First recall the most important result in order statistics:

For every random variable $Z$ with continuous CDF $H$, $H(Z)$ is uniform on $(0,1)$.

Thus, if $(X_i)_{1\leqslant i\leqslant n}$ is an i.i.d. sample with continuous CDF $F$ and $(U_i)_{1\leqslant i\leqslant n}$ is an i.i.d. sample uniform on $(0,1)$, then each $F_i$ is continuous as well (it is a polynomial in $F$), hence, for every $1\leqslant i\leqslant n$ and every $x$ in $(0,1)$, $$ P(F_i(X_{(i)})\leqslant x)=P(U_1\leqslant x)=x. $$ Furthermore, $(F(X_i))_{1\leqslant i\leqslant n}$ is distributed as $(U_i)_{1\leqslant i\leqslant n}$ and $F$ is nondecreasing, hence $F(X_{(i)})$ is the $i$th order statistic of $(F(X_i))_{1\leqslant i\leqslant n}$ and is distributed as $U_{(i)}$. Thus, the CDF $G_i$ of $F(X_{(i)})$ is such that, for every $x$ in $(0,1)$, $$ G_i(x)=P(F(X_{(i)})\leqslant x)=P(U_{(i)}\leqslant x)=\sum_{k=i}^n{n\choose k}x^k(1-x)^{n-k}. $$ Recall a most useful result to compute expectations:

For every $(0,1)$-valued random variable $Z$ with CDF $H$, $E[Z]=\displaystyle\int_0^1(1-H)=1-\int_0^1H$.
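In case this identity is unfamiliar, here is a short justification, offered as a sketch: it relies on exchanging expectation and integral, which is legitimate for a nonnegative integrand (Tonelli's theorem). Since $Z$ takes values in $(0,1)$, one has $Z=\int_0^1\mathbf 1_{\{x<Z\}}\,\mathrm dx$, so that $$ E[Z]=E\left[\int_0^1\mathbf 1_{\{x<Z\}}\,\mathrm dx\right]=\int_0^1P(Z>x)\,\mathrm dx=\int_0^1(1-H(x))\,\mathrm dx. $$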

Hence, $$ E[U_{(i)}]=1-\sum_{k=i}^n{n\choose k}\int_0^1t^k(1-t)^{n-k}\mathrm dt. $$ Recall now that:

For all integers $0\leqslant k\leqslant n$, $\displaystyle\int_0^1t^k(1-t)^{n-k}\mathrm dt=\frac1{n+1}{n\choose k}^{-1}$.
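This Beta-integral identity can be checked in exact rational arithmetic. The sketch below is only a verification for small $n$, not a proof; the helper name `beta_integral` is made up for illustration. It expands $(1-t)^{n-k}$ with the binomial theorem and integrates term by term.

```python
from fractions import Fraction
from math import comb

def beta_integral(k, n):
    """Exact integral of t^k (1-t)^(n-k) over [0, 1].

    (1-t)^m = sum_j C(m, j) (-1)^j t^j, and the integral of t^(k+j) is 1/(k+j+1).
    """
    m = n - k
    return sum(Fraction((-1)**j * comb(m, j), k + j + 1) for j in range(m + 1))

# Compare with the closed form 1 / ((n+1) * C(n, k)) for all 0 <= k <= n <= 8.
for n in range(0, 9):
    for k in range(0, n + 1):
        assert beta_integral(k, n) == Fraction(1, (n + 1) * comb(n, k))
print("identity holds for all 0 <= k <= n <= 8")
```

Using `Fraction` rather than floats means the comparison is an exact equality, with no rounding tolerance involved.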

Hence, $$ E[U_{(i)}]=1-\sum_{k=i}^n\frac1{n+1}=\frac{i}{n+1}. $$ Recall finally the analogue for second moments of our result for expectations:

For every $(0,1)$-valued random variable $Z$ with CDF $H$, $\displaystyle E[Z^2]=\int_0^12x(1-H(x))\mathrm dx$, that is, $\displaystyle E[Z^2]=1-2\int_0^1xH(x)\mathrm dx.$

Hence, $$ E[U_{(i)}^2]=1-2\sum_{k=i}^n{n\choose k}\int_0^1t^{k+1}(1-t)^{n-k}\mathrm dt=1-2\sum_{k=i}^n{n\choose k}\frac1{n+2}{n+1\choose k+1}^{-1}, $$ that is, $$ E[U_{(i)}^2]=1-\frac2{(n+2)(n+1)}\sum_{k=i}^n(k+1)=\frac{i(i+1)}{(n+1)(n+2)}, $$ from which you can probably guess an expression of $E[U_{(i)}^k]$ valid for every nonnegative integer $k$, and from which, independently, one deduces that $$ \mathrm{var}(F(X_{(i)}))=\mathrm{var}(U_{(i)})=\frac{i(i+1)}{(n+1)(n+2)}-\left(\frac{i}{n+1}\right)^2=\frac{i(n+1-i)}{(n+1)^2(n+2)}. $$
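These results can be checked numerically. The following is a minimal Monte Carlo sketch under some illustrative assumptions: the $X_k$ are taken to be Exp(1), so $F(t)=1-e^{-t}$, and the function names, sample sizes and seed are arbitrary choices. It compares the empirical CDF of $F(X_{(i)})$ with the binomial sum, the empirical CDF of $F_i(X_{(i)})$ with the uniform CDF, and the empirical mean and variance with the closed forms above.

```python
import math
import random

def order_stat_cdf(p, i, n):
    """G_i(p): probability that at least i of n uniforms are <= p."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

def simulate(n=5, i=2, x=0.3, trials=20000, seed=0):
    """Monte Carlo estimates for F(X_(i)) and F_i(X_(i)) with Exp(1) samples."""
    rng = random.Random(seed)
    F = lambda t: 1.0 - math.exp(-t)   # CDF of Exp(1), assumed for illustration
    us = []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        us.append(F(sample[i - 1]))    # F(X_(i)), the i-th smallest value
    mean = sum(us) / trials
    var = sum((u - mean) ** 2 for u in us) / trials
    # F_i(t) = G_i(F(t)), hence F_i(X_(i)) = G_i(F(X_(i))).
    cdf_F = sum(u <= x for u in us) / trials
    cdf_Fi = sum(order_stat_cdf(u, i, n) <= x for u in us) / trials
    return {"mean": mean, "var": var, "cdf_F": cdf_F, "cdf_Fi": cdf_Fi}

n, i, x = 5, 2, 0.3
est = simulate(n, i, x)
print("P(F(X_(i)) <= x):  ", est["cdf_F"], "vs", order_stat_cdf(x, i, n))
print("P(F_i(X_(i)) <= x):", est["cdf_Fi"], "vs", x)
print("mean:", est["mean"], "vs", i / (n + 1))
print("var: ", est["var"], "vs", i * (n + 1 - i) / ((n + 1) ** 2 * (n + 2)))
```

Each printed pair should agree up to Monte Carlo error, and replacing Exp(1) by any other continuous distribution (with its own `F`) should leave the right-hand values unchanged.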
