# Order Statistics – How to Show $P(X_k \le x) = P\{N(x) \ge k\}$


Suppose $$X_1 \le \dots \le X_n$$ are the order statistics of an iid sample from a continuous distribution $$F(x)$$. Show that $$P(X_k \le x) = P\{N(x) \ge k\}$$, where $$N(x)$$, the number of sample values less than $$x$$, is binomial with parameters $$n$$ and $$p = F(x)$$.

I believe the statement amounts to this:
$$P(X_k \le x) = \sum_{j=k}^n\binom{n}{j}F(x)^j(1-F(x))^{n-j}$$

However, I do not see why $$P(X_k \le x)$$ should equal this binomial tail, that is, why $$P(X_k \le x) = P\{N(x) \ge k\}$$.

Use the result above to show that the density of $$X_k$$ is
$$p(x) = k\binom{n}{k}F(x)^{k-1}(1-F(x))^{n-k}f(x)$$ where $$f(x)$$ is the density of $$F(x)$$. Verify the density using the multinomial argument:

$$n!\epsilon^n\prod_ip_{\theta}(x_i)$$

which is a general heuristic to deal with order statistics from an iid sample from a continuous density $$p_{\theta}(x)$$.

Show that $$P(X_{(k)} \le x) = P(N(x) \ge k)$$

You can prove this by showing that the events on the left side and the right side, $$X_{(k)} \le x \qquad \text{and} \qquad N(x) \ge k,$$ are the same event.

There is, however, a tricky detail: they are not exactly the same event.

The left side has a non-strict inequality (less than or equal), whereas the right side rests on a strict inequality, because $$N(x)$$ is defined as 'the number of sample values less than $$x$$'.

### Discrete distribution

For discrete distributions this difference between the events makes the probabilities unequal. Take for instance a sample of size one drawn from a Bernoulli distribution with success probability $$p$$. Since $$N(1)$$ counts only values strictly below 1, we get $$P(X_{(1)} \leq 1) = 1 \qquad \text{and} \qquad P(N(1) \geq 1) = P(X_1 = 0) = 1-p$$
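The counterexample can be checked by exact enumeration. The sketch below is only an illustration: the helper name `exact_probs` and the value `p = 3/10` are assumptions made here, not part of the original problem.

```python
from fractions import Fraction
from itertools import product


def exact_probs(n, k, x, p):
    """Enumerate all Bernoulli(p) samples of size n and return
    (P(X_(k) <= x), P(N(x) >= k)), where N(x) = #{i : X_i < x}."""
    lhs = Fraction(0)
    rhs = Fraction(0)
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        weight = p**ones * (1 - p) ** (n - ones)  # probability of this sample
        kth = sorted(sample)[k - 1]               # k-th order statistic
        n_strict = sum(v < x for v in sample)     # strict count N(x)
        if kth <= x:
            lhs += weight
        if n_strict >= k:
            rhs += weight
    return lhs, rhs


p = Fraction(3, 10)  # hypothetical success probability, chosen for illustration
lhs, rhs = exact_probs(n=1, k=1, x=1, p=p)
print(lhs, rhs)  # prints: 1 7/10
```

The two numbers differ by exactly $$P(X_1 = 1) = p$$, the probability mass sitting on the boundary point.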

### Continuous distribution

To make the proof work for continuous distributions, we can start from one of the following alternative equations instead. They are obtained by replacing either the $$\leq$$ sign on the left by a $$<$$ sign, or the $$<$$ sign on the right (in the definition of $$N(x)$$) by a $$\leq$$ sign, so that the events are the same.

$$\begin{array}{rcl} P(X_{(k)} < x) &=& P(N(x) \ge k) \\ P(X_{(k)} \le x) &=& P(N^\prime(x) \ge k) \end{array}$$

with $$N^\prime(x)$$ meaning the number of sample values less than or equal to $$x$$.

For these two expressions, you can easily see that the events are the same because they imply each other (and also the events will be the same for the discrete case). Take for example the second statement:

• If the number of values smaller than or equal to $$x$$ is at least $$k$$, then at least the first $$k$$ order statistics must be smaller than or equal to $$x$$, and therefore the $$k$$-th order statistic must be smaller than or equal to $$x$$.
• If the $$k$$-th order statistic is smaller than or equal to $$x$$, then so are the $$k-1$$ order statistics of lower order, and therefore there are at least $$k$$ values smaller than or equal to $$x$$.
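The two implications can also be checked mechanically on random samples. This is a minimal sketch; the function name `events_agree` and the ranges used for the random samples are choices made here for illustration.

```python
import random


def events_agree(sample, k, x, strict=False):
    """Check that the order-statistic event and the counting event coincide.

    strict=False compares {X_(k) <= x} with {N'(x) >= k};
    strict=True  compares {X_(k) <  x} with {N(x)  >= k}.
    """
    kth = sorted(sample)[k - 1]  # k-th order statistic (1-based k)
    if strict:
        return (kth < x) == (sum(v < x for v in sample) >= k)
    return (kth <= x) == (sum(v <= x for v in sample) >= k)


rng = random.Random(0)
all_agree = True
for _ in range(10_000):
    n = rng.randint(1, 8)
    k = rng.randint(1, n)
    # Small integer values force ties, so the discrete case is exercised too.
    sample = [rng.randint(0, 4) for _ in range(n)]
    x = rng.randint(-1, 5)
    all_agree = all_agree and events_agree(sample, k, x)
    all_agree = all_agree and events_agree(sample, k, x, strict=True)
print(all_agree)  # prints: True
```

Because the events are identical, the check passes for every sample, ties included. No distributional assumption is involved at this stage.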

The trick to complete the proof for the continuous distribution is that the probabilities of the different events (with signs $$\leq$$ or $$<$$) are the same. We have

$$\begin{array}{rcl} P(X_{(k)} \leq x) &=& P(X_{(k)} < x) \\ P(N(x) \ge k) &=& P(N^\prime(x) \ge k) \end{array}$$

The reason is that the difference between the two events has probability zero. For instance $$\begin{array}{rcl} P(X_{(k)} \leq x) - P(X_{(k)} < x) &=& P(\lbrace X_{(k)} \leq x \rbrace \setminus \lbrace X_{(k)} < x \rbrace)\\ &=& P(X_{(k)} = x) \\ &=& 0 \end{array}$$ where the last step holds because, under a continuous $$F$$, each sample value equals $$x$$ with probability zero.
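As a numerical sanity check, one can compare Monte Carlo estimates of $$P(X_{(k)} \leq x)$$ and $$P(X_{(k)} < x)$$ with the binomial tail $$P(N(x) \ge k)$$. The sketch below assumes a Uniform(0, 1) sample, so that $$F(x) = x$$. The values of `n`, `k`, `x`, the trial count, and the helper names are illustrative choices, not taken from the question.

```python
import math
import random


def binomial_tail(n, k, p):
    """P(N >= k) for N ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def simulate(n, k, x, trials, seed=0):
    """Estimate P(X_(k) <= x) and P(X_(k) < x) for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    leq = lt = 0
    for _ in range(trials):
        kth = sorted(rng.random() for _ in range(n))[k - 1]  # k-th order statistic
        leq += kth <= x
        lt += kth < x
    return leq / trials, lt / trials


n, k, x = 5, 2, 0.3  # illustrative values; F(x) = x for Uniform(0, 1)
est_leq, est_lt = simulate(n, k, x, trials=100_000)
exact = binomial_tail(n, k, x)  # about 0.4718
print(est_leq, est_lt, exact)
```

The two estimates should coincide up to ties, which have probability zero here, and both should sit within Monte Carlo error of the binomial tail.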

### Summarizing

$$\begin{array}{ccc} \rlap{\overbrace{\phantom{P(X_{(k)} \leq x ) = P(X_{(k)} < x)}}^{\substack{\text{different events} \\ \text{but same probability} \\ \text{for continuous distributions}}}} P(X_{(k)} \leq x ) &=& \underbrace{P(X_{(k)} < x) = P(N(x) \ge k)}_{\text{same events}}\\ && \overbrace{P(X_{(k)} \le x) = P(N^\prime(x) \ge k)}_{} &=& P(N(x) \ge k \llap{\underbrace{\phantom{P(N^\prime(x) \ge k) = P(N(x) \ge k}}_{\substack{\text{different events} \\ \text{but same probability} \\ \text{for continuous distributions}}}}) \end{array}$$

So we have $$P(X_{(k)} \le x) = P(N(x) \ge k)$$ not entirely because the events are the same, but because they are events with the same probability, which is because the difference between the events has probability zero for continuous distributions.
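With this identity, the density asked for in the question can be obtained by differentiating the binomial tail. The following is a sketch of that computation, assuming $$F$$ has density $$f$$. The shorthand $$a_j(x) = \binom{n-1}{j}F(x)^j(1-F(x))^{n-1-j}$$ for $$j < n$$, with $$a_n(x) = 0$$, is introduced here only to keep the lines short.

$$\begin{array}{rcl} P(X_{(k)} \le x) &=& \sum_{j=k}^n \binom{n}{j} F(x)^j (1-F(x))^{n-j} \\ \frac{d}{dx} P(X_{(k)} \le x) &=& f(x) \sum_{j=k}^n \binom{n}{j} \left[ j F(x)^{j-1} (1-F(x))^{n-j} - (n-j) F(x)^j (1-F(x))^{n-j-1} \right] \\ &=& n f(x) \sum_{j=k}^n \left[ a_{j-1}(x) - a_j(x) \right] \\ &=& n f(x) \, a_{k-1}(x) \\ &=& n \binom{n-1}{k-1} F(x)^{k-1} (1-F(x))^{n-k} f(x) \end{array}$$

The third line uses $$j\binom{n}{j} = n\binom{n-1}{j-1}$$ and $$(n-j)\binom{n}{j} = n\binom{n-1}{j}$$, after which the sum telescopes. Since $$n\binom{n-1}{k-1} = k\binom{n}{k} = \frac{n!}{(k-1)!\,(n-k)!}$$, the leading constant can be written in any of these equivalent forms.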