[Math] What does it mean to say that one random variable is “greater” than another?

order-statistics, random variables

This might be a very basic question, but I've just started with order statistics so it is tripping me up a bit.

What does it mean to say that a random variable, say $Q_1$, is greater than another, say $Q_2$?

The way I've heard order statistics explained is that we have some observations, e.g., $x_1 = 1$, $x_2 = 3$, $x_3 = 2$. We can then say that $x_1 < x_3 < x_2$, i.e. $x_1 = x_{(1)}, x_2 = x_{(3)}, x_3 = x_{(2)}$. So far, this makes sense to me, because we are ordering numbers, not random variables.

Now, however, people say (denoting random variables with upper case) that $X_{(1)} < X_{(2)} < X_{(3)}$ and that we can find the probability distribution of each of these. I do not understand this: if we already have the samples and their values, why is there a probability distribution, since we know for certain what they are? If random variables are meant to represent an unknown quantity, how can they be ordered? And if they represent a known quantity, why is there any uncertainty or probability involved?

Best Answer

"If random variables are meant to represent an unknown quantity, how can they be ordered".

One straightforward way to order random variables is to treat them as functions on your probability space $(\Omega,\Sigma,P)$: $Q_1:\Omega\rightarrow\mathbb{R}$ and $Q_2:\Omega\rightarrow\mathbb{R}$. Then you can define $$ Q_1 > Q_2 \quad \text{iff} \quad \forall \omega\in\Omega: Q_1(\omega)>Q_2(\omega).$$ Notice that this definition makes perfect sense even if you do not know anything about the specific values of $Q_1$ and $Q_2$. If you do have a probability measure defined, it makes sense to weaken this requirement and ignore sets of probability zero, i.e. to require only $P(Q_1 > Q_2) = 1$. This is what Lord Shark the Unknown suggested above.
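To make the pointwise definition concrete, here is a minimal Python sketch on a small finite probability space. Everything in it is an illustrative assumption rather than part of the question: the loaded die that never shows a 6, the random variables `q1` to `q4`, and the helper names `surely_greater` and `almost_surely_greater`.

```python
from fractions import Fraction

# Hypothetical probability space: a loaded die that never shows a 6.
OMEGA = [1, 2, 3, 4, 5, 6]
P = {1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(1, 5),
     4: Fraction(1, 5), 5: Fraction(1, 5), 6: Fraction(0)}

# Random variables are plain functions on OMEGA (illustrative choices).
def q1(omega):
    return omega + 1

def q2(omega):
    return omega

def q3(omega):
    return 7 - omega

def q4(omega):
    return 6

def surely_greater(a, b):
    """True iff a(omega) > b(omega) for every outcome omega."""
    return all(a(omega) > b(omega) for omega in OMEGA)

def almost_surely_greater(a, b):
    """True iff the outcomes where a > b fails have total probability zero."""
    failing = [omega for omega in OMEGA if not a(omega) > b(omega)]
    return sum((P[omega] for omega in failing), Fraction(0)) == 0

print(surely_greater(q1, q2))         # True: omega + 1 > omega everywhere
print(surely_greater(q3, q2))         # False: fails for omega = 4, 5, 6
print(surely_greater(q2, q3))         # False: fails for omega = 1, 2, 3
print(surely_greater(q4, q2))         # False: fails at omega = 6
print(almost_surely_greater(q4, q2))  # True: omega = 6 has probability 0
```

Note that `q2` and `q3` are not comparable in either direction, so this order is only a partial order. The last two lines show how ignoring a set of probability zero weakens the requirement.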

This way of ordering also works for order statistics. In this case your random variable $X$ is not real-valued but an $n$-vector, $X:\Omega\rightarrow\mathbb{R}^n$, where $n$ is the size of your sample. In the case $n=2$, for example, the brief statement $X_{(1)}\leq X_{(2)}$ means, in natural language:

When I generate pairs of random numbers, I do not know what pairs come up, but I know that after sorting the pair in ascending order, the second value will be at least as large as the first value.

I hope you agree that this statement is simple and true and it should not confuse you.
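The same statement can be checked by simulation, which also shows why each order statistic has a distribution before the sample is observed. The following is only a sketch under assumptions of my own: samples of size 3 drawn uniformly from $[0,1)$, an arbitrary fixed seed, and the hypothetical helper `draw_order_statistics`.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

def draw_order_statistics(n):
    """Draw n values uniformly from [0, 1) and return them sorted ascending."""
    return sorted(rng.random() for _ in range(n))

# Repeat the experiment "draw a sample of size 3 and sort it" many times.
samples = [draw_order_statistics(3) for _ in range(10_000)]

# The ordering X_(1) <= X_(2) <= X_(3) holds in every single repetition ...
always_ordered = all(s[0] <= s[1] <= s[2] for s in samples)

# ... while the value of X_(1) still changes from repetition to repetition.
smallest = [s[0] for s in samples]
mean_smallest = sum(smallest) / len(smallest)

print(always_ordered)
print(min(smallest), max(smallest))
print(mean_smallest)
```

Once a particular sample has been observed, its sorted values are fixed numbers. The distribution of $X_{(1)}$ describes how the smallest value behaves across repetitions of the experiment. For three independent uniform draws on $[0,1)$ its theoretical mean is $1/4$, and the simulated average should land close to that.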

This is just the most basic notion of order for random variables. Many more, and quite different, definitions of stochastic order are possible. Have a look at the Wikipedia article on stochastic ordering.