Is the notion of a random variable always necessary?

measure-theory, probability, probability-theory

I'm quite confused by the notion of a random variable in the proper measure-theoretic framework. Let's first state the notation and definitions:

Let $(\Omega, \Sigma, \operatorname{P})$ be a probability space. Then, a real-valued random variable is a measurable function $X \colon \Omega \to \mathbb{R}$ and its probability distribution is the pushforward measure $\operatorname{P}_{X} := \operatorname{P} \circ X^{-1}$. If $\operatorname{P}_{X}$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ we also know that there is a probability density function $f\colon \mathbb{R} \to \mathbb{R}$ such that $\operatorname{P}_{X}(B) = \int_B f \, \mathrm{d} \lambda$ for $B \in \mathcal{B}(\mathbb{R})$ (by the Radon–Nikodym theorem).

Now let's look at a simple example that is often used to illustrate the notion of a random variable:

  1. Random variable that represents the sum of two dice. In this case $\Omega = \{1, 2, 3, 4, 5, 6\}^2$, $\Sigma = \mathcal{P}(\Omega)$, and $\operatorname{P}(A) = \frac{\#A}{36}$ for $A \in \Sigma$, $X \colon (\omega_1, \omega_2) \mapsto \omega_1 + \omega_2$, and e.g. $\operatorname{P}_X(\{3\}) = \operatorname{P}(\{(1, 2), (2, 1)\}) = \frac{1}{18}$ (computed in the short sketch below).
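
For concreteness, the pushforward $\operatorname{P}_{X} = \operatorname{P} \circ X^{-1}$ in this dice example can be tabulated by brute force. Here is a minimal sketch (Python and the identifier names are illustrative choices, not part of the question):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of ordered pairs of die faces; every outcome has probability 1/36.
Omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in Omega}

def X(w):
    # The random variable: sum of the two dice.
    return w[0] + w[1]

# Pushforward measure P_X = P o X^{-1}, tabulated value by value.
P_X = defaultdict(Fraction)
for w, p in P.items():
    P_X[X(w)] += p

print(P_X[3])   # 1/18, i.e. P({(1, 2), (2, 1)})
```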

This is all crystal clear but the two examples below break my little mind:

  1. Normal random variable. What is $(\Omega, \Sigma, \operatorname{P})$ now? Others have given the answer that the underlying probability space is just abstract and unspecified. But why, then, is it necessary to use the notion of a random variable in the first place here? Wouldn't it be easier just to say that we are working with a probability space with $\Omega = \mathbb{R}$, $\Sigma = \mathcal{B}(\mathbb{R})$, and $\operatorname{P}(A) = \int_A \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\, \mathrm{d}\lambda(x)$ for $A \in \Sigma$?
  2. Random variable that represents the outcome of the toss of a fair coin. As explained here, the underlying probability space is again some abstract space of all conceivable futures. But why do we even need that? Why not directly use $\Omega = \{0, 1\}$, $\Sigma = \mathcal{P}(\Omega)$, and $\operatorname{P}(A) = \frac{\#A}{2}$ for $A \in \Sigma$? (Both of these proposed constructions are sketched right after this list.)
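
To make the two proposals concrete, here is a minimal sketch (Python and the SciPy dependency are assumptions of the sketch, not part of the question) in which each distribution is used directly as the probability measure on its own $\Omega$:

```python
from fractions import Fraction
from scipy.stats import norm   # assumption: SciPy is available; used only for the normal CDF

# Proposal 1: Omega = R, Sigma = B(R), P(A) = Gaussian measure of A.
# For an interval A = (a, b] this is a difference of CDF values.
mu, sigma = 0.0, 1.0

def P_normal_interval(a, b):
    return norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

# Proposal 2: Omega = {0, 1}, Sigma = P(Omega), P(A) = #A / 2.
def P_coin(A):
    return Fraction(len(A), 2)

print(P_normal_interval(-1.96, 1.96))   # ~0.95
print(P_coin({1}))                      # 1/2
```

Both are perfectly legitimate probability spaces for a single quantity; the answer below addresses what happens once several, possibly dependent, quantities have to live together.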

If it is indeed beneficial to introduce random variables in these two cases, what are the benefits?

Best Answer

[updated with example below]

$\newcommand{\Cov}{\mathrm{Cov}}$

The real need for a notion of a random variable, as opposed to a distribution, arises because one wants to have a single mathematical object that contains all of the information necessary to make statements or formulate questions about a given random quantity.

Suppose I want to ask whether two real-valued random variables $X$ and $Y$ are independent. Without random variables, I cannot answer this using only the distribution of $X$ and the distribution of $Y$; I instead need to appeal to another mathematical object, the joint distribution of $X$ and $Y$ on $\mathbb R\times \mathbb R$.
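
As a tiny illustration of why the two marginal distributions are not enough (a sketch in Python, with made-up names): two joint distributions on $\{0,1\}\times\{0,1\}$ can have identical marginals while only one of them makes the coordinates independent.

```python
from fractions import Fraction as F

# Two joint distributions on {0, 1} x {0, 1} with the same fair-coin marginals;
# the first makes the coordinates independent, the second makes them equal with probability 1.
independent = {(x, y): F(1, 4) for x in (0, 1) for y in (0, 1)}
comonotone = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

def first_marginal(joint):
    # Distribution of the first coordinate alone.
    return {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}

# Identical marginals, yet very different dependence structure.
print(first_marginal(independent) == first_marginal(comonotone))   # True
```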

So (considering real quantities for the moment) every statement about a family of random quantities, say indexed by a set $S$, would first need to specify a joint distribution on the product space $\mathbb R^S$. Further statements involving other random quantities, say indexed by a set $T$ which might or might not intersect $S$, would need to specify a new joint distribution, this time on $\mathbb R^T$, in a way that is compatible with the distribution on $\mathbb R^S$.

It becomes much simpler just to assume, once and for all, an underlying sample space, and then a random quantity has a precise formulation as a random variable, i.e., a measurable function on that sample space.

Example

Suppose we are flipping a fair coin twice, and for each flip we record the number of heads (1 or 0) as a Bernoulli variable, $X_1$ for the first flip and $X_2$ for the second. Suppose $X_3=1-X_1$ is defined as the number of tails flipped on the first trial, and likewise $X_4=1-X_2$. I can define these all in the obvious way as random variables on the sample space of outcomes $\Omega =\{HH,HT,TH,TT\}$, with probability measure $\mu(S)=\frac{\#S}{4}$.

Treating these as random variables on a sample space, I can define independence of $X_i$ and $X_j$ in terms of independence of the events $\{\omega\mid X_i(\omega)\leq x_i\}$ and $\{\omega\mid X_j(\omega)\leq x_j\}$ for all $x_i,x_j\in \mathbb R$, and from this definition, $X_1$ and $X_2$ are independent, while $X_1$ and $X_3$ are not, for example.
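
For these $\{0,1\}$-valued variables the condition above reduces to checking a factorization over the four value pairs, so it can be verified directly on $\Omega$. Here is a small sketch (Python and the helper names are illustrative, not part of the answer):

```python
from fractions import Fraction as F
from itertools import product

# The sample space of two fair coin flips; every outcome has probability 1/4.
Omega = ["HH", "HT", "TH", "TT"]
mu = {w: F(1, 4) for w in Omega}

# The four random variables from the example, as plain functions on Omega.
X = {
    1: lambda w: 1 if w[0] == "H" else 0,   # heads on the first flip
    2: lambda w: 1 if w[1] == "H" else 0,   # heads on the second flip
    3: lambda w: 1 if w[0] == "T" else 0,   # tails on the first flip  (X3 = 1 - X1)
    4: lambda w: 1 if w[1] == "T" else 0,   # tails on the second flip (X4 = 1 - X2)
}

def P(event):
    # Probability of a subset of Omega.
    return sum(mu[w] for w in event)

def independent(i, j):
    # For {0, 1}-valued variables, the definition above reduces to checking
    # P(Xi = a, Xj = b) = P(Xi = a) * P(Xj = b) for all a, b in {0, 1}.
    return all(
        P([w for w in Omega if X[i](w) == a and X[j](w) == b])
        == P([w for w in Omega if X[i](w) == a]) * P([w for w in Omega if X[j](w) == b])
        for a, b in product((0, 1), repeat=2)
    )

print(independent(1, 2))   # True
print(independent(1, 3))   # False
```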

However, the distributions of all four $X_i$’s are identical, so there is no way to define independence in terms of their individual distributions. We would need to separately know the joint distribution for every pair in order to answer that question. Or we would need a single joint distribution on $\mathbb R^4$ from which we could derive the pairwise distributions.

Note that the latter joint distribution on $\mathbb R^4$ would effectively function as an alternative sample space, with the projections onto each coordinate playing the role of the given random variables. But it would be quite a bit more cumbersome to describe the joint distribution of four random variables, not all of which are independent. Moreover, suppose we wished to consider further random variables such as $Y=\frac{X_1-X_2-X_3}{3}$. How would we easily define something like $\Cov(X_1,Y)$? Do we really want to derive yet another joint distribution just for this?
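
With everything defined on the one sample space, by contrast, $\Cov(X_1,Y)$ needs nothing new: the covariance is just a finite sum over $\Omega$. Here is a minimal sketch (Python again, purely illustrative, repeated so it runs on its own):

```python
from fractions import Fraction as F

# Same sample space and variables as in the sketch above.
Omega = ["HH", "HT", "TH", "TT"]
mu = {w: F(1, 4) for w in Omega}
X1 = lambda w: 1 if w[0] == "H" else 0
X2 = lambda w: 1 if w[1] == "H" else 0
X3 = lambda w: 1 - X1(w)
Y = lambda w: F(X1(w) - X2(w) - X3(w), 3)   # the extra random variable from the text

def E(Z):
    # Expectation of a random variable Z: a finite sum over Omega.
    return sum(mu[w] * Z(w) for w in Omega)

cov = E(lambda w: X1(w) * Y(w)) - E(X1) * E(Y)
print(cov)   # 1/6
```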