Probability Theory – True Domain of Random Variables Explained

examples-counterexamples, mathematical-modeling, probability, probability-theory, random-variables

Typically, in applied probabilistic or statistical literature we work with random variables whose domain we don't specify. We just care about the set in which the random variable takes values.

For example, the number of aces in a hand in a certain card game, the height of a person in a population, or the income of a company in a certain year are all random variables (the last two examples come from statistics).
But in all of these examples, the domain is never given.

While we could always construct any number of artificial probability spaces that would serve as the domain, I'm interested in what a compelling probability space domain could be, one that really models (the underlying experiment of) these three examples.

EDIT To avoid unclarity about what I mean by "compelling", let me be more precise by giving an example: consider the random variable that counts the number of heads when flipping a coin $n$ times. Thus it takes values in $\{0,1,\ldots,n\}$. But which experiment would most likely be performed in order to lead to these values?
The most compelling space $\Omega$ would be $\Omega=\{H,T\}^n$, the space of sequences of $n$ coin flips, since this is what actually happens.
But one could just as well define this random variable on the set $\{0,1,\ldots,n\}$, in which case the random variable would be the identity function. This space I would call artificial, not "compelling", because it doesn't give an accurate representation of the underlying experiment any more.
In particular I'm interested in the underlying space for the statistical examples.
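To spell out the coin example in formulas (assuming a fair coin, for concreteness): on the compelling space $\Omega=\{H,T\}^n$ with the uniform measure, the counting variable $X$ satisfies
$$P(X=k)=\frac{\#\{\omega\in\{H,T\}^n : \#\text{Heads}(\omega)=k\}}{2^n}=\binom{n}{k}2^{-n},\qquad k=0,1,\ldots,n,$$
while on the artificial space $\{0,1,\ldots,n\}$ one must simply declare the measure $P'(\{k\})=\binom{n}{k}2^{-n}$ and take $X$ to be the identity. Both give $X$ the same distribution, but only the first describes the experiment.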

P.S. See also this other question of mine, which also has a bounty running.

Best Answer

Here is a paper that discusses your question (see page 3). See also this Stack Overflow question: what are the sample spaces when talking about continuous random variables.

As you'll see, random variables are not required to use probability theory; they are just convenient ways to capture aspects of the underlying sample space we are interested in. We could choose to work directly with the underlying sample space if we knew it (as a running example I will use $\Omega = \{H,T\}^N$ for an $N$-coin-toss experiment).

Basically, the decision to model the outcomes of an experiment as random variables or to treat them as direct observations of the sample space is mostly a matter of perspective. The random variables view separates the object itself (possibly an abstract object) $\omega \in \Omega$ from the questions we can ask about it (e.g., "HH" vs "Number of tails", "Number of Heads", "At least one tail", "No more than 2 Heads", etc.).

If you only care about one question, then the two views are isomorphic. However, if you want to ask multiple questions about the same observational unit, then the random variables view is more consistent with what you are trying to do. For example, suppose you ask the height and weight of 100 randomly chosen people -- in this case, a random variables view makes more sense, as "height" and "weight" are not independent objects in the real world that "just happen" to be correlated; they are linked through people ($\omega \in \Omega$).

So, let's say I gave you the underlying sample space $\Omega$ for a problem. Now what? You will want to start asking questions about the probability of various events, defined as measurable subsets of $\Omega$ (e.g., all outcomes with exactly three heads). There are two ways to do this:

  1. Create the set of all $\omega \in \Omega$ that have three heads and then calculate the probability of this set.
  2. Define an integer-valued random variable $X(\omega)$ that returns the number of heads in $\omega$. This will create a new sample space called the image of $X(\omega)$, along with an induced probability measure $P'$ that is defined over the integers 0 to N. This induced measure is called a pushforward measure (or image measure). Now you can re-cast your question as $P'(X=3)$ as opposed to $P(\{\omega \in \Omega: \#\text{Heads}(\omega) = 3\})$ using the original space.
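To make the two routes concrete, here is a rough Python sketch (my own illustration, not from the paper linked above; it assumes a fair coin and takes $N=5$) showing that both computations return the same number:

```python
from itertools import product
from collections import Counter

N = 5  # number of tosses; any small N works (chosen only for the demo)

# The underlying sample space: all sequences of N tosses, each equally likely
# (i.e., a fair coin -- an extra assumption not made in the post).
Omega = list(product("HT", repeat=N))
P = 1 / len(Omega)  # uniform probability of each individual omega

# Route 1: build the event {omega in Omega : #Heads(omega) = 3} and measure it directly.
event = [w for w in Omega if w.count("H") == 3]
prob_direct = len(event) * P

# Route 2: define X(omega) = #Heads(omega) and push the measure forward onto the
# image {0, ..., N}; then ask for P'(X = 3) in the induced space.
X = lambda w: w.count("H")
P_pushforward = Counter()
for w in Omega:
    P_pushforward[X(w)] += P
prob_via_pushforward = P_pushforward[3]

print(prob_direct, prob_via_pushforward)  # both 0.3125, i.e. C(5,3) / 2**5
```

The point is not the number itself but where the measures live: $P$ is defined on the big space $\Omega$, while the pushforward $P'$ is defined on the small induced space $\{0,\ldots,N\}$.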

You are probably familiar with this stuff -- however, you want to know why we bother with it. In the case of the analysis of a single random variable, we can very well re-define our sample space by using the induced sample space (or simply define a sample space to match the properties of the random variable).

This changes when we move to jointly distributed random variables. Without $\Omega$ (at least implicitly), we'd have no way to index joint observations. Here's an example:

Let's say you sample 5 values from each of two random variables, $X$ and $Y$:

  • Observed X's = $1,1,2,5,3$
  • Observed Y's = $0,1,1,0,1$

Now, you want to develop a joint distribution that describes these observations as random variables (i.e., different aspects of some common object). How will you do this? Most importantly, you need to first associate an observation from $X$ with an observation from $Y$. Implicit in this association is the assumption that there is some common sample space $\Omega_J$ that justifies us associating, say, the first observation of $X$ with the first observation of $Y$ to form the joint observation $(1,0)$ (in this example).

So, in my example, we are assuming there is some underlying event $\omega'\in \Omega_J$ such that $X(\omega')=1$ and $Y(\omega')=0$ and that there is a valid underlying probability space $(\Omega_J,\mathcal{F}_J,P_J)$ whose image will produce the observed joint distribution of $(X,Y)$.
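Here is a minimal sketch of that idea (the specific outcomes and functions are my own toy illustration, not from the data above): once $X$ and $Y$ are functions on the same space, the pairing of observations comes for free from sampling $\omega'$ itself:

```python
import random

random.seed(0)

# A toy underlying space Omega_J: each omega is one experimental unit
# (think: one person), carrying everything we might later ask about it.
def draw_omega():
    return {"height_cm": random.gauss(170, 10), "has_dog": random.random() < 0.4}

# Two random variables defined as functions on the SAME omega.
X = lambda omega: round(omega["height_cm"])  # a numeric question about omega
Y = lambda omega: int(omega["has_dog"])      # a 0/1 question about omega

# Sampling omega once per unit automatically yields paired (X, Y) observations;
# there is no separate, ad hoc step where we decide how to match X's with Y's.
sample = [draw_omega() for _ in range(5)]
print([(X(w), Y(w)) for w in sample])
```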

However, we could dispense with all of this if we chose to model $X,Y$ not as random variables but as direct observations (the integers are our experimental units or foundation data).

At this point, you may still be unconvinced of the usefulness of the sample space view...

So, let's say you develop your distribution of $X,Y$ directly (no sample space [i.e., domain-less in your terminology]), and then you want to add a new quantity $Z$. How do you do this? Without an underlying sample space, you need to develop the joint distribution manually from first principles (i.e., ad hoc), whereas invoking the idea of an underlying sample space makes extending joint distributions a natural consequence of defining a new function over the same (usually implicit) underlying probability space. The fact that such a space can always be assumed to exist is a major theoretical elegance of modern probability theory.
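Continuing the toy sketch from above (this snippet reuses `draw_omega`, `X`, `Y`, and `sample` from that block, so it is a continuation rather than a standalone program), adding $Z$ amounts to nothing more than defining one more function on the same $\omega$'s:

```python
# Z is just one more question asked of the very same omegas we already sampled;
# the joint (X, Y, Z) observations follow immediately, with no re-pairing needed.
Z = lambda omega: omega["height_cm"] > 175.0
print([(X(w), Y(w), Z(w)) for w in sample])
```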

Again, it's a matter of perspective, but the random variables view, at least to me, has a philosophical/conceptual elegance to it when you consider joint observations and stochastic processes.

Here is a nice post on MathOverflow that discusses something similar.
