[Math] Why do we need random variables

big-list pr.probability soft-question teaching

In this MathStackExchange post the question in the title was asked without much outcome, I feel.
Edit: As Douglas Zare kindly observes, there is one more answer in MathStackExchange now.

I am not used to basic probability, and I am trying to prepare a class that I need to teach this year. I feel I am unable to motivate the introduction of random variables. After spending some time speaking about Kolmogorov's axioms I can explain that they allow one to make the following sentence true and meaningful:

The probability that, tossing a coin $N$ times, I get $n\leq N$ tails equals
$$\tag{$\ast$}{N \choose n}\cdot\Big(\frac{1}{2}\Big)^N.$$

But now people (i.e. books I can find) introduce the "random variable $X\colon \Omega\to\mathbb{R}$ which takes values $X(\text{tails})=1$ and $X(\text{heads})=0$" and say that it follows the binomial rule. To do this, they need a probability space $\Omega$: but once one has it, one can prove statement $(\ast)$ above. So, what is the usefulness of this $X$ (and of random variables, in general)?
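To make the comparison concrete, here is a minimal sketch, assuming a fair coin and $N=4$ (both chosen only for illustration). It builds $\Omega$ explicitly as the set of all toss sequences, defines $X$ as the number of tails (the sum of the per-toss indicator that is $1$ on tails and $0$ on heads), and checks statement $(\ast)$ by summing probabilities over the event $\{X=n\}$. The names `omega`, `p`, `X` and `prob_X_equals` are hypothetical.

```python
from itertools import product
from math import comb
from fractions import Fraction

N = 4  # assumed number of tosses, chosen only for illustration

# Sample space: all sequences of N tosses, each with probability (1/2)^N.
omega = list(product(["heads", "tails"], repeat=N))
p = {w: Fraction(1, 2**N) for w in omega}

def X(w):
    """Random variable: number of tails in the outcome w."""
    return sum(1 for toss in w if toss == "tails")

def prob_X_equals(n):
    """P(X = n), computed by summing p over the event {w : X(w) = n}."""
    return sum(p[w] for w in omega if X(w) == n)

# Compare with the closed form (N choose n) * (1/2)^N from statement (*).
for n in range(N + 1):
    assert prob_X_equals(n) == comb(N, n) * Fraction(1, 2**N)
    print(n, prob_X_equals(n))
```

The computation only ever uses $p$ on $\Omega$; $X$ serves to name the events being measured, which is exactly the point the question raises.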

Added: So far my question was admittedly too vague, and I will try to amend it.

Given a discrete random variable $X\colon\Omega\to\mathbb{R}$ taking values $\{x_1,\dots,x_n\}$ I can define $A_k=X^{-1}(\{x_k\})$ for all $1\leq k\leq n$. The study of the random variable then becomes the study of the values $p(A_k)$, $p$ being the probability on $\Omega$. Therefore, it seems to me that the introduction of $X$ has not taken us one step further in understanding $\Omega$ (or the problem modelled by $\Omega$).
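The construction of the sets $A_k$ can be sketched as follows. The example is hypothetical: one roll of a fair die, with $X$ the remainder of the face modulo $3$; the names `preimages` and `law` are mine.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: one roll of a fair die.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Random variable: remainder of the face modulo 3."""
    return w % 3

# A_k = X^{-1}({x_k}): group outcomes by the value X assigns to them.
preimages = defaultdict(list)
for w in omega:
    preimages[X(w)].append(w)

# The distribution of X is the list of values p(A_k).
law = {x: sum(p[w] for w in A) for x, A in preimages.items()}

for x in sorted(law):
    print(x, preimages[x], law[x])
```

The sets $A_k$ partition $\Omega$, and `law` records only the numbers $p(A_k)$, which is the reduction described above.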

Often I read that there is the possibility of having a family $X_1,\dots,X_n$ of random variables on the same space $\Omega$ and some results (like the CLT) say something about them. But then

  1. I know no example, and would be happy to discover one, of a problem truly modelled by this. In most examples that I read, either there is a single random variable, or the understanding of $n$ of them requires the understanding of the power $\Omega^n$ of some previously introduced measure space $\Omega$.
  2. It seems to me (but I admit to having no rigorous proof) that given the above $n$ random variables on $\Omega$ there should exist an $\Omega'$, probably much bigger, with a single $X\colon\Omega'\to\mathbb{R}$ "encoding" the same information as $\{X_1,\dots,X_n\}$. In this case, we are back to using "only" indicator functions. I understand that this process breaks down if we want to let $n\to \infty$, but I also suspect that there might be a deeper reason for studying random variables.

All in all, my doubts come from the fact that random variables still look to me like a poorer object than a measure (or, probably, than a $\sigma$-algebra $\mathcal{F}$ and a measure whose generated $\sigma$-algebra is finer than $\mathcal{F}$, or something like this); yet they are introduced, studied, and look central in the theory. I wonder where I am wrong.

Caveat: For some reason, many people in comments below objected that "throwing random variables away is ridiculous" or that I "should try to come out with something more clever, then, if I think they are not good". That was not my point. I am sure they must be useful, otherwise textbooks would not all introduce them. But I was unable to understand why: the many useful and kind answers below helped a great deal.

Best Answer

One of your concerns is (let me quote from your question)

Often I read that there is the possibility of having a family $X_1,\dots,X_n$ of random variables on the same space. I know no example, and would be happy to discover one, of a problem truly modelled by this, whereas in most examples that I read there is either a single random variable

Here is what I do on the first day of my probability class.

The statistical experiment I describe is: Go to the road outside the college building and consider the first car that goes from left to right after your arrival. Since we cannot know or predict which car in the city it will be, this is a statistical experiment. The sample space is the set of all cars in your city (or in your country).

Questions:

  1. How many people are in that car?

  2. What is the amount of petrol in the fuel tank at that time?

  3. How many kilometers has the car travelled that day before you noticed it?

  4. What is the wavelength of the color of the car? (admittedly artificial)

All these are random variables on the same sample space.

The answer to question 1 might be useful to a person who sells eatables on the roadside (more passengers means more business).

The answer to question 2 might help decide whether it would be profitable to open a petrol-selling shop here.
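The car experiment can be sketched in code. This is only an illustration under loud assumptions: the four cars, all their numbers, and the uniform probability are made up, and the helper names `expectation` and `prob` are hypothetical. It shows several random variables defined on one sample space, and an event that needs two of them at once.

```python
from fractions import Fraction

# Hypothetical sample space: each outcome is one car; all numbers are made up.
cars = {
    "car_a": {"passengers": 1, "fuel_litres": 30.0, "km_today": 12.0},
    "car_b": {"passengers": 4, "fuel_litres": 5.0, "km_today": 80.0},
    "car_c": {"passengers": 2, "fuel_litres": 8.0, "km_today": 45.0},
    "car_d": {"passengers": 3, "fuel_litres": 40.0, "km_today": 3.0},
}

# Assumption: every car is equally likely to be the first one to pass.
p = {car: Fraction(1, len(cars)) for car in cars}

# Three random variables on the same sample space (questions 1 to 3).
def passengers(car):
    return cars[car]["passengers"]

def fuel_litres(car):
    return cars[car]["fuel_litres"]

def km_today(car):
    return cars[car]["km_today"]

def expectation(X):
    """Expected value of the random variable X under p."""
    return sum(p[car] * X(car) for car in cars)

def prob(event):
    """Probability of the set of cars for which event(car) is true."""
    return sum(p[car] for car in cars if event(car))

print(expectation(passengers))
# An event involving two random variables at once: low fuel after a long drive.
print(prob(lambda car: fuel_litres(car) < 10 and km_today(car) > 40))
```

The second query is the kind of question the petrol-shop owner might ask; it cannot be answered from the separate distributions of fuel and distance alone, only from both variables living on the same space of cars.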

I ask students to come up with examples of such statistical experiments instead of coin-tossing and dice-throwing ones.

I got this from a bright student:

Go to the library. Observe the first book that is borrowed by a user that day. The sample space is the set of all books in the library.

The random variables are: the number of pages of that book, the price of that book, and how many times it has been borrowed before.
