Probability – Understanding Random Variables

Tags: probability, random-variables, statistics

I was wondering if I correctly understand what a random variable is. Is a random variable's domain the set of numbers that are reasonable, given how the random variable is defined? For instance, let $X$ be the number of broken eggs in a dozen-egg carton. Would the domain be $\{0,1,2,\ldots,12\}$? And when you apply a function to the random variable, does it associate a number from the domain with an element of the sample space, thereby giving some output? Taking our example, would $(X=12)$ take the domain value 12 and associate to it the element $(B,B,B,B,B,B,B,B,B,B,B,B)$ from the sample space? If I am using any notation incorrectly, please correct me.

Best Answer

Technically you're using the term "domain" wrong -- what you're thinking of here is the range (or "codomain") of the random variable.
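To make the domain/range distinction concrete, here is a minimal Python sketch. It assumes the simplest sample space that fits the example: one outcome per combination of broken and unbroken eggs, labelled as a 12-tuple of `"B"`/`"U"` (the names `OMEGA` and `X` and this labelling are illustrative choices, not part of the question or answer). The function goes from an outcome to a number, so the domain of $X$ is the set of outcomes and $\{0,1,\ldots,12\}$ is its range; "$X=12$" names the set of outcomes that $X$ sends to 12.

```python
from itertools import product

# Hypothetical model of the sample space: each outcome is a 12-tuple that
# labels, egg by egg, whether it is broken ("B") or unbroken ("U").
OMEGA = list(product("BU", repeat=12))

def X(omega):
    """Number of broken eggs: a function from OMEGA to {0, ..., 12}."""
    return omega.count("B")

# The domain of X is OMEGA (2**12 outcomes); its range is the set of values it takes.
range_of_X = {X(omega) for omega in OMEGA}

# "X = 12" is an event: the preimage {omega in OMEGA : X(omega) == 12}.
event_X_is_12 = [omega for omega in OMEGA if X(omega) == 12]
```

Under this labelling, `range_of_X` is `{0, 1, ..., 12}` and `event_X_is_12` contains exactly one outcome, the all-broken tuple.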

Formally, a random variable is a (measurable) function from a sample space $\Omega$ to some other set that is the range of the random variable. Exactly what $\Omega$ is usually remains implicit in computations, and it is almost universal to write just $X$ rather than "$X(\omega)$, where $\omega\in\Omega$ is a point in the sample space that represents one outcome of the experiment". This is because when several random variables are involved, we always talk about their values at the same $\omega\in\Omega$: we may write $X(\omega)+Y(\omega)$, but never $X(\omega_1)+Y(\omega_2)$.

In particular, if $X$ and $Y$ are number-valued random variables, the notation $X+Y$ stands for the function $\omega \mapsto X(\omega)+Y(\omega)$, which is itself a random variable.
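A small sketch of this pointwise definition. It assumes a hypothetical encoding of outcomes as 12-tuples of `"B"`/`"U"` (broken/unbroken), with positions 0 to 5 taken as the top row of the carton and positions 6 to 11 as the bottom row; `X_top`, `X_bottom` and `add` are illustrative names. Adding the two row counts at the same outcome reproduces the total number of broken eggs.

```python
from itertools import product

# Hypothetical encoding: outcome = 12-tuple of "B"/"U"; positions 0-5 are the
# top row of the carton and positions 6-11 the bottom row.
OMEGA = list(product("BU", repeat=12))

def X_top(omega):
    """Broken eggs in the top row."""
    return omega[:6].count("B")

def X_bottom(omega):
    """Broken eggs in the bottom row."""
    return omega[6:].count("B")

def add(U, V):
    """Return the random variable U + V, i.e. omega -> U(omega) + V(omega).

    Both summands are evaluated at the same outcome omega.
    """
    return lambda omega: U(omega) + V(omega)

total = add(X_top, X_bottom)

# At every outcome the sum equals the total number of broken eggs.
assert all(total(omega) == omega.count("B") for omega in OMEGA)
```

Note that `add` never mixes two different outcomes; that is exactly the convention the answer describes.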

Similarly, when $X$ is a random variable whose range is in the set $A$ (that is, $X:\Omega\to A$) and $f$ is a function $A\to B$, we write "$f(X)$" to denote the random variable that is the composition of $f$ with $X$, that is, the function $\omega \mapsto f(X(\omega))$.
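Composition can be sketched the same way, assuming outcomes are encoded as 12-tuples of `"B"`/`"U"`. The helper `compose` and the function `unbroken` ($n \mapsto 12-n$, the number of unbroken eggs) are hypothetical names chosen for illustration. `unbroken` is an ordinary deterministic function on $\{0,\ldots,12\}$; only `compose(unbroken, X)`, which takes an outcome as input, is a random variable.

```python
from itertools import product

# Hypothetical encoding: each outcome is a 12-tuple of "B" (broken) / "U" (unbroken).
OMEGA = list(product("BU", repeat=12))

def X(omega):
    """Number of broken eggs: OMEGA -> {0, ..., 12}."""
    return omega.count("B")

def compose(f, X):
    """Return f(X): the random variable omega -> f(X(omega))."""
    return lambda omega: f(X(omega))

def unbroken(n):
    # A plain function {0, ..., 12} -> {0, ..., 12}; nothing random about it.
    return 12 - n

W = compose(unbroken, X)  # the random variable "number of unbroken eggs"
```

For instance, `W(tuple("U" * 12))` evaluates to 12.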

In your example, $X$ is a random variable with range $\{0,1,2,\ldots,12\}$. "Broken" and "unbroken" are not elements of either the sample space or the range of $X$, but there could be another random variable $Y$ with range $\{\mathtt b,\mathtt u\}$ that encodes whether the top right egg in the container is broken or not. In that case the two variables would be related by $Y=\mathtt b \Rightarrow X\ne 0$ and $Y=\mathtt u\Rightarrow X\ne 12$.
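The two implications can be checked by brute force over every outcome. This sketch assumes outcomes are encoded as 12-tuples of `"B"`/`"U"`, and adopts the arbitrary, hypothetical convention that tuple position 0 is the top right egg. The strings `"b"` and `"u"` are values in the range of `Y`; the outcomes themselves are tuples.

```python
from itertools import product

# Hypothetical encoding: each outcome is a 12-tuple of "B" (broken) / "U" (unbroken).
OMEGA = list(product("BU", repeat=12))

def X(omega):
    """Number of broken eggs."""
    return omega.count("B")

def Y(omega):
    """State of the top right egg (hypothetically, tuple position 0)."""
    return "b" if omega[0] == "B" else "u"

# Y = b implies X != 0, and Y = u implies X != 12, at every outcome.
implication_b = all(X(w) != 0 for w in OMEGA if Y(w) == "b")
implication_u = all(X(w) != 12 for w in OMEGA if Y(w) == "u")
```

Both `implication_b` and `implication_u` evaluate to `True`, because a broken top right egg forces at least one broken egg, and an unbroken one rules out all twelve being broken.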

We don't usually describe explicitly what the sample space is. Here it presumably contains at least one point for each of the $2^{12}$ possible combinations of broken and unbroken eggs, but there could be many different elements in the sample space for each such combination. Being deliberately vague about the sample space has the advantage that we can almost always decide halfway through the analysis to assume that there are dimensions to it that we haven't mentioned before, so that for example it has room to define a random variable $Z$ giving the number of red lights the driver of the egg transport needs to stop at on the way to the supermarket.
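One way to picture such an enlarged sample space is to pair each egg configuration with a red-light count. This is a sketch under stated assumptions: the 12-tuple `"B"`/`"U"` encoding, the product structure and the cap `MAX_RED_LIGHTS = 3` (there only to keep the space finite) are all hypothetical. $X$ ignores the new coordinate, so its range is unchanged, but each egg configuration now corresponds to several points of the sample space.

```python
from itertools import product

# Hypothetical egg configurations: 12-tuples of "B" (broken) / "U" (unbroken).
EGGS = list(product("BU", repeat=12))
MAX_RED_LIGHTS = 3  # hypothetical cap, only to keep the space finite

# Enlarged sample space: (egg configuration, number of red lights) pairs.
OMEGA = [(eggs, lights) for eggs in EGGS for lights in range(MAX_RED_LIGHTS + 1)]

def X(omega):
    """Number of broken eggs; ignores the red-light coordinate."""
    eggs, _ = omega
    return eggs.count("B")

def Z(omega):
    """Number of red lights the driver stops at."""
    _, lights = omega
    return lights
```

Here the event $X=12$ contains `MAX_RED_LIGHTS + 1` outcomes rather than one, even though nothing about $X$ itself changed.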