Probability – Understanding Random Variables

probability, random variables, statistics

I was wondering whether I correctly understand what a random variable is. Is a random variable's domain the set of numbers that are reasonable, given how the random variable is defined? For instance, let $X =$ the number of broken eggs in a dozen-egg carton. Would the domain be $\{0,1,2,3,4,\ldots,12\}$? And when you apply a function to the random variable, will it associate a number from the domain with an element of the sample space, thereby giving some output? Taking our example, would $(X=12)$ take the domain value 12 and associate with it the sample-space element $(B,~B,~B,~B,~B,~B,~B,~B,~B,~B,~B,~B)$? If I am using any notation incorrectly, please correct me.

Best Answer

Technically you're using the term "domain" wrong -- what you're thinking of here is the range (or "codomain") of the random variable.

Formally a random variable is a (measurable) function from a sample space $\Omega$ to some other set that is the range of the random variable. Exactly what $\Omega$ is is usually left implicit in computation, and it is almost universal to write just $X$ in computations rather than "$X(\omega)$ where $\omega\in\Omega$ is a point in the sample space that represents one outcome of the experiment". This is because when there are several random variables involved, we always talk about their values at the same $\omega\in\Omega$, so possibly $X(\omega)+Y(\omega)$, but never $X(\omega_1)+Y(\omega_2)$.

In particular, if $X$ and $Y$ are number-valued random variables, the notation $X+Y$ stands for the function $\omega \mapsto X(\omega)+Y(\omega)$, which is itself a random variable.
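To make the pointwise reading of $X+Y$ concrete, here is a minimal sketch. The sample space (two coin flips) and the helper name `add` are assumptions chosen for illustration only; they do not come from the question.

```python
from itertools import product

# Hypothetical sample space: every outcome of two coin flips.
omega_space = list(product("HT", repeat=2))

def X(omega):
    """1 if the first flip is heads, else 0."""
    return 1 if omega[0] == "H" else 0

def Y(omega):
    """1 if the second flip is heads, else 0."""
    return 1 if omega[1] == "H" else 0

def add(U, V):
    """Return the random variable omega -> U(omega) + V(omega)."""
    return lambda omega: U(omega) + V(omega)

# S plays the role of "X + Y": both summands are evaluated at the SAME omega.
S = add(X, Y)
sum_values = {omega: S(omega) for omega in omega_space}
```

Note that `S` never mixes two different sample points, which mirrors the remark that $X(\omega_1)+Y(\omega_2)$ is never what is meant.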

Similarly, when $X$ is a random variable whose range is in the set $A$ (that is, $X:\Omega\to A$) and $f$ is a function $A\to B$, we write "$f(X)$" to denote the random variable that is the composition of $f$ with $X$, that is, the function $\omega \mapsto f(X(\omega))$.
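The composition can be sketched in a few lines. This assumes a sample space with exactly one point per pattern of broken and unbroken eggs, and a made-up function $f(a) = 12 - a$ (the number of intact eggs); both are illustrative choices, not part of the answer.

```python
from itertools import product

# Assumption: one sample point per pattern of broken ("b") and
# unbroken ("u") eggs in a 12-egg carton.
omega_space = list(product("bu", repeat=12))

def X(omega):
    """X : Omega -> A = {0, 1, ..., 12}, the number of broken eggs."""
    return omega.count("b")

def f(a):
    """f : A -> B, a hypothetical function giving the number of intact eggs."""
    return 12 - a

def compose(g, V):
    """Return the random variable omega -> g(V(omega)), written g(V)."""
    return lambda omega: g(V(omega))

# f_of_X is the random variable "f(X)": it is again a function on Omega.
f_of_X = compose(f, X)

all_broken = ("b",) * 12
intact_when_all_broken = f_of_X(all_broken)
```

The point of the sketch is that `f_of_X` takes a sample point as its argument, not a number: applying $f$ to a random variable produces another function on $\Omega$.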

In your example, $X$ is a random variable with range $\{0,1,2,\ldots,12\}$. "Broken" and "unbroken" are not elements of either the sample space or the range of $X$, but there could be another random variable $Y$ with range $\{\mathtt b,\mathtt u\}$ that encodes whether the top right egg in the container is broken or not. In that case the two variables would be related by $Y=\mathtt b \Rightarrow X\ne 0$ and $Y=\mathtt u\Rightarrow X\ne 12$.
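The two implications can be checked by brute force over one possible sample space. The sketch below assumes the smallest reasonable $\Omega$ (one point per broken/unbroken pattern) and arbitrarily treats position 0 of each pattern as the top right egg; neither choice is fixed by the problem.

```python
from itertools import product

# Assumption: Omega has exactly one point per pattern of
# broken ("b") and unbroken ("u") eggs.
omega_space = list(product("bu", repeat=12))

def X(omega):
    """Number of broken eggs in the carton."""
    return omega.count("b")

def Y(omega):
    """State of the top right egg (assumed to be position 0)."""
    return omega[0]

# The ranges are sets of VALUES, not sets of sample points.
range_X = sorted({X(w) for w in omega_space})
range_Y = sorted({Y(w) for w in omega_space})

# Y = b  =>  X != 0, and  Y = u  =>  X != 12, checked at every omega.
y_b_implies_x_nonzero = all(X(w) != 0 for w in omega_space if Y(w) == "b")
y_u_implies_x_not_12 = all(X(w) != 12 for w in omega_space if Y(w) == "u")
```

Here `X` and `Y` are both functions on the same `omega_space`, which is what makes a statement relating them meaningful.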

We don't usually describe explicitly what the sample space is. Here it presumably contains at least one point for each of the $2^{12}$ possible combinations of broken and unbroken eggs, but there could be many different elements in the sample space for each such combination. Being deliberately vague about the sample space has the advantage that we can almost always decide halfway through the analysis to assume that there are dimensions to it that we haven't mentioned before, so that for example it has room to define a random variable $Z$ giving the number of red lights the driver of the egg transport needs to stop at on the way to the supermarket.

Related Solutions

[Math] Discerning The Set Of Values For A Random Variable

Let's take d) as an example. The length $X>0$ for sure, but that's all you know. You could imagine a snake of arbitrarily large length, although the longer it is, the more remote a chance you'd ever find one. That of course would be reflected in the probability distribution. So I would say that $0 < X < \infty$. Reason the others the same way as well (although I seem to remember a pH having a max value of $14$).

Probability Theory – Precise Definition of the Support of a Random Variable

I am not entirely convinced with the line "the sample space is also called the support of a random variable".

That looks quite wrong to me.

What is even more confusing is, when we talk about support, do we mean that of $X$ or that of the distribution function $Pr$?

In rather informal terms, the "support" of a random variable $X$ is defined as the support (in the function sense) of the density function $f_X(x)$.

I say "in rather informal terms" because the density function is a quite intuitive and practical concept for dealing with probabilities, but not so much when speaking of probability in general and formal terms. For one thing, it's not a proper function for "discrete distributions" (again, a practical but loose concept).

In more formal/strict terms, the comment of Stefan fits the bill.

Do we interpret the support to be

- the set of outcomes in Ω which have a non-zero probability,
- the set of values that X can take with non-zero probability?

Neither, actually. Consider a random variable that has a uniform density in $[0,1]$, with $\Omega = \mathbb{R}$. Then the support is the full interval $[0,1]$ - which is a subset of $\Omega$. But, then, of course, say $x=1/2$ belongs to the support. But the probability that $X$ takes this value is zero.
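A small exact computation illustrates the uniform example. It uses the CDF of the uniform distribution on $[0,1]$; the helper names are made up for this sketch. It also relies on one common formal reading of "support" (the points all of whose open neighbourhoods have positive probability), which is stated here as an assumption rather than as the definition the answer has in mind.

```python
from fractions import Fraction

def cdf(x):
    """CDF of the uniform distribution on [0, 1]: x clamped to [0, 1]."""
    return min(max(Fraction(x), Fraction(0)), Fraction(1))

def prob_interval(a, b):
    """P(a <= X <= b) for the uniform distribution on [0, 1].

    The distribution is continuous, so the endpoints carry no mass.
    """
    return cdf(b) - cdf(a)

half = Fraction(1, 2)
eps = Fraction(1, 100)

# The single point 1/2 has probability zero ...
p_point = prob_interval(half, half)

# ... yet every neighbourhood of 1/2 has positive probability.
p_neighbourhood = prob_interval(half - eps, half + eps)

# A point outside [0, 1] has neighbourhoods of probability zero.
p_outside = prob_interval(2 - eps, 2 + eps)
```

So $1/2$ can belong to the support even though $\Pr(X = 1/2) = 0$, while a point such as $2$ does not, because some neighbourhood of it has probability zero.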
