If we define the population as the complete set of items or "events" of interest,
and
we define the sample space as the set of all possible outcomes (exhaustively) from a random experiment,
then I wondered this...
Take a die roll. The population is the complete set of possible items, {1, 2, 3, 4, 5, 6}. The sample space is the set of all possible outcomes, also {1, 2, 3, 4, 5, 6}. So here the sample space and the population appear to be the same thing. When are they not, and what are the distinguishing factors between the two?
The Wikipedia page on sample spaces caused the penny to drop for me:
> ...For many experiments, there may be more than one plausible sample space available, depending on what result is of interest to the experimenter. For example, when drawing a card from a standard deck of fifty-two playing cards, one possibility for the sample space could be the various ranks (Ace through King), while another could be the suits (clubs, diamonds, hearts, or spades)...
Aha! So my population is the set of all cards, {1_heart, 2_heart, ..., ace_heart, 1_club, ...}, but the sample space may be, if we are looking at the suits, just {heart, club, diamond, spade}. So the population and sample space are different here.
In summary, the population is the set of items I'm looking at. The sample space may or may not be the population... that depends on what question about the population is being asked.
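As a concrete sketch of this (Python of my own; the card and suit names are purely illustrative), the population is the full deck, while the sample space is whatever set of outcomes the question at hand maps each item to:

```python
from itertools import product

# Population: the complete set of items of interest -- all 52 cards.
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king"]
suits = ["heart", "club", "diamond", "spade"]
population = [f"{rank}_{suit}" for rank, suit in product(ranks, suits)]

# If the question is "which suit did we draw?", the sample space is just the
# set of suits, obtained by mapping each item (card) to the outcome we care about.
def suit_of(card: str) -> str:
    return card.split("_")[1]

sample_space = {suit_of(card) for card in population}

print(len(population))  # 52 items in the population
print(sample_space)     # {'heart', 'club', 'diamond', 'spade'}
```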
This answers the latter half of my question... possibly I didn't ask it clearly enough, or it was just too obvious (it just took a while to sink into my head).
The other half of the question is answered by "Qwerty". In all the sources I've looked at, classical probability appears to treat events as equally likely [1], [2] (and the book I referenced in the question). "Qwerty" has expanded on this slightly... but I believe this is where relative-frequency probability comes into play, allowing us to model "unfair" (not equally likely) events. From [2]:
> The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.
And
> The classical definition of probability was called into question by several writers of the nineteenth century, including John Venn and George Boole. The frequentist definition of probability became widely accepted as a result of their criticism.
Here is a paper that discusses your question (see page 3). And this Stack Overflow question:
what are the sample spaces when talking about continuous random variables
As you'll see, random variables are not required in order to use probability theory; they are just convenient ways to capture the aspects of the underlying sample space we are interested in. We could choose to work directly with the underlying sample space if we knew it (as a running example, I will use $\Omega = \{H,T\}^N$ for an $N$-coin-toss experiment).
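For concreteness, here is a minimal Python sketch (my own illustration, not part of the original answer) of this running example for a small $N$: the sample space is just the set of all length-$N$ strings of H and T.

```python
from itertools import product

N = 3  # small enough to enumerate the whole space

# Omega = {H, T}^N: every possible sequence of N coin tosses.
omega = ["".join(toss) for toss in product("HT", repeat=N)]
print(omega)       # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(omega))  # 2**N = 8 outcomes
```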
Basically, the decision to model the outcomes of an experiment as random variables or to treat them as direct observations of the sample space is mostly a matter of perspective. The random-variables view separates the object itself (possibly an abstract object) $\omega \in \Omega$ from the questions we can ask about it (e.g., the outcome "HH" vs. "number of tails", "number of heads", "at least one tail", "no more than 2 heads", etc.).
If you only care about one question, then the two views are isomorphic. However, if you want to ask multiple questions about the same observational unit, then the random-variables view is more consistent with what you are trying to do. For example, suppose you ask the height and weight of 100 randomly chosen people -- in this case, a random-variables view makes more sense, as "height" and "weight" are not independent objects in the real world that "just happen" to be correlated; they are linked through people ($\omega \in \Omega$).
So, let's say I gave you the underlying sample space $\Omega$ for a problem. Now what? You will want to start asking questions about the probability of various events, defined as measurable subsets of $\Omega$ (e.g., all outcomes where we toss at least three heads). There are two ways to do this:
- Create the set of all $\omega \in \Omega$ that have three heads and then calculate the probability of this set.
- Define an integer-valued random variable $X(\omega)$ that returns the number of heads in $\omega$. This creates a new sample space, the image of $X(\omega)$, along with an induced probability measure $P'$ defined over the integers $0$ to $N$. This induced measure is called a pushforward measure (or image measure). Now you can re-cast your question as $P'(X=3)$, as opposed to $P(\{\omega \in \Omega: \#\text{Heads}(\omega) = 3\})$ using the original space. (Both routes are sketched in the code below.)
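Here is a minimal Python sketch of both routes (my own illustration; a fair coin and $N = 4$ are assumed so that the uniform measure applies):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

N = 4
omega = ["".join(t) for t in product("HT", repeat=N)]
P = {w: Fraction(1, len(omega)) for w in omega}  # uniform measure: fair coin

# Route 1: build the event {w in Omega : #Heads(w) = 3}, then sum its probability.
event = [w for w in omega if w.count("H") == 3]
p_event = sum(P[w] for w in event)

# Route 2: define X(w) = number of heads, then push the measure forward onto
# the image of X -- the integers 0..N -- to get the induced measure P'.
def X(w: str) -> int:
    return w.count("H")

P_prime = Counter()
for w in omega:
    P_prime[X(w)] += P[w]

print(p_event)     # 1/4
print(P_prime[3])  # 1/4 -- P'(X = 3) matches the direct event calculation
```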
You are probably familiar with this stuff -- however, you want to know why we bother with it. In the case of the analysis of a single random variable, we can very well re-define our sample space by using the induced sample space (or simply define a sample space to match the properties of the random variable).
This changes when we move to jointly distributed random variables. Without $\Omega$ (at least implicitly), we'd have no way to index joint observations. Here's an example:
Let's say you sample 5 values from each of two random variables, $X$ and $Y$:
- Observed X's = $1,1,2,5,3$
- Observed Y's = $0,1,1,0,1$
Now, you want to develop a joint distribution that describes these observations as random variables (i.e., different aspects of some common object). How will you do this? Most importantly, you need to first associate an observation from $X$ with an observation from $Y$. Implicit in this association is the assumption that there is some common sample space $\Omega_J$ that justifies us associating, say, the first observation of $X$ with the first observation of $Y$ to form the joint observation $(1,0)$ (in this example).
So, in my example, we are assuming there is some underlying event $\omega'\in \Omega_J$ such that $X(\omega')=1$ and $Y(\omega')=0$ and that there is a valid underlying probability space $(\Omega_J,\mathcal{F}_J,P_J)$ whose image will produce the observed joint distribution of $(X,Y)$.
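Here is a hypothetical Python sketch of that assumption (the choice of $\Omega_J$ and the indexing are my own, purely for illustration): each underlying event $\omega'$ is one sampled unit, and $X$ and $Y$ are two functions read off the same $\omega'$.

```python
# Omega_J: five underlying events (e.g., five sampled units), one per draw.
omega_J = [0, 1, 2, 3, 4]

# X and Y are two functions defined on the SAME underlying events, which is
# what justifies pairing the i-th observation of X with the i-th of Y.
X = {0: 1, 1: 1, 2: 2, 3: 5, 4: 3}
Y = {0: 0, 1: 1, 2: 1, 3: 0, 4: 1}

joint_observations = [(X[w], Y[w]) for w in omega_J]
print(joint_observations)  # [(1, 0), (1, 1), (2, 1), (5, 0), (3, 1)]
```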
However, we could dispense with all of this if we chose to model $X,Y$ not as random variables but as direct observations (the integers are our experimental units or foundation data).
At this point, you may still be unconvinced of the usefulness of the sample space view...
So, let's say you develop your distribution of $X,Y$ directly (no sample space [i.e., "domain-less" in your terminology]), and then you want to add a new quantity $Z$. How do you do this? Without an underlying sample space, you need to develop the joint distribution manually from first principles (i.e., ad hoc), whereas invoking the idea of an underlying sample space makes extending joint distributions a natural consequence of defining a new function over the same (usually implicit) underlying probability space. The fact that this can be assumed to be true is a major theoretical elegance of modern probability theory.
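Continuing the same hypothetical sketch, adding a new quantity $Z$ amounts to defining one more function over the same underlying events; no new pairing machinery is needed ($Z = X + Y$ is an arbitrary choice for illustration):

```python
# Same underlying events and functions as in the earlier sketch.
omega_J = [0, 1, 2, 3, 4]
X = {0: 1, 1: 1, 2: 2, 3: 5, 4: 3}
Y = {0: 0, 1: 1, 2: 1, 3: 0, 4: 1}

# Extending the joint distribution to (X, Y, Z) is just one more function
# defined on the same underlying events -- no new pairing machinery needed.
Z = {w: X[w] + Y[w] for w in omega_J}  # Z = X + Y, an illustrative choice

joint_xyz = [(X[w], Y[w], Z[w]) for w in omega_J]
print(joint_xyz)  # [(1, 0, 1), (1, 1, 2), (2, 1, 3), (5, 0, 5), (3, 1, 4)]
```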
Again, it's a matter of perspective, but the random variables view, at least to me, has a philosophical/conceptual elegance to it when you consider joint observations and stochastic processes.
Here is a nice post on MathOverflow that discusses something similar.
Best Answer
I think you are misreading the formal definition of a random variable. The domain is the measure space of possible outcomes. The codomain is the real numbers: the possible values of a measurement that depends on the outcome. So for a pair of dice, the set of outcomes is the obvious $36$-element set. If the dice are fair, then the probability measure on that finite set is the uniform one. The sum of a roll is an example of a random variable.
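As a small sketch of that description (my own addition, assuming fair dice): the domain is the 36-outcome space with the uniform measure, and the sum is a function from outcomes to numbers.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Domain: the 36 equally likely outcomes for a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(outcomes))  # uniform probability measure

# The sum of a roll is a random variable: a function from outcomes to reals.
sum_dist = Counter()
for roll in outcomes:
    sum_dist[sum(roll)] += P

print(sum_dist[7])  # 1/6 -- six of the 36 outcomes sum to 7
```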