[Math] Are the sample spaces for one and multiple coin tosses the same?

statistics

I am reading *All of Statistics* by Larry Wasserman. I am in the first chapter, reading about sample spaces, sample outcomes and events. I am a bit confused by one of the examples the author gives when explaining sample spaces.

Definition of sample space from the book: The sample space $\Omega$ is the set of possible outcomes of an experiment.

Here is what I understood: if our experiment is a single coin toss, then the outcome of the experiment is either heads or tails, so our sample space is $\Omega = \{H, T\}$. Also, since the sample space is a set, it cannot contain duplicate values.

Then the book goes on and gives us an example: If we toss a coin forever, then the sample space is the infinite set,

$$\Omega = \{\omega = (\omega_1, \omega_2, \omega_3, \dots) : \omega_i \in \{H, T\}\}$$

But this contradicts my understanding, because in the example both $\omega_1$ and $\omega_2$ can be H, and duplicate values are not allowed in a set.

My understanding is that, regardless of how many times we toss the coin (once or forever), the sample space always contains two values ($\Omega = \{H, T\}$), not infinitely many as the book says, because of the definition of a set.

Can someone help me clear up my confusion?

Best Answer

Our sample space $\Omega$ is the set of all possible infinite sequences (tuples) of Heads (hereafter shortened to $H$) and Tails (hereafter shortened to $T$). A sequence can be thought of as similar to a set, with two major differences: elements can be repeated, and the order of the elements matters.

Consider a smaller example: the sample space of all possible results of three coin flips in sequence,

$$X = \{(\omega_1,\omega_2,\omega_3)~:~\omega_i\in \{H,T\}\}$$

In words, the notation above reads: "$X$ is the set of all possible triples of the form $(\omega_1,\omega_2,\omega_3)$ where each entry of the triple is either $H$ or $T$." It could also have been written out explicitly as:

$$X = \left\{(H,H,H),(H,H,T),(H,T,H),(H,T,T),(T,H,H),(T,H,T),(T,T,H),(T,T,T)\right\}$$
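To make the set-builder notation concrete, here is a small Python sketch (an illustration, not something from the book or the question) that builds the sample space for $n$ tosses as the set of all length-$n$ tuples over $\{H,T\}$ using `itertools.product`. The function name `sample_space` is a hypothetical choice.

```python
from itertools import product

def sample_space(n):
    """Set of all length-n tuples whose entries are 'H' or 'T'."""
    return set(product("HT", repeat=n))

X = sample_space(3)
print(len(X))     # 8, i.e. 2**3
print(sorted(X))  # the same eight triples as in the explicit listing above

# The size doubles with every extra toss: 2, 4, 8, 16, 32
for n in range(1, 6):
    print(n, len(sample_space(n)))
```

For $n = 1$ the result is $\{(H), (T)\}$, which is essentially $\{H,T\}$; for more tosses the sample space is a genuinely different, larger set with $2^n$ elements.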

"But this contradicts my understanding, because in the example both $\omega_1$ and $\omega_2$ can be H, and duplicate values are not allowed in a set."

Note, for example, that $(H,H,T)$ is a valid triple even though $H$ appears more than once. Furthermore, it is a different triple from $(H,T,H)$: both contain two $H$'s and one $T$, but the order in which they occur differs. Since the elements of our sample space are themselves sequences (tuples), we do not care whether there are repeats *within* a sequence, only that the sequences themselves are not repeated in the set. (Even that would not be a real problem: a set that lists the same sequence twice simply contains it once.)
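Python's built-in `tuple` and `set` types behave much like the mathematical tuples and sets described here, so they give a quick sanity check of these rules. This is an analogy for intuition, not a formal definition.

```python
# Repeated entries inside a tuple are fine, and order matters.
a = ("H", "H", "T")
b = ("H", "T", "H")
print(a == b)  # False: same number of H's and T's, different order

# A set ignores repeated elements: listing (H,H,T) twice still gives one element.
outcomes = {("H", "H", "T"), ("H", "T", "H"), ("H", "H", "T")}
print(len(outcomes))  # 2

# A set of individual symbols, by contrast, collapses to {H, T}.
print({"H", "H", "T"} == {"H", "T"})  # True
```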

Note the difference in how these are enclosed: sets use curly brackets, $\{~~~\}$, while tuples and sequences use round brackets (parentheses), $(~~~)$.


Going back to your original example,

$$\Omega = \{(\omega_1,\omega_2,\omega_3,\dots)~:~\omega_i\in\{H,T\}\}$$

As before, this is the set of all possible sequences in which each term comes from the set $\{H,T\}$.

One such sequence might begin $(H,T,H,T,H,T,H,T,\dots)$, while another might begin $(H,H,H,H,H,H,\dots)$, and so on.
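A computer cannot store a whole infinite sequence, but a Python generator can describe one and produce any finite prefix on demand. The sketch below is only an illustration; the names `alternating`, `all_heads` and `prefix` are hypothetical. Each element of $\Omega$ is one entire infinite sequence, while a prefix of length $n$ is an element of the $n$-toss sample space.

```python
from itertools import count, islice

def alternating():
    """Infinite sequence H, T, H, T, ... (terms numbered from 1)."""
    for i in count(1):
        yield "H" if i % 2 == 1 else "T"

def all_heads():
    """Infinite sequence H, H, H, ..."""
    while True:
        yield "H"

def prefix(seq, n):
    """First n terms of an infinite sequence, as a tuple."""
    return tuple(islice(seq, n))

print(prefix(alternating(), 8))  # ('H', 'T', 'H', 'T', 'H', 'T', 'H', 'T')
print(prefix(all_heads(), 6))    # ('H', 'H', 'H', 'H', 'H', 'H')
```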

"In that case the example in the book is showing us one sample outcome. The sample space containing all sample outcomes needs to be written as $\Omega = \{ \omega_1, \omega_2, \dots : \omega_i = (\omega_{i1}, \omega_{i2}, \omega_{i3}, \dots),~\omega_{ij} \in \{H, T\}\}$. Am I thinking right?"

You are getting closer; however, there is a property of $\Omega$ that your notation above gets wrong. When we write out elements and trail off with an ellipsis (the three dots, ...), this implies that the list of elements is not only infinite but countably infinite. At a glance, your notation would make readers think there are only countably many infinite sequences of heads and tails. This is incorrect.

There are in fact uncountably many infinite sequences of heads and tails, so we cannot even begin to list them in a pattern that would eventually include them all. A variation on Cantor's diagonal argument proves this. Alternatively, replace each $H$ by $1$ and each $T$ by $0$, and read each sequence as the digits of a number written in binary after the binary point. Every real number in $[0,1]$ arises this way (some of them twice, e.g. $0.1000\ldots_2 = 0.0111\ldots_2 = \tfrac12$), which shows that $\Omega$ has cardinality at least that of the continuum.
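Both arguments can be sketched in Python, assuming an infinite $H/T$ sequence is modelled as a function from a 0-based index to `"H"` or `"T"`. The attempted enumeration `listing` below is a made-up example; the diagonal construction works the same way for any listing, though the code can of course only check finitely many positions. The helper `binary_value` (also hypothetical) evaluates a finite prefix as a binary fraction, reading $H$ as $1$ and $T$ as $0$.

```python
from fractions import Fraction
from typing import Callable

Seq = Callable[[int], str]  # term i (0-based) of an infinite H/T sequence

def flip(symbol: str) -> str:
    return "T" if symbol == "H" else "H"

def diagonal(listing: Callable[[int], Seq]) -> Seq:
    """Build a sequence whose k-th term differs from the k-th term
    of the k-th listed sequence, so it cannot appear anywhere in the list."""
    return lambda i: flip(listing(i)(i))

def listing(k: int) -> Seq:
    """Hypothetical attempted enumeration: sequence k has H exactly
    at the positions that are multiples of k + 1."""
    return lambda i: "H" if i % (k + 1) == 0 else "T"

d = diagonal(listing)
for k in range(10):
    assert d(k) != listing(k)(k)  # d disagrees with sequence k at position k
print("".join(d(i) for i in range(10)))  # THHHHHHHHH

def binary_value(prefix) -> Fraction:
    """Value of 0.b1 b2 ... bn in binary, reading H as 1 and T as 0."""
    return sum((Fraction(1, 2**j) for j, s in enumerate(prefix, start=1) if s == "H"),
               Fraction(0))

print(binary_value(("H", "T", "T", "T")))  # 1/2
print(binary_value(("T", "H", "H", "H")))  # 7/16, approaching 1/2 as more H's are added
```

The two prefixes illustrate the "some of them twice" remark: $HTTT\ldots$ equals $\tfrac12$ exactly, while $THHH\ldots$ falls short of $\tfrac12$ by $2^{-n}$ after $n$ terms and so converges to the same number.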

Therefore, when describing the set of these sequences, we should not list its elements with an ellipsis, and should instead use set-builder notation as before. In the notation above we did not give an arbitrary sequence a label, referring to it simply as $(\omega_1,\omega_2,\dots)$, but if you insist on labelling the sequences, we could write:

$$\Omega = \{\omega~:~\omega = (\omega_1,\omega_2,\omega_3,\dots),~\omega_i\in\{H,T\}\}$$

In words: $\Omega$ is the set of all elements $\omega$ such that $\omega$ is itself a countably infinite sequence whose terms are each either $H$ or $T$.

Compare this to the original phrasing: $\Omega$ is the set of all countably infinite sequences whose terms are each either $H$ or $T$.

The descriptions are hardly different and the rewriting you proposed was not necessary.
