Probability – Exploring Mistake in Casella & Berger on Page 207

Tags: distributions, notation, probability, random variable, terminology

Page 28:

A note on notation: Random variables will always be denoted with
uppercase letters and the realized values of the variable (or its
range) will be denoted by the corresponding lowercase letters. Thus,
the random variable X can take the value x.

Page 139:

For example, consider an experiment designed to gain information about
some health characteristic of a population of people… the body
weights of several people in the population might be measured. These
different weights would be observations on different random variables,
one for each person measured.

Page 207:

The random variables $X_1, \dots, X_n$ are called a random sample of
size $n$ from the population $f(x)$ if $X_1, \dots, X_n$ are mutually
independent random variables and the marginal pdf or pmf of each $X_i$
is the same function $f(x)$.

Page 207 – I think the sentence below is incorrect:

Under the random sampling model each $X_i$ is an observation on the
same variable and each $X_i$ has a marginal distribution given by
f(x).


First, given the authors' notation convention, the statement should be rewritten as follows. The first $X_i$ has been replaced with $x_i$ because it refers to a realized value of the random variable (an observation), not the random variable itself.

Under the random sampling model each $x_i$ is an observation on the
same variable and each $X_i$ has a marginal distribution given by
f(x).

Secondly, as the authors themselves explained in the quoted text from page 139, $X_i$ is a different random variable from $X_j$ for all $i \neq j$. $X_1, \dots, X_n$ each have the same distribution function (they are identically distributed), but they are different random variables. For example, $X_1$ is the weight of the first person, $X_2$ is the weight of the second person, and so on.

It seems to me that in this sentence the authors have fallen into the trap of mistaking a random variable for its distribution. Or did I interpret things incorrectly? If this is a mistake, as I think it is, it is very unfortunate: it occurs in the paragraph explaining what a random sample is, which is precisely where the student is trying to get clarity on observations, random variables, and distributions, and how they interrelate in the case of a random sample.

Best Answer

That page describes independent and identically distributed (i.i.d.) variables $X_i$.
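In symbols, and assuming the usual reading of the definition quoted from page 207, mutual independence together with a common marginal $f$ means that the joint pdf or pmf of the sample factorises:

$$f(x_1, \dots, x_n) = f(x_1) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).$$

The arguments $x_1, \dots, x_n$ are possible realized values, while the statement itself is about the joint distribution of the $n$ different random variables $X_1, \dots, X_n$.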

The quoted sentence could be changed in more than one way. The following would probably be better:

Change

Under the random sampling model each $X_i$ is an observation on the same variable and each $X_i$ has a marginal distribution given by $f(x)$.

Into

Under the random sampling model each $X_i = x_i$ is an observation on the same population and each $X_i$ has a marginal distribution given by $f(x)$.

See also the passage on page 209:

The random sampling model in Definition 5.1.1 is sometimes called sampling from an infinite population. Think of obtaining the values of $X_1, \dots, X_n$ sequentially. First, the experiment is performed and $X_1 = x_1$ is observed. Then, the experiment is repeated and $X_2 = x_2$ is observed. The assumption of independence in random sampling implies that the probability distribution for $X_2$ is unaffected by the fact that $X_1 = x_1$ was observed first. "Removing" $x_1$ from the infinite population does not affect the population, so $X_2 = x_2$ is still a random observation from the same population.

In that last sentence you see both changes come together. The lowercase letter is used when referring to what was observed: it is not $X_2$ that is called an observation, but the event $X_2 = x_2$. And it is not called 'an observation on the same variable' but 'an observation from the same population'.

Personally, I think it is not so bad to call the $X_i$ observations as well; the more important change to the quote is replacing 'variable' with 'population'.

The $X_i$ can be seen as random variables that describe a random observation. You could, for example, use them in a sentence such as "we describe the random observations of the donkey's walking speed with variables $X_1, X_2, \dots, X_n$". The lowercase $x_i$ is not strictly the 'observation' itself; it is the realisation of the observation, the 'observed value'. You see this also in the text with sentences like

$X_1 = x_1$ is observed

The $x_1$ is not the 'observation' itself (it is not the act) but it is the 'observed value', or it is 'what is observed'.
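The distinction can also be illustrated with a small simulation. This is only a sketch under assumed values: the normal population with mean 70 and standard deviation 10 (body weights, say), the sample size, the number of runs, and the helper name `draw_sample` are all hypothetical choices made for illustration, not anything taken from the book.

```python
import random

def draw_sample(n, rng):
    """Perform the experiment once: return realized values x_1, ..., x_n.

    Each draw is independent and comes from the same assumed population,
    here Normal(mean=70, sd=10).
    """
    return [rng.gauss(70.0, 10.0) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 3

# One run of the experiment gives the observed values x_1, x_2, x_3.
x = draw_sample(n, rng)

# Repeating the experiment many times lets us look at the random
# variables X_1, X_2, X_3 themselves rather than at a single realization.
runs = [draw_sample(n, rng) for _ in range(20000)]

# Same marginal distribution: the average of X_i over the runs is close
# to the population mean for every position i.
means = [sum(run[i] for run in runs) / len(runs) for i in range(n)]

# Different random variables: within a run, X_1 and X_2 (almost) never
# take the same value, even though they share a distribution.
share_equal = sum(run[0] == run[1] for run in runs) / len(runs)

print("one realization:", x)
print("mean of each X_i across runs:", means)
print("share of runs with x_1 == x_2:", share_equal)
```

Each position `i` in a run plays the role of $X_i$, and the numbers stored in `x` play the role of the observed values $x_i$. The per-position averages agree because the marginal distributions are the same, while the values within any single run differ because the $X_i$ are distinct random variables.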
