Measure-theoretic definition (e.g., Durrett, Brzezniak):
Given a probability space $(\Omega, \mathcal{F}, P)$ and a random
variable $X: \Omega \to \mathbb{R}$ that is integrable, i.e.,
$$\int_\Omega |X| \, dP < \infty,$$ the expectation of $X$ is
defined by $$E[X] = \int_\Omega X \, dP.$$
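To see what the abstract integral computes in the simplest case (an illustration not part of the definitions above): for a simple random variable $X = \sum_{i=1}^n x_i \mathbf{1}_{A_i}$ with disjoint events $A_i \in \mathcal{F}$, the definition of the integral gives the weighted sum
$$ \int_\Omega X \, dP = \sum_{i=1}^n x_i P(A_i), $$
and the integral of a general $X$ is obtained from simple random variables by a limiting procedure.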
Elementary probability definition (e.g., Sheldon Ross, Tsitsiklis):
The expectation of a random variable $X$ is
$$E[X] = \begin{cases} \sum\limits_{x:p(x)>0} x p(x) & X \text{ is
discrete} \\ \int\limits_{-\infty}^\infty xf(x) dx & X \text{ is
continuous} \end{cases}$$
where $p$ is the probability mass function and $f$ is the probability density function.
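As a quick numerical sanity check (not part of the original question), one can verify both branches of this formula on toy distributions: a fair six-sided die for the discrete case, and an Exponential(2) distribution for the continuous case, approximating the latter integral by a Riemann sum.

```python
# Sanity check of the two elementary formulas on hypothetical toy
# distributions (a fair die and an Exponential(2) random variable).
import math
from fractions import Fraction

# Discrete case: E[X] = sum over the support of x * p(x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
die_mean = sum(x * p for x, p in pmf.items())
print(float(die_mean))  # 3.5

# Continuous case: E[X] = integral of x * f(x) dx, approximated here by a
# Riemann sum for the Exponential(rate) density f(x) = rate * exp(-rate * x).
rate = 2.0
f = lambda x: rate * math.exp(-rate * x)
dx = 1e-4
exp_mean = sum(i * dx * f(i * dx) * dx for i in range(1, 200_000))
print(exp_mean)  # approximately 1 / rate = 0.5
```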
It seems to me that

- the $dP$ symbol somehow magically turns into $p(x)$ and $f(x)\,dx$. I can't see the link here since I don't understand the symbol $dP$. Is it a function? Is it a constant? What is the $d$ doing there (it's not a part of the definition at all)?
- the $\int_\Omega$ symbol turns into a $\sum$ or an $\int$ over the space of $x$ values on a situational basis, which is again strange to me.
Can someone explain how the measure theoretic definition of the expectation turns into the ones that most (non-mathematician) students study in a course on probability?
Best Answer
If $X : \Omega \to \mathbb{R}$ is a random variable, it induces a probability measure on $\mathbb{R}$, often denoted as $P(X \in \cdot)$, defined as the map
$$ A \mapsto P(X \in A) $$
for $A \in \mathcal{B}(\mathbb{R})$. This is called the pushforward measure of $P$ by $X$. In this particular case, it is the same as the Stieltjes measure induced by $F_X(\cdot) = \mathbb{P}(X \leq \cdot)$, and so, we may interchangeably write
$$ \int_{\mathbb{R}} f(x) \, \mathrm{d}F_X(x) = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x). $$
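For instance, if $X$ is a Bernoulli($p$) random variable, its pushforward measure is
$$ P(X \in \cdot) = (1-p) \, \delta_0(\cdot) + p \, \delta_1(\cdot), $$
where $\delta_x$ denotes the point mass at $x$; this is precisely the Stieltjes measure induced by the step function $F_X$.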
Under this setting, we have the following change-of-variables theorem (see also Theorem 1.6.9 of Durrett, 4.1 Ed.): if $f : \mathbb{R} \to [0, \infty]$ is Borel-measurable, then
$$ E[f(X)] = \int_\Omega f(X) \, \mathrm{d}P = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x). $$
Of course, the same conclusion continues to hold when $f$ is any $\mathbb{R}$-valued Borel-measurable function such that $|f(X)|$ is integrable. This easily follows by decomposing $f$ as $f_+ - f_-$, where $f_+$ (resp. $f_-$) is the positive part (resp. negative part) of $f$ and applying the above theorem to $f_{\pm}$ separately. In particular, we get
$$ E[X] = \int_{\mathbb{R}} x \, P(X \in \mathrm{d}x). $$
Here are some special cases:
$X$ has a discrete distribution if and only if $P(X \in \cdot) = \sum_{i=1}^{\infty} p_X(x_i) \delta_{x_i}(\cdot)$, where $p_X$ is the PMF of $X$ and $\delta_x$ is the point mass at $x$. In that case,
$$ E[X] = \int_{\mathbb{R}} x \, \sum_{i=1}^{\infty} p_X(x_i) \delta_{x_i}(\mathrm{d}x) = \sum_{i=1}^{\infty} \left( \int_{\mathbb{R}} x \delta_{x_i}(\mathrm{d}x) \right) p_X(x_i) = \sum_{i=1}^{\infty} x_i p_X(x_i) $$
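As a concrete instance of the discrete case, for a fair six-sided die with $p_X(i) = 1/6$ for $i = 1, \dots, 6$,
$$ E[X] = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{21}{6} = 3.5. $$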
$X$ has an (absolutely) continuous distribution if and only if $P(X \in \mathrm{d}x) = f_X(x) \, \mathrm{d}x$, where $f_X$ is the PDF of $X$. In that case,
$$ E[X] = \int_{\mathbb{R}} x f_X(x) \, \mathrm{d}x $$
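As a concrete instance of the continuous case, for $X \sim \operatorname{Exp}(\lambda)$ with density $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$,
$$ E[X] = \int_0^\infty x \, \lambda e^{-\lambda x} \, \mathrm{d}x = \frac{1}{\lambda}, $$
which follows from integration by parts.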