Reconciling the measure-theoretic definition of expectation with the definition used in elementary probability

Tags: definition, integration, lebesgue-integral, probability-theory, random-variables

Measure-theoretic definition (e.g., Durrett, Brzezniak):

Given a probability space $(\Omega, \mathcal{F}, P)$ and a random
variable $X: \Omega \to \mathbb{R}$ assumed to be integrable, meaning
$$\int_\Omega |X| \, dP < \infty,$$ the expectation of $X$ is
defined by $$E[X] = \int_\Omega X \, dP.$$

Elementary probability definition (e.g., Sheldon Ross, Tsitsiklis):

The expectation of a random variable $X$ is

$$E[X] = \begin{cases} \sum\limits_{x:p(x)>0} x p(x) & X \text{ is
discrete} \\ \int\limits_{-\infty}^\infty xf(x) dx & X \text{ is
continuous} \end{cases}$$

where $p$ is the probability mass function and $f$ is the probability density function.

It seems to me that

  1. the $dP$ symbol somehow magically turns into $p(x)$ and $f(x)\,dx$. I can't see the link, since I don't understand the symbol $dP$. Is it a function? Is it a constant? What is the $d$ doing there (it's not part of the definition at all)?

  2. the $\int_\Omega$ symbol turns into a $\sum$ or an $\int$ over the range of $x$, depending on the situation, which again seems strange to me.

Can someone explain how the measure-theoretic definition of the expectation turns into the ones that most (non-mathematician) students study in a course on probability?

Best Answer

If $X : \Omega \to \mathbb{R}$ is a random variable, it induces a probability measure on $\mathbb{R}$, often denoted $P(X \in \cdot)$, defined as the map

$$ A \mapsto P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\}) $$

for $A \in \mathcal{B}(\mathbb{R})$. This is called the pushforward measure of $P$ by $X$. It is the same as the Stieltjes measure induced by the CDF $F_X(\cdot) = P(X \leq \cdot)$, and so we may interchangeably write

$$ \int_{\mathbb{R}} f(x) \, \mathrm{d}F_X(x) = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x). $$
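For a concrete (toy) illustration: if $\Omega = \{H, T\}$ with $P(\{H\}) = p$ and $X(H) = 1$, $X(T) = 0$, then the pushforward measure is

$$ P(X \in \cdot) = (1-p) \, \delta_0(\cdot) + p \, \delta_1(\cdot), $$

where $\delta_x$ is the point mass at $x$. Note that this is a measure on $\mathbb{R}$ making no reference to $\Omega$: all the information needed to integrate functions of $X$ has been transported to the real line.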

In this setting, we have the following theorem. (You may also refer to Theorem 1.6.9 of Durrett, ed. 4.1.)

Theorem (Change of variables). For any random variable $X : \Omega \to \mathbb{R}$ and any Borel-measurable $f : \mathbb{R} \to [0, \infty]$, the following identity holds:

$$ \int_{\Omega} f(X(\omega)) \, P(\mathrm{d}\omega) = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x) $$
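Sketch of proof (the usual "standard machine" argument): for an indicator $f = \mathbf{1}_A$ with $A \in \mathcal{B}(\mathbb{R})$, both sides reduce to the same number,

$$ \int_{\Omega} \mathbf{1}_A(X(\omega)) \, P(\mathrm{d}\omega) = P(X \in A) = \int_{\mathbb{R}} \mathbf{1}_A(x) \, P(X \in \mathrm{d}x), $$

by the very definition of the pushforward measure. Linearity extends the identity to nonnegative simple functions, and the monotone convergence theorem (applied to a sequence of simple functions increasing to $f$) extends it to all Borel-measurable $f \geq 0$.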

Of course, the same conclusion continues to hold when $f$ is any $\mathbb{R}$-valued Borel-measurable function such that $|f(X)|$ is integrable: decompose $f$ as $f_+ - f_-$, where $f_+$ (resp. $f_-$) is the positive (resp. negative) part of $f$, and apply the above theorem to $f_+$ and $f_-$ separately. In particular, taking $f(x) = x$ (which is allowed since $X$ was assumed integrable), we get

$$ E[X] = \int_{\mathbb{R}} x \, P(X \in \mathrm{d}x). $$

Here are some special cases:

  • $X$ has a discrete distribution if and only if $P(X \in \cdot) = \sum_{i=1}^{\infty} p_X(x_i) \, \delta_{x_i}(\cdot)$, where $p_X$ is the PMF of $X$ and $\delta_x$ is the point mass at $x$. In that case, assuming $\sum_i |x_i| \, p_X(x_i) < \infty$ so that $X$ is integrable (worked examples follow this list),

    $$ E[X] = \int_{\mathbb{R}} x \, \sum_{i=1}^{\infty} p_X(x_i) \delta_{x_i}(\mathrm{d}x) = \sum_{i=1}^{\infty} \left( \int_{\mathbb{R}} x \delta_{x_i}(\mathrm{d}x) \right) p_X(x_i) = \sum_{i=1}^{\infty} x_i p_X(x_i) $$

  • $X$ has an (absolutely) continuous distribution if and only if $P(X \in \mathrm{d}x) = f_X(x) \, \mathrm{d}x$, where $f_X$ is the PDF of $X$. In that case, assuming $\int_{\mathbb{R}} |x| \, f_X(x) \, \mathrm{d}x < \infty$,

    $$ E[X] = \int_{\mathbb{R}} x f_X(x) \, \mathrm{d}x $$
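Two quick sanity checks, using standard distributions for illustration. If $X \sim \mathrm{Bernoulli}(p)$, then $P(X \in \cdot) = (1-p)\,\delta_0 + p\,\delta_1$ and

$$ E[X] = 0 \cdot (1 - p) + 1 \cdot p = p. $$

If $X \sim \mathrm{Exponential}(\lambda)$, with PDF $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, then integration by parts gives

$$ E[X] = \int_0^{\infty} x \, \lambda e^{-\lambda x} \, \mathrm{d}x = \frac{1}{\lambda}. $$

In both cases the single abstract formula $E[X] = \int_{\mathbb{R}} x \, P(X \in \mathrm{d}x)$ specializes to the familiar elementary computation.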