Expected Value – General Solution of Expected Value of E(f(X))

expected-value, measure-theory, probability-inequalities, self-study

This is perhaps a trivial question that came up while I was working through examples of the Markov/Chebyshev inequalities and, subsequently, Chernoff bounds. Suppose $X$ is a random variable with density $p$ and we need to evaluate $\mathbb{E}[f(X)]$. Here, $f(X)$ may or may not have the same distribution as $X$; in my case, $f(X) = e^{sX}$ with $s>0$. I would work out this problem by letting $Z = e^{sX}$ and writing

$\mathbb{E}[f(X)] = \mathbb{E}[Z] = \int_{-\infty}^{\infty} z \, p_Z(z)\,dz$.

Now I find the distribution of $Z$ as
$$P\{ Z \leq z \} = P\{ e^{sX} \leq z \} = P\left\{ X \leq \frac{\log z}{s}\right\} = \int_{-\infty}^{\log(z)/s} p(x)\,dx,$$

so that, substituting the resulting density $p_Z$ into the expectation, one gets
$$\mathbb{E}[Z] = \int_{-\infty}^{\infty} z \, \frac{d}{dz}\left(\int_{-\infty}^{\log(z)/s} p(x)\,dx\right) dz.$$

According to the Wikipedia article on the Law of the Unconscious Statistician, one can skip all of this and compute $\mathbb{E}[f(X)]$ by integrating $f$ directly against the density $p$ of $X$, without ever finding the distribution of $Z$. I can follow the proof through calculus, but I don't follow the measure-theoretic approach; for example, I don't see how it works if, say, $X \sim \mathrm{Gamma}$. Is there a source I can read through for someone with no measure theory background?

Best Answer

The law of the unconscious statistician is a useful technique for computing the expected value of a random variable $Z$ that can be written as a function of another random variable $X$; it relies only on knowing the distribution of $X$, not that of $Z$.

The Discrete Case

The law of the unconscious statistician can be proved in the discrete case with no reference to measure theory, and this special case can motivate the general formula.

Theorem. Let $X$ be a random variable taking values in a countable set $\mathcal{X}$. Let $\mathcal{Z}$ be a countable subset of $\mathbb{R}$, let $f : \mathcal{X} \to \mathcal{Z}$ be a function, and let $Z = f(X)$. Assume that $E[Z]$ exists. Then $$ E[Z] = \sum_{x \in \mathcal{X}} f(x) P(X = x). $$

Proof. Since $Z$ is a discrete random variable, we have $$ \begin{aligned} E[Z] &= \sum_{z \in \mathcal{Z}} z \, P(Z = z) \\ &= \sum_{z \in \mathcal{Z}} z \, P(f(X) = z) \end{aligned} $$ Now notice that the event $\{f(X) = z\}$ can be decomposed into a countable union of pairwise disjoint events based on what value $X$ takes: $$ \{f(X) = z\} = \bigcup_{\substack{x \in \mathcal{X} \\ f(x) = z}} \{X = x\}. $$ Using this observation, we may continue our calculation above: $$ \begin{aligned} E[Z] &= \sum_{z \in \mathcal{Z}} z \, P(f(X) = z) \\ &= \sum_{z \in \mathcal{Z}} z \, P\left(\bigcup_{\substack{x \in \mathcal{X} \\ f(x) = z}} \{X = x\}\right) \\ &= \sum_{z \in \mathcal{Z}} z \sum_{\substack{x \in \mathcal{X} \\ f(x) = z}} P(X = x) \\ &= \sum_{z \in \mathcal{Z}} \sum_{\substack{x \in \mathcal{X} \\ f(x) = z}} f(x) P(X = x) \\ &= \sum_{x \in \mathcal{X}} f(x) P(X = x) \end{aligned} $$
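To make the grouping step in the proof concrete, here is a small numerical check (with a made-up four-point distribution for $X$ and $f(x) = x^2$, chosen so that $f$ is not injective) that the LOTUS sum agrees with first deriving the law of $Z$:

```python
from collections import defaultdict

# Toy discrete distribution for X (hypothetical values and masses).
p_X = {-1: 0.2, 0: 0.3, 1: 0.1, 2: 0.4}
f = lambda x: x * x  # not injective: f(-1) == f(1)

# LOTUS: sum f(x) P(X = x) directly, never computing the law of Z.
lotus = sum(f(x) * p for x, p in p_X.items())

# The long way: first derive P(Z = z) by grouping the x's with f(x) = z...
p_Z = defaultdict(float)
for x, p in p_X.items():
    p_Z[f(x)] += p
# ...then apply the definition of expectation to Z itself.
direct = sum(z * p for z, p in p_Z.items())

print(lotus, direct)  # both equal 1.9 (up to float rounding)
```

The grouping loop is exactly the decomposition $\{f(X) = z\} = \bigcup_{f(x)=z} \{X = x\}$ from the proof.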

The Absolutely Continuous Case

The law of the unconscious statistician for (absolutely) continuous random variables can intuitively be derived from the discrete case by making the usual substitutions that one makes when going from the discrete case to the continuous case: sums become integrals and probability mass functions become probability densities.

Theorem. Let $X$ be an absolutely continuous random variable with probability density $p_X$. Let $f : \mathbb{R} \to \mathbb{R}$ be a Borel-measurable function (e.g., any continuous or piecewise continuous or monotonic function satisfies this assumption), and let $Z = f(X)$. Assume that $E[Z]$ exists. Then $$ E[Z] = \int_{\mathbb{R}} f(x) p_X(x) \, dx $$

This formula should make sense intuitively:

  • The sum $\sum_{x \in \mathcal{X}}$ from the discrete case became the integral $\int_{\mathbb{R}} \cdots \, dx$ in the continuous case.
  • The probability mass function $P(X = x)$ in the discrete case became the probability density $p_X(x)$ in the continuous case.
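As a quick sanity check of the theorem, take $X \sim \mathrm{Uniform}(0,1)$ and $f(x) = x^2$ (an example not in the original answer), and compare LOTUS with the "long way" through the distribution of $Z = X^2$. LOTUS gives
$$ E[Z] = \int_0^1 x^2 \, dx = \frac{1}{3}. $$
The long way: for $z \in [0,1]$, $P(Z \leq z) = P(X \leq \sqrt{z}) = \sqrt{z}$, so $p_Z(z) = \frac{1}{2\sqrt{z}}$ and
$$ E[Z] = \int_0^1 z \cdot \frac{1}{2\sqrt{z}} \, dz = \int_0^1 \frac{\sqrt{z}}{2} \, dz = \frac{1}{3}, $$
in agreement, but with the extra work of finding $p_Z$.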

Example

In the original question, you ask about computing $E[Z]$, where $f(x) = e^{s x}$ for some $s > 0$ and $Z = f(X)$. If $X$ is a discrete random variable taking values in a countable set $\mathcal{X}$, then we get $$ E[Z] = \sum_{x \in \mathcal{X}} e^{s x} P(X = x), $$ and if $X$ is an absolutely continuous random variable with density $p_X$, then we get $$ E[Z] = \int_{\mathbb{R}} e^{s x} p_X(x) \, dx. $$ You can see that in both of these formulas, you only need to know the distribution of $X$.

The General Case

I can't help but mention the measure-theoretic result underlying this whole discussion. In general, a random variable need not be discrete or absolutely continuous (e.g., what is the distribution of the product of a Bernoulli random variable and an independent Gaussian random variable? It has a point mass at $0$ plus a continuous part, so it is neither). For this case, measure theory gives a concise formula for computing the expected value of a function of a random variable.

Theorem. Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $(\mathcal{X}, \mathcal{B})$ be a measurable space, and let $X : \Omega \to \mathcal{X}$ be a random variable. Let $P_X = X_* P$ be the distribution of $X$, meaning that $P_X$ is the probability measure on $(\mathcal{X}, \mathcal{B})$ defined by the formula $$ P_X(B) = P(X \in B) $$ for every set $B \in \mathcal{B}$. Also, let $f : \mathcal{X} \to \mathbb{R}$ be a Borel-measurable function, and let $Z = f(X)$ (more precisely, $Z : \Omega \to \mathbb{R}$ is the function composition $Z = f \circ X$). Assume $E[Z] = \int_\Omega Z \, dP$ exists. Then $$ E[Z] = \int_{\mathcal{X}} f(x) \, dP_X(x). $$

This result is a special case of an even more general change-of-variables formula from measure theory that has nothing to do with probability. The formulas for the discrete case and the absolutely continuous case that were given above fall out as special cases of this theorem after making the following observations:

  • If $X$ is a discrete random variable, then $$ \int_{\mathcal{X}} \cdots \, dP_X(x) = \sum_{x \in \mathcal{X}} \cdots P(X = x) $$ (formally, because $P_X$ is absolutely continuous with respect to counting measure on $\mathcal{X}$, and the Radon-Nikodym derivative is $x \mapsto P(X = x)$).
  • If $X$ is an absolutely continuous random variable (with respect to Lebesgue measure) with probability density $p_X$, then $\mathcal{X} = \mathbb{R}$ and $$ \int_{\mathcal{X}} \cdots \, dP_X(x) = \int_{\mathbb{R}} \cdots p_X(x) \, dx $$ (formally, this is because, by definition, $P_X$ is absolutely continuous with respect to Lebesgue measure, and the Radon-Nikodym derivative is $p_X$).
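One practical reading of the general formula $E[Z] = \int f \, dP_X$ is Monte Carlo: averaging $f$ over samples of $X$ estimates the integral no matter what kind of distribution $P_X$ is. A sketch for a genuinely mixed random variable (hypothetical parameters; $X = B \cdot G$ with $B \sim \mathrm{Bernoulli}(q)$ and $G \sim N(0,1)$ independent, so $X$ has an atom at $0$ plus a continuous part):

```python
import numpy as np

rng = np.random.default_rng(42)

# Mixed random variable: X = B * G, atom at 0 with mass 1 - q.
q, s, N = 0.5, 0.5, 1_000_000
B = rng.random(N) < q            # Bernoulli(q) as a boolean array
G = rng.standard_normal(N)       # standard Gaussians
X = B * G

# Monte Carlo version of E[f(X)] = \int f(x) dP_X(x): average f over samples.
mc = np.exp(s * X).mean()

# Exact value by conditioning on B: (1-q)*1 + q*E[e^{sG}] = (1-q) + q*e^{s^2/2}.
exact = (1 - q) + q * np.exp(s**2 / 2)

print(mc, exact)
```

Neither the pmf-sum nor the pdf-integral formula applies to $X$ on its own, but the sample average converges to $\int e^{sx} \, dP_X(x)$ all the same.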