This is perhaps a trivial question that came up while I was solving a few examples to understand the Markov and Chebyshev inequalities, and subsequently while evaluating Chernoff bounds. Suppose $X$ is a random variable with density $p$ and we need to evaluate $\mathbb{E}[f(X)]$. Here, $f(X)$ need not have the same distribution as $X$; in my case, $f(X) = e^{sX}$ with $s>0$. I would work out this problem by letting $Z = e^{sX}$ and writing
$\mathbb{E}[f(X)] = \mathbb{E}[Z] = \int_{-\infty}^{\infty}z \hspace{0.2em}p_Z(z)\,dz$.
Now I find the distribution of $Z$ as:
$P\{ Z \leq z \} = P\{ e^{sX} \leq z \} = P\left\{ X \leq \frac{\log(z)}{s}\right\} = \int_{-\infty}^{\log(z)/s}p(x)\,dx$
(for $z > 0$, using $s > 0$),
so, writing the density of $Z$ as the derivative of this CDF and substituting into the expectation, one gets
$\mathbb{E}[Z] = \int_{0}^{\infty}z\hspace{0.2em} \frac{d}{dz}\left(\int_{-\infty}^{\log(z)/s}p(x)\,dx\right) dz$.
According to the Law of the Unconscious Statistician (as stated on Wikipedia), this evaluates to $\int_{-\infty}^{\infty} e^{sx}\,p(x)\,dx$; that is, one can integrate against the density of $X$ directly. I understand the proof via calculus, but I don't follow the measure theory approach. For example, I don't see how this works if, say, $X \sim \mathrm{Gamma}$. Is there a source I can read through for someone with no measure theory background?
Best Answer
The law of the unconscious statistician is a useful technique for computing the expected value of a random variable $Z$ that can be written as a function of a random variable $X$, relying only on knowing the distribution of $X$, not necessarily that of $Z$.
The Discrete Case
The law of the unconscious statistician can be proved in the discrete case with no reference to measure theory, and this special case can motivate the general formula.
Proof. Since $Z$ is a discrete random variable, we have $$ \begin{aligned} E[Z] &= \sum_{z \in \mathcal{Z}} z \, P(Z = z) \\ &= \sum_{z \in \mathcal{Z}} z \, P(f(X) = z) \end{aligned} $$ Now notice that the event $\{f(X) = z\}$ can be decomposed into a countable union of pairwise disjoint events based on what value $X$ takes: $$ \{f(X) = z\} = \bigcup_{\substack{x \in \mathcal{X} \\ f(x) = z}} \{X = x\}. $$ Using this observation, we may continue our calculation above: $$ \begin{aligned} E[Z] &= \sum_{z \in \mathcal{Z}} z \, P(f(X) = z) \\ &= \sum_{z \in \mathcal{Z}} z \, P\left(\bigcup_{\substack{x \in \mathcal{X} \\ f(x) = z}} \{X = x\}\right) \\ &= \sum_{z \in \mathcal{Z}} z \sum_{\substack{x \in \mathcal{X} \\ f(x) = z}} P(X = x) \\ &= \sum_{z \in \mathcal{Z}} \sum_{\substack{x \in \mathcal{X} \\ f(x) = z}} f(x) P(X = x) \\ &= \sum_{x \in \mathcal{X}} f(x) P(X = x) \end{aligned} $$
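As a quick sanity check of this identity (a toy illustration, not part of the proof), one can compute both sides for a small distribution in which several $x$-values map to the same $z$-value:

```python
# Verify the discrete LOTUS: sum_z z*P(Z=z) equals sum_x f(x)*P(X=x).
# Toy example: X uniform on {-2,-1,0,1,2} and f(x) = x**2, so several
# x-values collapse onto the same z-value.
from collections import defaultdict
from fractions import Fraction

xs = [-2, -1, 0, 1, 2]
p_X = {x: Fraction(1, 5) for x in xs}   # P(X = x)
f = lambda x: x * x

# Left-hand side: build the distribution of Z = f(X) explicitly.
p_Z = defaultdict(Fraction)
for x, px in p_X.items():
    p_Z[f(x)] += px                     # P(Z = z) = sum over {x : f(x) = z}
lhs = sum(z * pz for z, pz in p_Z.items())

# Right-hand side: LOTUS, using only the distribution of X.
rhs = sum(f(x) * px for x, px in p_X.items())

print(lhs, rhs)                         # both equal 2
assert lhs == rhs
```

Exact rational arithmetic (`Fraction`) makes the equality exact rather than approximate.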
The Absolutely Continuous Case
The law of the unconscious statistician for (absolutely) continuous random variables can intuitively be derived from the discrete case by making the usual substitutions that one makes when going from the discrete case to the continuous case: sums become integrals and probability mass functions become probability densities. If $X$ has density $p_X$, then $$ E[f(X)] = \int_{\mathbb{R}} f(x) \, p_X(x) \, dx. $$
This formula should make sense intuitively: each value $f(x)$ is weighted by the probability density of $X$ near $x$, just as in the discrete case each value $f(x)$ is weighted by $P(X = x)$.
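As a numerical sanity check (my own illustration, assuming SciPy is available and taking $X \sim N(0,1)$, $f(x) = x^2$), the direct LOTUS integral and the longer route through the density of $Z = f(X)$ agree:

```python
# Check E[f(X)] = integral of f(x) p_X(x) dx for X ~ N(0, 1) and f(x) = x**2.
# Here Z = X**2 is chi-square with 1 degree of freedom, so both routes
# should give E[Z] = Var(X) = 1.
import math
from scipy.integrate import quad

def p_X(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

f = lambda x: x * x

# Route 1 (LOTUS): integrate f against the density of X directly.
lotus, _ = quad(lambda x: f(x) * p_X(x), -math.inf, math.inf)

# Route 2: the density of Z = X**2 is p_Z(z) = e^{-z/2} / sqrt(2*pi*z),
# so z * p_Z(z) simplifies to sqrt(z) * e^{-z/2} / sqrt(2*pi).
via_Z, _ = quad(lambda z: math.sqrt(z) * math.exp(-z / 2) / math.sqrt(2 * math.pi),
                0, math.inf)

print(lotus, via_Z)                     # both approximately 1.0
```

Route 2 required working out the density of $Z$ first, which is exactly the step LOTUS lets you skip.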
Example
In the original question, you ask about computing $E[Z]$, where $f(x) = e^{s x}$ for some $s > 0$ and $Z = f(X)$. If $X$ is a discrete random variable taking values in a countable set $\mathcal{X}$, then we get $$ E[Z] = \sum_{x \in \mathcal{X}} e^{s x} P(X = x), $$ and if $X$ is an absolutely continuous random variable with density $p_X$, then we get $$ E[Z] = \int_{\mathbb{R}} e^{s x} p_X(x) \, dx. $$ You can see that in both of these formulas, you only need to know the distribution of $X$.
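Since the original question mentions the Gamma distribution, here is a Monte Carlo sketch of the continuous formula (shape, scale, and $s$ are my own illustrative choices). The estimate uses only samples of $X$; the density of $Z = e^{sX}$ is never needed:

```python
# Monte Carlo check of E[e^{sX}] for X ~ Gamma via LOTUS: average f(X)
# over samples of X. The closed-form MGF of Gamma(shape k, scale theta)
# is (1 - s*theta)**(-k) for s < 1/theta.
import numpy as np

rng = np.random.default_rng(0)
k, theta, s = 2.0, 1.0, 0.3             # illustrative parameters, s < 1/theta

x = rng.gamma(shape=k, scale=theta, size=2_000_000)
lotus_estimate = np.mean(np.exp(s * x))  # sample average of f(X) = e^{sX}
exact = (1 - s * theta) ** (-k)          # = 1/0.49, about 2.0408

print(lotus_estimate, exact)
assert abs(lotus_estimate - exact) < 0.01
```

This is the same quantity that appears in a Chernoff bound, $E[e^{sX}]$, computed entirely from the distribution of $X$.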
The General Case
I can't help but mention the measure-theoretic result underlying this whole discussion. In general, a random variable need not be discrete or absolutely continuous (e.g., what's the distribution of the sum of a binomial random variable and a Gaussian random variable? Is it discrete? Is it continuous?). For this case, measure theory gives a concise formula for computing the expected value of a function of a random variable: if $X$ has distribution (i.e., pushforward measure) $P_X$ on $\mathbb{R}$, then $$ E[f(X)] = \int_{\mathbb{R}} f(x) \, dP_X(x). $$
This result is a special case of an even more general change-of-variables formula from measure theory that has nothing to do with probability. The formulas for the discrete case and the absolutely continuous case that were given above fall out as special cases of this theorem after making the following observations: when $X$ is discrete, $P_X$ is supported on the countable set $\mathcal{X}$ and places mass $P(X = x)$ at each point, so the integral reduces to the sum $\sum_{x \in \mathcal{X}} f(x) \, P(X = x)$; when $X$ is absolutely continuous, $P_X$ has density $p_X$ with respect to Lebesgue measure, so the integral becomes $\int_{\mathbb{R}} f(x) \, p_X(x) \, dx$.
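Even without classifying the distribution's type, the general statement can be probed numerically. A Monte Carlo sketch for the mixed example above, the sum of a binomial and an independent Gaussian (parameters chosen purely for illustration):

```python
# LOTUS via Monte Carlo for X = B + N with B ~ Binomial(10, 0.5) and
# N ~ Normal(0, 1) independent -- a distribution that is neither purely
# discrete nor absolutely described by a simple density formula here.
# For f(x) = x**2, independence gives the closed form
#   E[(B + N)^2] = E[B^2] + 2 E[B] E[N] + E[N^2]
#                = (Var(B) + E[B]^2) + 0 + 1 = (2.5 + 25) + 1 = 28.5.
import numpy as np

rng = np.random.default_rng(1)
n_samples = 2_000_000

b = rng.binomial(n=10, p=0.5, size=n_samples)
z = rng.normal(0.0, 1.0, size=n_samples)
x = b + z

estimate = np.mean(x ** 2)              # sample average of f(X)
exact = 28.5

print(estimate, exact)
assert abs(estimate - exact) < 0.1
```

The sample average of $f(X)$ approximates $\int_{\mathbb{R}} f \, dP_X$ no matter what kind of measure $P_X$ is, which is one way to see why the general formulation is the natural one.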