Probability Theory – Converting an Integral to a Sum for Expectation of a Discrete Random Variable

expectation, integration, lebesgue-integral, measure-theory, probability-theory

According to Wikipedia and all my textbooks, the expectation of a random variable $X$ on a probability space $(\Omega, \mathcal{F}, P)$ is defined as:
\begin{align}
E(X) &= \int_{\Omega} X \,\mathrm{d}P
\end{align}

This holds no matter whether the random variable is discrete, continuous, or neither. What I would like to know is how to derive the basic formulas:
\begin{align}
E(X) &= \sum_i x_i\,p(x_i)\\
E(X) &= \int_{\mathbb{R}} x\,p(x)\,\mathrm{d}\lambda(x)
\end{align}

for the discrete and continuous cases. Please note that I have already figured out the derivation for the continuous case; I am trying to do the same for the discrete one. So, since $X$ takes only finitely many values $a_i$, I define the distribution of $X$ as $PX^{-1}(A) = \sum_i a_i\mathbb{1}_A(a_i)$. Now, plugging everything together, I get stuck with an integral I don't know how to evaluate. None of my textbooks provide a derivation like this; they switch directly from integrals to sums.


I am sorry that my question wasn't very clear; this is my second attempt to explain it.

Best Answer

$\newcommand{\one}{\mathbb{1}}$ Reposting my comments as an answer, and adding another way of arriving at the same result.

Mathematical proof

As you say, $E[X]=\int_{\Omega}X\,\mathrm{d}P$. We know that, if $\mu_X$ is the pushforward measure induced on $\mathbb{R}$ by $X$, we can equivalently write $E[X]=\int_{\mathbb{R}}x\,\mathrm{d}\mu_X$. So now we only need to prove that, if $X$ is discrete with probability mass function $p_X$ and (finite or countable) set of values $V$, then for any measurable function $f$ we have: $$\int_{\mathbb{R}}f(x)\,\mathrm{d}\mu_X=\sum_{x\in V}f(x)\,p_X(x).$$
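For reference, the equivalence between the two integrals used here is the standard change-of-variables theorem for pushforward measures (not spelled out in the original answer): for any measurable $f$, $$\int_{\Omega}f(X(\omega))\,\mathrm{d}P(\omega)=\int_{\mathbb{R}}f(x)\,\mathrm{d}\mu_X(x)$$ whenever either side is defined; taking $f(x)=x$ gives the expression for $E[X]$ above.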

We start from indicators. When $f=\one_B$, the integral reduces to $\mu_X(B)$. But by definition of a discrete probability, $\mu_X(B)=\sum_{x\in B\cap V}p_X(x)$. And since $\one_B$ is $1$ on $B$ and $0$ outside it, the sum $\sum_{x\in V} f(x)\,p_X(x)$ is precisely $\sum_{x\in B\cap V}p_X(x)$. In this case, then, we have the desired equality.
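To make the indicator step concrete (a toy example of my own, not part of the original argument): let $X$ be a fair die, so $V=\{1,\dots,6\}$ and $p_X(x)=\tfrac16$, and take $B=\{1,2\}$. Then $$\int_{\mathbb{R}}\one_B\,\mathrm{d}\mu_X=\mu_X(\{1,2\})=\tfrac16+\tfrac16=\tfrac13=\sum_{x\in V}\one_B(x)\,p_X(x).$$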

Next step: simple functions. If $f=\sum_i a_i\one_{A_i}$ is a finite sum of multiples of indicators, then by linearity of the integral the LHS of our desired equality is $\sum_i a_i\int\one_{A_i}\,\mathrm{d}\mu_X$, which by the result above equals $\sum_i a_i\big(\sum_{x\in A_i\cap V}p_X(x)\big)$, and this is easily seen to match $\sum_{x\in V} f(x)\,p_X(x)$.
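Continuing the same toy example (again just an illustration): take the simple function $f=2\cdot\one_{\{1,2\}}+5\cdot\one_{\{3\}}$. The linearity step gives $2\,\mu_X(\{1,2\})+5\,\mu_X(\{3\})=2\cdot\tfrac13+5\cdot\tfrac16=\tfrac32$, which is exactly $\sum_{x\in V}f(x)\,p_X(x)=\tfrac26+\tfrac26+\tfrac56=\tfrac32$.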

Next step: positive functions. If $f\geq0$, we know it can be approximated from below by a non-decreasing sequence of simple functions. The limit passes under the integral by the monotone convergence theorem. If the sum on the RHS is finite, we can easily pass the limit under the sum. If it is infinite, let's write it out. We know $f_n\uparrow f$. The sum is $\sum_{x\in V} f(x)\,p_X(x)$, and we want it to equal $\lim_n\sum_{x\in V} f_n(x)\,p_X(x)$. So we need to prove that: $$\lim_n\sum_{x\in V}f_n(x)\,p_X(x)=\sum_{x\in V}\lim_nf_n(x)\,p_X(x).$$ Now, the sum over $V$ is defined as the supremum of its finite subsums. For each finite subsum the equality holds (the limit of a finite sum is the finite sum of the limits), so for any finite $S\subseteq V$ we have: $$\lim_n\sum_{x\in S}f_n(x)\,p_X(x)=\sum_{x\in S}\lim_nf_n(x)\,p_X(x).$$ The sup over $S$ of the RHS is the RHS above. Can we justify swapping the sup and the limit?

Suppose we have $s_{n,k}$ with $n\in\mathbb{N}$, $k\in I$, where $I$ is any index set. We want to compare $\lim_n\sup_ks_{n,k}$ and $\sup_k\lim_ns_{n,k}$. The sequences we are actually working with are monotonic non-decreasing in $n$, so $s_{n,k}\leq s_{n+1,k}$ for all $k,n$, and each limit in $n$ is in fact a sup in $n$, hence at least as large as every $s_{n,k}$. The sup of the limits is therefore greater than any $s_{n,k}$, therefore greater than any $\sup_k s_{n,k}$, implying it is greater than the limit of the sups. So $\sup\lim\geq\lim\sup$. Conversely, the sequence of the sups is also non-decreasing, so the limit of the sups is greater than all of the sups, hence greater than all $s_{n,k}$, hence greater than any $\lim_ns_{n,k}$, and hence also greater than their sup. This proves the other inequality, giving us the desired equality. That was the hardest part.
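Purely as a numerical illustration of this monotone-convergence step (my own sketch; the Poisson mass function, the dyadic approximations and the truncated support are assumptions made for the example, not part of the argument):

```python
import math

# Discrete illustration of monotone convergence under the sum:
# f(x) = x, p(x) = Poisson(lam) mass function, f_n the usual truncated
# dyadic approximations, so that f_n <= f_{n+1} and f_n -> f pointwise.
lam = 3.0

def p(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def f(x):
    return float(x)

def f_n(x, n):
    return min(math.floor(f(x) * 2 ** n) / 2 ** n, float(n))

support = range(60)  # truncation of the countable support; enough mass for lam = 3
exact = sum(f(x) * p(x) for x in support)           # approximately lam = 3.0
for n in (1, 2, 4, 8, 16):
    approx = sum(f_n(x, n) * p(x) for x in support)
    print(n, approx)                                # increases towards `exact`
print(exact)
```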

For generic functions, remember that both the integral and the infinite sum of a signed function are defined as the positive part minus the negative part, where each part is handled as for positive functions, i.e. via the sup above. But for the positive and the negative part we have already proved the equality, so (provided at least one of the two parts is finite) we get the equality for generic functions.

In summary: $$\int_{\mathbb{R}}x\,\mathrm{d}\mu_X(x)=\sum_{x\in V}x\,p_X(x),$$ and in fact more generally, for any measurable function $f$, we have: $$\int f(x)\,\mathrm{d}\mu_X(x)=\sum_{x\in V} f(x)\,p_X(x).$$
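As a quick sanity check of the summary formula (my own sketch; the support and the weights below are made-up numbers): the sum formula should agree with a Monte Carlo estimate of the abstract integral $\int_{\Omega}X\,\mathrm{d}P$, i.e. with the empirical mean of samples of $X$.

```python
import random

values  = [0, 1, 5, 10]            # hypothetical support V
weights = [0.4, 0.3, 0.2, 0.1]     # hypothetical mass function p_X

# E[X] via the sum formula: sum over V of x * p_X(x)
sum_formula = sum(x * p for x, p in zip(values, weights))

# E[X] via the abstract definition, approximated by an empirical mean
random.seed(0)
n = 200_000
samples = random.choices(values, weights=weights, k=n)
monte_carlo = sum(samples) / n

print(sum_formula)   # 2.3
print(monte_carlo)   # close to 2.3, up to sampling error
```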


Bonus

By the way, from the reasoning above you can extract the following lemma.

Lemma

If $s_{n,k}$ are real numbers, with $n\in\mathbb{N}$ and $k\in I$ for a generic index set $I$, such that $s_{n,k}\leq s_{n+1,k}$ for all $n\in\mathbb{N}$ and $k\in I$, then (both sides possibly being $+\infty$): $$\lim_{n\to\infty}\sup_{k\in I}s_{n,k}=\sup_{k\in I}\lim_{n\to\infty}s_{n,k}.$$

Proof

We want to compare $\lim_n\sup_ks_{n,k}$ and $\sup_k\lim_ns_{n,k}$. Since each sequence $(s_{n,k})_{n\in\mathbb{N}}$ is non-decreasing in $n$ by hypothesis, its limit is its sup over $n$, and in particular it is greater than or equal to every $s_{n,k}$. The sup of the limits is therefore greater than or equal to any $s_{n,k}$, hence greater than or equal to any $\sup_k s_{n,k}$, and hence greater than or equal to the limit of the sups. So: $$(1)\qquad \sup_k\lim_n s_{n,k}\geq\lim_n\sup_k s_{n,k}.$$ Conversely, the sequence of the sups is also non-decreasing, so the limit of the sups is greater than or equal to all of the sups, hence to all $s_{n,k}$, hence to any $\lim_ns_{n,k}$, and hence also to their sup. This proves the other inequality: $$(2)\qquad \lim_n\sup_k s_{n,k}\geq\sup_k\lim_n s_{n,k}.$$ Combining (1) and (2) gives the lemma.

By a similar argument, one can probably deduce that the inf swaps with the limit in the case of non-increasing monotonicity ($s_{n,k}\geq s_{n+1,k}$ for all $k\in I$, $n\in\mathbb{N}$).
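A tiny numerical check of the lemma (my own illustration, on a made-up monotone family): take $s_{n,k}=k\,(1-2^{-n})$ for $k$ in a finite index set. It is non-decreasing in $n$ with $\lim_n s_{n,k}=k$, so both iterated operations should give $\max I$.

```python
I = range(1, 11)                  # finite index set {1, ..., 10}

def s(n, k):
    return k * (1 - 2.0 ** (-n))  # non-decreasing in n for every fixed k

big_n = 60                        # a large n stands in for the limit n -> infinity
lim_of_sups = max(s(big_n, k) for k in I)  # approximates lim_n sup_k s_{n,k}
sup_of_lims = max(float(k) for k in I)     # sup_k lim_n s_{n,k}

print(lim_of_sups, sup_of_lims)   # both (approximately) 10.0
```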


Formal derivation

If by "derive it" you mean get to it without a full proof, well, we can always assume that if the $\mu_i$ are measures, then: $$\mathrm{d}\Big(\sum_i a_i\mu_i\Big)=\sum_i a_i\,\mathrm{d}\mu_i. \tag{$\ast$}$$ Your expression for $PX^{-1}$ doesn't really make sense, since it is not a measure but a sum of multiples of indicators. Now, a discrete random variable takes values in a finite or countable set with probability 1. If $A$ is a set in the $\sigma$-algebra of the target space of the variable (typically, a Borel set in the reals), the distribution $\mu_X(A)$ can, by $\sigma$-additivity of measures, be written as: $$\mu_X(A)=\sum_{x\in A\cap V}p(x),$$ $V$ being the above countable set. That is the same value as $\sum_{x\in V}p(x)\delta_{x}(A)$, where $\delta_x$ denotes the Dirac measure at $x$. So we have seen that: $$\mu_X=\sum_{x\in V}p(x)\delta_{x}.$$

Going back to the expectation, we know that: $$E[X]=\int_{\Omega}X\,\mathrm{d}P=\int_{\mathbb{R}}t\,\mathrm{d}\mu_X(t)=\int_{\mathbb{R}}t\,\mathrm{d}\left(\sum_{x\in V}p(x)\delta_x\right)(t).$$ Assuming $(\ast)$ above, we continue the chain of equalities: $$E[X]=\int_{\mathbb{R}}t\sum_{x\in V}p(x)\,\mathrm{d}\delta_x(t).$$ We can also assume the integral and the sum can be harmlessly swapped. This then becomes a sum of integrals, and integrating something against a Dirac delta gives the value of that something at the centre of the delta, i.e. $\int f(t)\,\mathrm{d}\delta_x(t)=f(x)$, so: $$E[X]=\sum_{x\in V}\int_{\mathbb{R}}t\,p(x)\,\mathrm{d}\delta_x(t)=\sum_{x\in V}x\,p(x),$$ the desired result.
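To see the delta decomposition at work numerically (my own sketch; the support, the weights, and the idea of replacing each Dirac measure by a narrow Gaussian are assumptions made purely for illustration):

```python
import numpy as np

values  = np.array([0.0, 1.0, 5.0, 10.0])  # hypothetical support V
weights = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical mass function p

t  = np.linspace(-5.0, 15.0, 400_001)      # integration grid
dt = t[1] - t[0]

def delta_approx(center, width):
    # narrow Gaussian standing in for the Dirac measure centred at `center`
    return np.exp(-0.5 * ((t - center) / width) ** 2) / (width * np.sqrt(2 * np.pi))

for width in (0.5, 0.1, 0.01):
    # density of the mixture sum_x p(x) * delta_x, with smoothed deltas
    density = sum(p * delta_approx(x, width) for x, p in zip(values, weights))
    integral = np.sum(t * density) * dt    # Riemann sum for int t dmu_X(t)
    print(width, integral)                 # approximately 2.3 for each width

print(float(np.dot(values, weights)))      # sum_x x p(x) = 2.3
```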
