Why are these two definitions of conditional expectation equivalent?

Tags: conditional-expectation, definition, probability-theory

From Rick Durrett's book Probability: Theory and Examples:

We define the conditional expectation of $X$ given $\mathcal{G}$, $E(X \mid \mathcal{G})$, to be any random variable $Y$ that has

(1) $Y \in \mathcal{G}$, i.e., $Y$ is $\mathcal{G}$-measurable

(2) for all $A \in \mathcal{G}$, $\int_{A} X \, dP = \int_{A} Y \, dP$

And in other materials I found:

Let $(\Omega, \mathscr{F}, P)$ be a probability space and let $\mathscr{G}$ be a $\sigma$-algebra contained in $\mathscr{F}$. For any real random variable $X \in L^{1}(\Omega, \mathscr{F}, P)$, define $E(X \mid \mathscr{G})$ to be the unique random variable $Z \in L^{1}(\Omega, \mathscr{G}, P)$ such that for every bounded $\mathscr{G}$-measurable random variable $Y$, $$E(X Y)=E(Z Y).$$

Best Answer

The difference between the two definitions is that in the first one, the test $\mathbb E\left[XY\right]=\mathbb E\left[ZY\right]$ is required only for $Y$ of the form $\mathbf 1_A$ with $A\in\mathcal G$, whereas in the second definition it is required for all bounded $\mathcal G$-measurable functions $Y$.
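To spell out the direction that needs no work: if $Z$ satisfies the second definition, then for any $A\in\mathcal G$ the indicator $\mathbf 1_A$ is a bounded $\mathcal G$-measurable random variable, so taking $Y=\mathbf 1_A$ gives

$$\int_{A} X \, dP=\mathbb E\left[X\mathbf 1_A\right]=\mathbb E\left[Z\mathbf 1_A\right]=\int_{A} Z \, dP,$$

which is exactly condition (2) of the first definition. So only the converse direction requires an argument.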

All we need is the following fact:

Let $X$ be an integrable random variable on a probability space $\left(\Omega,\mathcal F,\mathbb P\right)$ and let $\mathcal G$ be a sub-$\sigma$-algebra of $\mathcal F$. Assume that $\mathbb E\left[X\mathbf 1_A\right]=0$ for all $A\in\mathcal G$. Then $\mathbb E\left[XY\right]=0$ for every bounded $\mathcal G$-measurable random variable $Y$.
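Granting this fact, the remaining implication follows by applying it to $X-Z$, where $Z$ satisfies the first definition (condition (2) implicitly requires $Z$ to be integrable, so $X-Z$ is again integrable). Indeed, for every $A\in\mathcal G$,

$$\mathbb E\left[(X-Z)\mathbf 1_A\right]=\int_{A} X \, dP-\int_{A} Z \, dP=0,$$

so the fact gives $\mathbb E\left[(X-Z)Y\right]=0$, i.e. $\mathbb E\left[XY\right]=\mathbb E\left[ZY\right]$, for every bounded $\mathcal G$-measurable $Y$, which is the second definition.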

To prove the fact itself, we can use the observation that a bounded $\mathcal G$-measurable function can be approximated in the uniform norm by linear combinations of indicator functions of sets in $\mathcal G$.
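In more detail (one way to carry this out, with $Y_n$ denoting the approximating simple functions): choose $\mathcal G$-measurable simple functions $Y_n=\sum_{k} c_{n,k}\mathbf 1_{A_{n,k}}$ with $A_{n,k}\in\mathcal G$ and $\|Y-Y_n\|_\infty\to 0$. By linearity and the hypothesis, $\mathbb E\left[XY_n\right]=0$ for every $n$, and

$$\left|\mathbb E\left[XY\right]\right|=\left|\mathbb E\left[X(Y-Y_n)\right]\right|\le\|Y-Y_n\|_\infty\,\mathbb E\left|X\right|\longrightarrow 0,$$

so $\mathbb E\left[XY\right]=0$.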