Different definitions of conditional expectation

Tags: conditional-expectation, measure-theory, probability-theory

As I'm trying to learn more about conditional expectation, I noticed that there are different definitions of it, depending on the book. In what follows, let $(\Omega,\mathcal{F},P)$ be a probability space.

Definition 1:

Let $(X,Y)$ be a two-dimensional random variable on $\Omega$ with joint density function $f_{X,Y}(x,y)$. The $(X=x)$-conditional expectation of $Y$ is given by
$$E(Y\mid X=x):=\int_{-\infty}^{\infty} yf_{Y\mid X=x}(y)dy$$

with

$$f_{Y\mid X=x}(y):=\frac{f_{X,Y}(x,y)}{f_X(x)}.$$

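($f_{Y\mid X=x}$ is also called the conditional density function of $Y$ given $X=x$; it is defined for those $x$ with $f_X(x)>0$, where $f_X(x)=\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$ is the marginal density of $X$.)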
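To see Definition 1 in action, here is a small numerical sketch in plain Python. It assumes an example joint density $f_{X,Y}(x,y) = x+y$ on $[0,1]^2$ (not from the question), for which $f_X(x) = x+\tfrac12$ and $E(Y\mid X=x) = (\tfrac x2+\tfrac13)/(x+\tfrac12)$; the helper names and the midpoint-rule integrator are illustrative choices.

```python
# Definition 1 evaluated numerically for an ASSUMED example density
# f_{X,Y}(x, y) = x + y on the unit square [0, 1]^2 (zero elsewhere).


def joint_density(x: float, y: float) -> float:
    """Assumed example joint density; it integrates to 1 over [0, 1]^2."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(func, a: float, b: float, n: int = 10_000) -> float:
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


def conditional_expectation_y_given_x(x: float) -> float:
    """E(Y | X = x) = integral of y * f_{Y|X=x}(y), f_{Y|X=x} = f_{X,Y}(x, .) / f_X(x)."""
    # Integrating over [0, 1] suffices because the assumed density vanishes elsewhere.
    marginal_x = integrate(lambda y: joint_density(x, y), 0.0, 1.0)  # f_X(x)
    if marginal_x == 0.0:
        raise ValueError("f_X(x) = 0: the conditional density is undefined here")
    return integrate(lambda y: y * joint_density(x, y), 0.0, 1.0) / marginal_x


# Compare with the closed form (x/2 + 1/3) / (x + 1/2) for this density.
for x in (0.0, 0.5, 1.0):
    closed_form = (x / 2 + 1 / 3) / (x + 1 / 2)
    print(x, conditional_expectation_y_given_x(x), closed_form)
```

Outside $[0,1]$ the marginal $f_X(x)$ is $0$, and the sketch raises an error there rather than returning a value, mirroring the fact that the conditional density is undefined at such $x$.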

Definition 2:

Let $X$ be an integrable random variable on $\Omega$ and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. A $\mathcal{G}$-measurable random variable $U$ with

$$\int_G U \ dP = \int_G X \ dP$$

for all $G \in \mathcal{G}$ is called a $\mathcal{G}$-conditional expectation of $X$, written $E(X\mid\mathcal{G})$. (It exists and is unique up to a $P$-null set.)
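When $\mathcal G$ is generated by a finite partition of $\Omega$ into blocks of positive probability, a $\mathcal G$-measurable $U$ is constant on each block, and the condition in Definition 2 forces $U$ to be the $P$-weighted average of $X$ on each block. The sketch below checks this on an assumed toy example; the sample space, the values of $X$ and the partition are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMED toy example: Omega = {0, ..., 5} with uniform P, a random variable X,
# and the sub-sigma-field G generated by the partition {0,1}, {2,3,4}, {5}.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
X = {0: 3, 1: 1, 2: 4, 3: 1, 4: 5, 5: 9}
partition = [{0, 1}, {2, 3, 4}, {5}]


def conditional_expectation(X, prob, partition):
    """U(w) = P-weighted average of X over the block containing w."""
    U = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(X[w] * prob[w] for w in block) / mass
        for w in block:
            U[w] = avg
    return U


def sigma_field(partition):
    """All unions of blocks: the sigma-field generated by a finite partition."""
    events = []
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            events.append(set().union(*blocks))
    return events


U = conditional_expectation(X, prob, partition)
# Defining property of Definition 2: sum over G of U dP equals sum over G of X dP.
for G in sigma_field(partition):
    assert sum(U[w] * prob[w] for w in G) == sum(X[w] * prob[w] for w in G)
print(U)
```

Exact `Fraction` arithmetic is used so that the defining property can be checked with equality rather than a floating-point tolerance.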

Intuitively, both definitions make sense to me. But how does one show that the definitions are equivalent?

Thank you in advance.

Best Answer

Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, let $(X,Y)$ be a random vector with probability density function $g_{(X,Y)}$, and let $f$ be any Borel function such that $\mathbb E[|f(X,Y)|] < \infty$. Then $\mathbb E[f(X,Y)\mid Y] = h(Y)$ almost surely (here $\mathbb E[\,\cdot\mid Y]$ is the conditional expectation given $\sigma(Y)$ in the sense of Definition 2), where

$$ h(y) = \frac{\int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)\,dx} $$

when $\int_{\mathbb R} g_{(X,Y)}(x,y)\,dx \neq 0$, and $h(y) = 0$ otherwise.

First, the choice $h(y) = 0$ in the second case does not matter, because the set $S=\{ \omega \in \Omega : \int_{\mathbb R} g_{(X,Y)}(x,Y(\omega))\, dx = 0 \}$ has probability $0$. Indeed, $\mathbb P(S) = \mathbb P(Y \in S_Y)$, where $S_Y = \{ y \in \mathbb R: \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx = 0 \}$ (a Borel set, by Tonelli's theorem). Moreover, $\mathbb P(Y \in S_Y) = \int_{S_Y} g_Y(y)\, dy$, where $g_Y$ is the marginal density of $Y$; it exists by Fubini's theorem because $(X,Y)$ has a joint density, and it is given by $g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx$. On $S_Y$ this is exactly $0$, so we are integrating the zero function and $\mathbb P(S) = 0$. Since conditional expectation is only determined up to a $\mathbb P$-null set, we may ignore the case $g_Y(y) = 0$.
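As a concrete illustration (an assumed example density, not taken from the question), take $g_{(X,Y)}(x,y) = (x+y)\,\chi_{[0,1]^2}(x,y)$. Then

$$g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx = \left(y+\tfrac12\right)\chi_{[0,1]}(y), \qquad S_Y = \mathbb R\setminus[0,1],$$

$$\mathbb P(S) = \mathbb P(Y\in S_Y) = \int_{\mathbb R\setminus[0,1]} g_Y(y)\,dy = 0.$$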

So we have to prove two things:

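1) $h(Y)$ is $\sigma(Y)$-measurable. The maps $y \mapsto \int_{\mathbb R} g_{(X,Y)}(x,y)\, dx$ and $y \mapsto \int_{\mathbb R} f(x,y) g_{(X,Y)}(x,y)\, dx$ are Borel measurable: the first by Tonelli's theorem, since $g_{(X,Y)} \ge 0$; the second by Fubini's theorem, since $\int_{\mathbb R^2} |f(x,y)|\, g_{(X,Y)}(x,y)\, d\lambda_2(x,y) = \mathbb E[|f(X,Y)|] < \infty$. (Fubini only guarantees that the inner integral exists for Lebesgue-almost every $y$; we set it to $0$ on the exceptional null set, which has probability $0$ under the distribution of $Y$ because $Y$ has a density.) Hence $h$ is a Borel function, and $h(Y)$, being a Borel function of $Y$, is $\sigma(Y)$-measurable.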

2) For any $A \in \sigma(Y)$ we have to show $\int_A f(X,Y)\, d\mathbb P = \int_A h(Y)\, d\mathbb P$. Every such $A$ is of the form $A = Y^{-1}(B)$ for some Borel set $B \in \mathcal B(\mathbb R)$.

Note that $\int_A f(X,Y) d\mathbb P = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$ and $\int_A h(Y) d\mathbb P = \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ]$

We will use the following fact: if a random vector $V$ in $\mathbb R^n$ has density function $g_V$, then for any Borel function $\phi: \mathbb R^n \to \mathbb R$ with $\mathbb E[|\phi(V)|] < \infty$ (or $\phi \ge 0$), we have $\mathbb E[\phi(V)] = \int_{\mathbb R^n} \phi(v) g_V(v)\, d\lambda_n(v)$.

Then: $$\mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ] = \int_{\mathbb R^2} f(x,y)\chi_{B}(y)\, g_{(X,Y)}(x,y)\, d\lambda_2(x,y) = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx\,dy $$

The last equality (splitting into iterated integrals) is Fubini's theorem, which applies because $f\,g_{(X,Y)}$ is integrable on $\mathbb R^2$ by the assumption $\mathbb E[|f(X,Y)|] < \infty$.
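For a concrete instance, assume (hypothetically) $g_{(X,Y)}(x,y) = (x+y)\,\chi_{[0,1]^2}(x,y)$, $f(x,y) = x$ and $B = [0,\tfrac12]$. The iterated integral then evaluates to

$$\mathbb E[ X \cdot \chi_{_{Y \in B}} ] = \int_0^{1/2}\int_0^1 x(x+y)\,dx\,dy = \int_0^{1/2}\left(\tfrac13+\tfrac y2\right)dy = \tfrac16+\tfrac1{16} = \tfrac{11}{48}.$$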

Similarly, applying the same fact to $Y$ and its marginal density $g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx$ from the beginning:

$$ \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} h(y) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx)dy$$

Because the factor $\int_{\mathbb R} g_{(X,Y)}(x,y)\,dx$ vanishes on $S_Y$, the value of $h$ there does not affect the integral, and off $S_Y$ the function $h$ is given by the quotient. We have:

$$ \int_{B} (h(y)) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx) dy = \int_{B} (\frac{\int_{\mathbb R} g_{(X,Y)}(x,y)f(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx}) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx )dy$$

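After cancelling the denominator we get $\mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx\,dy = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$, which is what we wanted to prove. (On $S_Y$ there is nothing to cancel, but both integrands vanish there: $g_{(X,Y)}(\cdot,y)$ is nonnegative with integral $0$, hence $0$ almost everywhere, so $\int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx = 0$ as well.)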
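A quick numerical sanity check of this identity, under an assumed example: $g_{(X,Y)}(x,y) = x+y$ on $[0,1]^2$, $f(x,y) = x$ and $B = [0,\tfrac12]$. For this density $h(y) = (\tfrac13+\tfrac y2)/(y+\tfrac12)$ on $[0,1]$, and direct integration gives $\tfrac{11}{48}\approx 0.229$ for both sides. The sketch below (plain Python; the rejection sampler and all function names are illustrative choices, not part of the answer) estimates both expectations by Monte Carlo.

```python
import random

# ASSUMED example: joint density g(x, y) = x + y on [0, 1]^2, f(x, y) = x, B = [0, 1/2].


def sample_xy(rng: random.Random) -> tuple[float, float]:
    """Rejection sampling from g(x, y) = x + y on the unit square (g <= 2)."""
    while True:
        x, y, u = rng.random(), rng.random(), rng.random()
        if 2.0 * u <= x + y:  # accept with probability g(x, y) / 2
            return x, y


def h(y: float) -> float:
    """h(y) = (integral of x g(x, y) dx) / (integral of g(x, y) dx); 0 off [0, 1]."""
    if not 0.0 <= y <= 1.0:
        return 0.0
    return (1 / 3 + y / 2) / (y + 1 / 2)


def check_defining_property(n: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of E[f(X,Y) 1{Y in B}] and E[h(Y) 1{Y in B}]."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        x, y = sample_xy(rng)
        if y <= 0.5:  # indicator of the event {Y in B}
            lhs += x
            rhs += h(y)
    return lhs / n, rhs / n


if __name__ == "__main__":
    lhs, rhs = check_defining_property()
    print(lhs, rhs, 11 / 48)
```

Both estimates should agree with $\tfrac{11}{48}$ up to Monte Carlo error; the point is that averaging $f(X,Y)$ and averaging $h(Y)$ over the same event $\{Y\in B\}$ give the same number.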

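Your Definition 1 is the special case $f(x,y) = x$ (which requires $\mathbb E[|X|] < \infty$), with the roles of $X$ and $Y$ interchanged: then $h(y) = \mathbb E[X\mid Y=y]$ in the sense of Definition 1, and the result above says that $h(Y) = \mathbb E[X\mid \sigma(Y)]$ almost surely, i.e. that $h(Y)$ is the conditional expectation of Definition 2 with $\mathcal G = \sigma(Y)$.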
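Written out, the substitution $f(x,y) = x$ at points $y$ with $g_Y(y) > 0$ gives

$$h(y) = \frac{\int_{\mathbb R} x\,g_{(X,Y)}(x,y)\,dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)\,dx} = \int_{\mathbb R} x\,\frac{g_{(X,Y)}(x,y)}{g_Y(y)}\,dx = \int_{\mathbb R} x\,g_{X\mid Y=y}(x)\,dx,$$

which is the formula of Definition 1 with $X$ and $Y$ swapped. For example, for the illustrative density $g_{(X,Y)}(x,y) = (x+y)\,\chi_{[0,1]^2}(x,y)$ this yields $\mathbb E[X\mid Y=y] = \dfrac{1/3 + y/2}{y + 1/2}$ for $y \in [0,1]$.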