Understand conditional expectation

Tags: conditional-expectation, probability-theory

The definition of conditional expectation is:

Given a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{D}$ be a sub-$\sigma$-field of $\mathcal{F}$ (i.e., $\mathcal{D}\subset \mathcal{F}$ and $\mathcal{D}$ is a $\sigma$-algebra), and let $X$ be an integrable random variable. Then there is a unique (up to $P$-null sets) random variable $E(X|\mathcal{D})$ such that:

  1. $E(X|\mathcal{D})$ is $\mathcal{D}$-measurable.
  2. $\int_D E(X|\mathcal{D})\,dP=\int_D X\,dP$ for all $D\in\mathcal{D}$.

We call $E(X|\mathcal{D})$ the conditional expectation of $X$ given $\mathcal{D}$.

I cannot understand the purpose of conditional expectation. Why do we need to define such a concept? According to Durrett, the interpretation of it is "the best guess of the value of $X$ given the information $\mathcal{D}$ we have". Then, my questions are:

  1. What are we trying to learn about $X$? Are we trying to learn its distribution, or $\int_D X\,dP$ for all $D\in\mathcal{D}$, or $X(\omega)$ for all $\omega\in\Omega$, or $P(X\in D)$ for $D\in\mathcal{D}$?
  2. What do we know about $X$? Do we assume that we cannot observe the realization of $X$ (otherwise, why would we need to "guess"?)? Do we know the integral of $X$ over $\mathcal{D}$-measurable sets (otherwise, how can we construct a conditional expectation, since we cannot verify the second requirement in the definition?)?
  3. What is the meaning of a realization (i.e., $E(X|\mathcal{D})(\omega)$ for some $\omega\in\Omega$) of $E(X|\mathcal{D})$?

I am totally confused about this concept. I do not know what the intention is here. Any explanation would be appreciated.

Best Answer

You may be more familiar with conditional probability $$ \mathbb P(A\mid B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)},\qquad (\star) $$ which is a fundamental concept in statistics and informal probability (and is applied in many situations outside of mathematics, and in everyday life).

Conditional expectation is a vast generalization of conditional probability, where now the set $B$ is replaced by a sigma field (corresponding to your $\mathcal D$) and $A$ is interpreted as an indicator random variable $1_A$, which is then generalized to be an arbitrary random variable $X$. So both $A$ and $B$ are replaced with vastly more general objects.
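To see how $(\star)$ is recovered as a special case, take $\mathcal D=\sigma(B)=\{\emptyset,B,B^c,\Omega\}$ with $0<\mathbb P(B)<1$ and $X=1_A$. A $\mathcal D$-measurable function must be constant on $B$ and on $B^c$, and matching the integrals over $B$ and over $B^c$ (requirement 2 of the definition) forces
$$
\mathbb E(1_A\mid\sigma(B))(\omega)=
\begin{cases}
\dfrac{\mathbb P(A\cap B)}{\mathbb P(B)}=\mathbb P(A\mid B), & \omega\in B,\\[2mm]
\dfrac{\mathbb P(A\cap B^c)}{\mathbb P(B^c)}=\mathbb P(A\mid B^c), & \omega\in B^c.
\end{cases}
$$
So the conditional expectation packages the ordinary conditional probabilities given $B$ and given $B^c$ into a single random variable.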

Now since measure-theoretic probability is mathematically rigorous, you have to worry about "pedantic" situations like when $\mathbb P(B)=0$ and the conditional probability formula $(\star)$ becomes undefined. In fact, this seemingly simple problem is the source of the subtleties in the definition of conditional expectation. This is what causes the uncertainty up to sets of probability $0$.

Okay with that preamble out of the way, I can now answer your questions in a better context.

  1. What we learn about $X$ by replacing it with the random variable $\mathbb E(X\mid \mathcal D)$ is its "coarse grained" behavior when averaged over the sets in $\mathcal D$. For instance, in the extreme case when $\mathcal D=\mathcal F$ there is no extra averaging and $X=\mathbb E(X\mid \mathcal F)$ up to null sets, whereas when $\mathcal D=\{\emptyset,\Omega\}$ the conditional expectation $\mathbb E(X\mid \mathcal D)$ becomes equal to the constant $\mathbb EX$, up to a null set. In between these two extremes, you can imagine the sets in $\mathcal D$ as being unions of sets in a partition of the probability space, and the value of the conditional expectation on each "part" of the partition is the expectation of $X$ when restricted to that "part".

  2. To compute $\mathbb E(X\mid \mathcal D)$ we need to know $X$ up to $\mathbb P$-null sets, since otherwise the right side of your equation (2) cannot be computed. That's the answer to the literal interpretation of your question, but I think more in the spirit of what you are asking is to understand when $\mathbb E(X\mid \mathcal D)=\mathbb E(Y\mid \mathcal D)$ up to null sets. By subtracting the two sides, this is equivalent to asking when $\mathbb E(Z\mid \mathcal D)=0$, where $Z=X-Y$, and the answer is that this happens exactly when $Z$ has mean $0$ when restricted to any set in $\mathcal D$.

  3. A "realization" means, in this context, a representative of an equivalence class of measurable functions that are equal up to null sets. The definition of conditional expectation does not actually identify a unique random variable $\mathbb E(X\mid \mathcal D)$; it only imposes conditions on such a random variable. It turns out, with a little work, one can show that while there are many random variables satisfying these conditions, they all belong to the same equivalence class. Thus the equivalence class is uniquely defined, and a "realization" (also called a "version") is any element of this equivalence class.
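The partition picture in point 1 can be checked numerically. Below is a minimal sketch (not from the original answer) on a finite probability space with uniform measure, where $\mathcal D$ is generated by a partition: the conditional expectation is just the average of $X$ over the part containing $\omega$, and it satisfies the defining integral property. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite probability space: Omega = {0, ..., n-1} with uniform measure.
n = 12
X = rng.normal(size=n)      # a random variable, stored as the vector of values X(omega)
p = np.full(n, 1.0 / n)     # P({omega}) for each point

# A partition of Omega; the sigma-field D consists of all unions of these parts.
parts = [np.arange(0, 4), np.arange(4, 6), np.arange(6, 12)]

# E(X|D)(omega) = (integral of X over the part containing omega) / P(part),
# i.e. the probability-weighted average of X on that part.
cond_exp = np.empty(n)
for part in parts:
    cond_exp[part] = np.sum(X[part] * p[part]) / np.sum(p[part])

# Defining property 2: for every part (hence every union of parts, i.e. every
# D in the sigma-field), the integrals of E(X|D) and X over D agree.
for part in parts:
    assert np.isclose(np.sum(cond_exp[part] * p[part]),
                      np.sum(X[part] * p[part]))

# Extreme cases from point 1: singleton parts give E(X|F) = X pointwise,
# and the trivial partition {Omega} gives the constant E X everywhere.
```

Here $\mathcal D = \{\emptyset, \Omega\}$ corresponds to `parts = [np.arange(n)]`, in which case `cond_exp` is constant and equal to `np.sum(X * p)`.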

I remember that when I first learned measure-theoretic probability, conditional expectation was the hardest concept for me to understand. Even after I understood the definition well, it still took me some time to gain a good intuition for it. The book I learned it from was PTE (Durrett's Probability: Theory and Examples), and the example that finally made things "click" for me was Example 4.1.5 on page 208 (page numbers accurate as of Version 5, January 11, 2019).