[Math] Conditional expectation with respect to a $\sigma$-algebra

Tags: conditional-expectation, probability-theory

Could someone explain what it is that we are intuitively trying to achieve with the definition? Having read the definition, I could do the problems in that section of my book, but I still have no intuitive idea of what the definition is trying to accomplish. When conditioning on a single event, I understand that the definition should be the integral with respect to the measure restricted to that event,

$$\mu_E(A) := \mu(A\cap E)/\mu(E).$$

It's also intuitively clear what information a single event carries, i.e. "the outcome was one of those in the event's set".
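The restricted measure above can be checked on a tiny discrete example of my own (not from the question): a fair six-sided die, conditioned on the event "the outcome is even".

```python
from fractions import Fraction

# A fair six-sided die: uniform measure on {1, ..., 6}.
omega = range(1, 7)
mu = {w: Fraction(1, 6) for w in omega}

def restricted(A, E):
    """mu_E(A) = mu(A ∩ E) / mu(E): the measure mu conditioned on E."""
    return sum(mu[w] for w in A & E) / sum(mu[w] for w in E)

E = {2, 4, 6}            # the event "the outcome was even"
A = {2, 3}
print(restricted(A, E))  # mu({2}) / mu({2,4,6}) = (1/6) / (1/2) = 1/3
```

Only the part of $A$ inside $E$ contributes, and the division by $\mu(E)$ renormalizes so that $\mu_E$ is again a probability measure.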

Can someone explain to me the following:

  1. What do we even mean by the information carried by a $\sigma$-algebra? In other words, I can't even understand what we would like this to represent.

  2. Why do we want the conditional expectation to be a random variable? I assume this might follow naturally from (1) if I understood what we're trying to accomplish.

Best Answer

I can try to explain my understanding of what conditional expectation tries to accomplish (let's say we work on a probability space $(\Omega, \mathscr A, \mathbf P)$).

(1) The "information" carried by a $\sigma$-algebra $\mathscr F \subseteq \mathscr A$ is (like the information carried by an event) the ability to say, for a random outcome $\omega \in \Omega$, to which sets $A \in \mathscr F$ our $\omega$ belongs. The restricted measure $\def\P{\mathbf P}\P_E := \P(-\cap E)/\P(E)$ measures only the outcomes in $E$ and is $0$ on $\Omega \setminus E$. So we have the "information" that $\omega \in E$ (and are not interested in the rest).

For (2), let's first look at the simple example $\mathscr F = \sigma(\{E\}) = \{\emptyset, E, \Omega\setminus E, \Omega\}$. This is the $\sigma$-algebra corresponding to a single event: if we have this "information", then for our outcome we know whether it belongs to $E$ or to $\Omega \setminus E$. So the conditional expectation should behave differently on $E$ and on $\Omega \setminus E$, namely being the expectation with respect to $\P_E$ or $\P_{\Omega \setminus E}$, respectively. Hence we have for a random variable $X$: $$ \def\E{\mathbf E}\E(X\mid \mathscr F)(\omega) = \begin{cases} \E_{\P_E}(X) & \omega \in E\\ \E_{\P_{\Omega\setminus E}}(X) & \omega \in \Omega \setminus E \end{cases} $$ Recall that $$ \E_{\P_E}(X) = \frac 1{\P(E)}\int_E X\, d\P. $$

As the next step, let's think of some more "information". Above we partitioned $\Omega$ into two sets; let's now look at a countable partition $\Omega = \biguplus_{i=1}^\infty E_i$ and $\mathscr F = \sigma\{E_i: i \ge 1\}$. If we have the information carried by $\mathscr F$, how can we best estimate $X(\omega)$ for a given $\omega \in \Omega$? As we "know" to which $E_i$ our $\omega$ belongs (that's the information carried by $\mathscr F$), the best we can do is $$ \E(X \mid \mathscr F)(\omega) = \frac 1{\P(E_i)}\int_{E_i} X \, d\P \quad \text{for } \omega \in E_i. $$ Note that this is an $\mathscr F$-measurable function.

I like to think that, in general, an $\mathscr F$-measurable function may only "use" information carried by $\mathscr F$, so we want $\E(X\mid \mathscr F)$ to be an $\mathscr F$-measurable function in general. And as we have to give different answers on different elements of $\mathscr F$ (even different values for different outcomes $\omega$ in general), the conditional expectation will be a random variable.
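The partition formula above can be verified on a small discrete example (my own toy setup, not from the answer): a fair die, with $\mathscr F$ generated by the partition into "low" $\{1,2,3\}$ and "high" $\{4,5,6\}$, and $X(\omega) = \omega$.

```python
from fractions import Fraction

# Fair die: uniform probability measure P on Omega = {1, ..., 6}.
omega = list(range(1, 7))
P = {w: Fraction(1, 6) for w in omega}

# The partition generating the sigma-algebra F.
partition = [{1, 2, 3}, {4, 5, 6}]
X = lambda w: w  # the identity random variable

def cond_exp(X, partition, w):
    """E(X | F)(w): the average of X over the partition cell containing w,
    i.e. (1 / P(E_i)) * sum of X * P over E_i."""
    E_i = next(E for E in partition if w in E)
    return sum(X(v) * P[v] for v in E_i) / sum(P[v] for v in E_i)

print([cond_exp(X, partition, w) for w in omega])
# Constant on each cell: 2 on {1,2,3} and 5 on {4,5,6}.
```

So $\mathbf E(X\mid\mathscr F)$ really is a random variable: a function of $\omega$ that is constant on each cell of the partition (hence $\mathscr F$-measurable). One can also check the tower property numerically, $\mathbf E[\mathbf E(X\mid\mathscr F)] = \frac12\cdot 2 + \frac12\cdot 5 = \frac72 = \mathbf E[X]$.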