[Math] Conditional expectation as a Radon-Nikodym derivative.

conditional-expectation, probability theory

I found the following very nice post yesterday, which presented conditional expectation in a way I found intuitive:

Conditional expectation with respect to a $\sigma$-algebra.

I wonder if there is a way to see that $E(X\mid \mathcal{F}_n)(\omega)=\frac 1 {P(E_i)} \int_{E_i}X \, dP$ for $\omega \in E_i$ can be regarded as a Radon-Nikodym derivative. I can't formally connect the dots with, for example, the Wikipedia discussion:

https://en.wikipedia.org/wiki/Conditional_expectation.

I am missing the part where the measure gets "weighted", i.e. something analogous to $\frac 1 {P(E_i)}$, in the Wikipedia article.

Update

It just hit me that if one divides the defining relation by the measure of the set, then one has

$\frac{1}{P(E_{i})} \int_{E_i}X \, dP=\frac{1}{P(E_{i})} \int_{E_i}E(X\mid \mathcal{F}_n) \, dP$ for all $E_{i}\in \mathcal{F}_{n}$ with $P(E_i)>0$. This looks like we have a function whose averages agree with the averages of $X$ on every set of the $\sigma$-algebra $\mathcal{F}_{n}$. This is not quite what @martini writes, but maybe it is a reasonable way to look at it as well? It looks like something that would fit the Wikipedia discussion better, though. But it doesn't sit right on some other occasions.

So the question remains,

How do I think about this in the right way? If the second way is wrong, then how is the first consistent with the Wikipedia article?

My comment on Sangchul's answer will also help in understanding my troubles!

Best Answer

Your intuition and formula make sense when each $E_i$ is an elementary event in ${\cal F}$, i.e. one that cannot be decomposed into two disjoint events in ${\cal F}$, both of positive probability. If it can be decomposed, then there is a mismatch: in general $E(X|{\cal F})(\omega)$ is not constant over $\omega\in E_i$, while your formula is.

Suppose that $\Omega$ may be partitioned into a countable family of disjoint measurable events $E_i$, $i\geq 1$. It suffices to keep only the events with strictly positive probability, as they carry the total probability. The $\sigma$-algebra ${\cal F}$ generated by this partition simply consists of all unions of elements of this family. A measurable function with respect to ${\cal F}$ is precisely a (possibly countably infinite) linear combination of the $\chi_{E_i}$, the characteristic functions of our disjoint family of events. We may thus write: $$ E(X|{\cal F}) (\omega) = \sum_j c_j \chi_{E_j}(\omega)$$ The constants may be computed from the fact that $\int_{E_i} E(X|{\cal F}) dP = c_i P(E_i) = \int_{E_i} X\; dP$. We get: $$ E(X|{\cal F}) (\omega) = \sum_j\chi_{E_j}(\omega) \frac{1}{P(E_j)} \int_{E_j} X\; dP $$ corresponding to the formula you mentioned. By writing down the defining equation, $\nu(E) = \int_E E(X|{\cal F}) \; dP$ for all $E\in{\cal F}$, you see that this is indeed the Radon-Nikodym derivative of $\nu(E)=\int_E X \; dP$, $E\in {\cal F}$, with respect to $P_{|{\cal F}}$.
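To make the partition formula concrete, here is a minimal sketch on a finite sample space. The six-point space, the uniform probability, the choice $X(\omega)=\omega^2$ and the helper names `conditional_expectation` and `integral` are all assumptions made for illustration and are not part of the argument above. The sketch computes the block averages $c_j$ and then checks the defining relation $\int_E E(X|{\cal F})\,dP = \int_E X\,dP$ on every union of blocks, i.e. on every $E\in{\cal F}$.

```python
from fractions import Fraction
from itertools import combinations


def integral(f, P, event):
    """Integral of f over `event` with respect to P (a finite sum here)."""
    return sum(f[w] * P[w] for w in event)


def conditional_expectation(X, P, partition):
    """Return E(X | F) as a dict omega -> value, F generated by `partition`."""
    Y = {}
    for block in partition:
        p_block = sum(P[w] for w in block)
        if p_block == 0:
            continue  # null blocks carry no probability; Y stays undefined there
        c = integral(X, P, block) / p_block  # c_j = (1 / P(E_j)) * int_{E_j} X dP
        for w in block:
            Y[w] = c  # constant on each block, hence F-measurable
    return Y


# Hypothetical toy data: uniform probability on {0,...,5}, X(w) = w^2.
omega = [0, 1, 2, 3, 4, 5]
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}
partition = [{0, 1}, {2, 3, 4}, {5}]

Y = conditional_expectation(X, P, partition)

# Defining relation (Radon-Nikodym property) on every E in F,
# i.e. on every union of blocks, including the empty union.
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        E = set().union(*blocks)
        assert integral(Y, P, E) == integral(X, P, E)
```

Exact rational arithmetic is used so that the equality of the two integrals can be tested without a tolerance.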

Conditional expectation, however, becomes less intuitive when ${\cal F}$ is no longer generated by a countable partition, although sometimes you may find a tweak to get around this. Example: Let $P$ be a probability on ${\Bbb R}$ having a density $f\in L^1({\Bbb R})$ with respect to Lebesgue measure, $dP(x) = f(x) dx$.

We will consider the sub-$\sigma$-algebra ${\cal F}$ consisting of the symmetric sets in the Borel $\sigma$-algebra. Thus a Borel set $A$ belongs to ${\cal F}$ iff $x\in A \Leftrightarrow -x\in A$.

A measurable function with respect to ${\cal F}$ is now any Borel function which is symmetric, i.e. $Y(x)=Y(-x)$ for all $x$. This time an elementary event consists of a symmetric couple $\{x,-x\}$, which has zero probability. And you cannot throw all of these away when calculating the conditional expectation. So going back to the definition, given an $X\in L^1(dP)$ you need to find a symmetric integrable function $Y$ so that for any measurable $I\subset (0,+\infty)$ you have: $$ \int_{I\cup (-I)} Y\; dP = \int_{I\cup (-I)} X \; dP $$ Using that $Y$ is symmetric and the change of variables $x\mapsto -x$ on $-I$, this becomes: $$ \int_I Y(x) (f(x)+f(-x)) \; dx = \int_I (X(x) f(x) + X(-x) f(-x)) \; dx $$ On the set $\Lambda = \{ x\in {\Bbb R} : f(x)+f(-x)>0 \}$, which has full probability, we may then solve this by defining: $$ Y(x) = \frac{X(x) f(x) + X(-x) f(-x) }{f(x) + f(-x) }, \; x\in \Lambda. $$ On the complement $Y$ is not defined, but the complement has zero probability. $Y$ is then symmetric and has the same expectation as $X$ on symmetric events. Again $Y$ is the Radon-Nikodym derivative of $\nu(E) = \int_E X \; dP$, $E\in {\cal F}$, with respect to $P_{|{\cal F}}$.
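The symmetric formula can be checked numerically. The following is only a sketch under assumed choices that do not come from the answer: the density $f$ is taken to be that of a normal distribution with mean 1 and variance 1, $X(x)=x$, and integrals are approximated with a simple midpoint rule (the names `integrate` and `integral_over_symmetric` are illustrative helpers). For this particular $f$ one has $f(x)/f(-x)=e^{2x}$, so the formula reduces to $Y(x)=x\tanh(x)$, which gives an independent check.

```python
import math


def f(x):
    """Assumed example density: normal with mean 1 and variance 1."""
    return math.exp(-0.5 * (x - 1.0) ** 2) / math.sqrt(2.0 * math.pi)


def X(x):
    """Assumed example random variable X(x) = x."""
    return x


def Y(x):
    """Candidate for E(X | F): f-weighted average of X over the pair {x, -x}."""
    return (X(x) * f(x) + X(-x) * f(-x)) / (f(x) + f(-x))


def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def integral_over_symmetric(g, a, b):
    """Approximate the integral of g dP over I u (-I), where I = (a, b), 0 <= a < b."""
    def weighted(x):
        return g(x) * f(x)
    return integrate(weighted, a, b) + integrate(weighted, -b, -a)


# Defining relation on a few symmetric events I u (-I).
for a, b in [(0.0, 1.0), (0.5, 2.0), (1.0, 4.0)]:
    lhs = integral_over_symmetric(Y, a, b)
    rhs = integral_over_symmetric(X, a, b)
    assert abs(lhs - rhs) < 1e-9
```

Since the same midpoint grid is used on $I$ and $-I$, the two discretised integrals agree up to rounding error, mirroring the change of variables used in the derivation.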

Our luck here is that there is a simple symmetry, i.e. $x\mapsto -x$, describing the events in ${\cal F}$ and that the probability measure transforms nicely under this symmetry. In more general situations you may not be able to describe $E(X|{\cal F})$ explicitly in terms of values of $X$ and you are stuck with just the defining properties for conditional expectation [which, on the other hand, may suffice for whatever computation you need to carry out].