[Math] Intuition behind Conditional Expectation

conditional-expectationintuitionprobabilityprobability theory

I'm struggling with the concept of conditional expectation. First of all, if you have a link to any explanation that goes beyond showing that it is a generalization of elementary intuitive concepts, please let me know.

Let me get more specific. Let $\left(\Omega,\mathcal{A},P\right)$ be a probability space and $X$ an integrable real random variable defined on $(\Omega,\mathcal{A},P)$. Let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then $E[X|\mathcal{F}]$ is the a.s. unique random variable $Y$ such that $Y$ is $\mathcal{F}$-measurable and for any $A\in\mathcal{F}$, $E\left[X1_A\right]=E\left[Y1_A\right]$.

The common interpretation seems to be: "$E[X|\mathcal{F}]$ is the expectation of $X$ given the information of $\mathcal{F}$." I'm finding it hard to get any meaning from this sentence.

  1. In elementary probability theory, expectation is a real number. So the sentence above makes me think of a real number instead of a random variable. This is reinforced by $E[X|\mathcal{F}]$ sometimes being called "conditional expected value". Is there some canonical way of getting real numbers out of $E[X|\mathcal{F}]$ that can be interpreted as elementary expected values of something?

  2. In what way does $\mathcal{F}$ provide information? To know that some event occurred, is something I would call information, and I have a clear picture of conditional expectation in this case. To me $\mathcal{F}$ is not a piece of information, but rather a "complete" set of pieces of information one could possibly acquire in some way.

Maybe you say there is no real intuition behind this, $E[X|\mathcal{F}]$ is just what the definition says it is. But then, how does one see that a martingale is a model of a fair game? Surely, there must be some intuition behind that!

I hope you have got some impression of my misconceptions and can rectify them.

Best Answer

Maybe this simple example will help. I use it when I teach conditional expectation.

(1) The first step is to think of ${\mathbb E}(X)$ in a new way: as the best estimate for the value of a random variable $X$ in the absence of any information. To minimize the squared error $${\mathbb E}[(X-e)^2]={\mathbb E}[X^2-2eX+e^2]={\mathbb E}(X^2)-2e{\mathbb E}(X)+e^2,$$ we differentiate to obtain $2e-2{\mathbb E}(X)$, which is zero at $e={\mathbb E}(X)$.

For example, if I throw a fair die and you have to estimate its value $X$, according to the analysis above, your best bet is to guess ${\mathbb E}(X)=3.5$. On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean square error.

(2) What happens if you do have additional information? Suppose that I tell you that $X$ is an even number. How should you modify your estimate to take this new information into account?

The mental process may go something like this: "Hmmm, the possible values were $\lbrace 1,2,3,4,5,6\rbrace$ but we have eliminated $1,3$ and $5$, so the remaining possibilities are $\lbrace 2,4,6\rbrace$. Since I have no other information, they should be considered equally likely and hence the revised expectation is $(2+4+6)/3=4$".

Similarly, if I were to tell you that $X$ is odd, your revised (conditional) expectation is 3.

(3) Now imagine that I will roll the die and I will tell you the parity of $X$; that is, I will tell you whether the die comes up odd or even. You should now see that a single numerical response cannot cover both cases. You would respond "3" if I tell you "$X$ is odd", while you would respond "4" if I tell you "$X$ is even". A single numerical response is not enough because the particular piece of information that I will give you is itself random. In fact, your response is necessarily a function of this particular piece of information. Mathematically, this is reflected in the requirement that ${\mathbb E}(X\ |\ {\cal F})$ must be $\cal F$ measurable.

I think this covers point 1 in your question, and tells you why a single real number is not sufficient. Also concerning point 2, you are correct in saying that the role of $\cal F$ in ${\mathbb E}(X\ |\ {\cal F})$ is not a single piece of information, but rather tells what possible specific pieces of (random) information may occur.