Property of conditional expectation $E(X | \mathcal{V})$ where $\mathcal{V}$ is $\sigma$-algebra.

Tags: conditional-probability, conditional-expectation, measure-theory, probability-theory, random-variables

I'm self-studying probability theory, and I'm stuck on the definition given below and on some consequences that follow from it.

Let $(\Omega, \mathcal{U}, P)$ be a probability space. Suppose $\mathcal{V}$ is a $\sigma$-algebra, $\mathcal{V} \subseteq \mathcal{U}$. Then
$$E(X|\mathcal{V})$$
is defined to be any random variable on $\Omega$ such that

  1. $E(X|\mathcal{V})$ is $\mathcal{V}$-measurable and
  2. $\int_A X dP = \int_A E(X|\mathcal{V})dP$ for all $A \in \mathcal{V}$.

It is somehow "obvious" that if $X$ is $\mathcal{V}$-measurable, then $E(X|\mathcal{V}) = X$.

I would like to get help on the following challenges that I'm facing:

  1. I don't understand what $E(X|\mathcal{V})$ means on an intuitive level.
  2. I don't see how to get this property: If $X$ is $\mathcal{V}$-measurable, then $E(X|\mathcal{V}) = X$.

So far, I'm familiar with usual conditional expectation stuff like $E(X|Y=y)$ or $E(X|Y)$, but $E(X|\mathcal{V})$ is something new to me.

P.S. Given that I'm stuck on somehow "obvious" concepts, I would appreciate simple explanations.

Best Answer

To begin with, it is worth making some comments. Given a probability space $(\Omega,\mathcal{U},\textbf{P})$, we can think of $\mathcal{U}$ as the information we have at hand about the random phenomenon we are interested in. More precisely, the $\sigma$-algebra $\mathcal{U}$ tells us which events we can observe the occurrence of. So, when we consider a sub-$\sigma$-algebra $\mathcal{V}\subseteq\mathcal{U}$, we are restricting the information available about the random phenomenon we are studying.

Based on this interpretation, we can view the conditional expectation $\textbf{E}[X\mid\mathcal{V}]$ as the random variable that best approximates $X$ using only the information in $\mathcal{V}\subseteq\mathcal{U}$. This means that $Y := \textbf{E}[X\mid\mathcal{V}]$ should be $\mathcal{V}$-measurable, and $Y$ and $X$ should coincide on average over every measurable set $A\in\mathcal{V}$. This also answers your second question: if $X$ is already $\mathcal{V}$-measurable, then $X$ itself satisfies both defining properties — property 1 holds by assumption, and property 2 holds trivially, since $\int_A X\,dP = \int_A X\,dP$ — so $\textbf{E}[X\mid\mathcal{V}] = X$ (up to almost-sure equality). Intuitively, the best approximation of $X$, given enough information to determine $X$, is $X$ itself.

To make this clearer, let us consider the particular case where $Y$ is a simple random variable. This means that we can express $Y$ as a linear combination of indicator functions of measurable sets $D_{1},\ldots,D_{n}$ that partition the sample space $\Omega$: \begin{align*} Y(\omega) = \sum_{i=1}^{n}y_{i}1_{D_{i}}(\omega) \end{align*}

In this context, if we let $\mathcal{D}_{Y} = \{D_{1},D_{2},\ldots,D_{n}\}$ and assume each $\textbf{P}(D_{i}) > 0$, then the conditional expectation is given by: \begin{align*} \textbf{E}[X\mid Y](\omega) = \textbf{E}[X \mid \mathcal{D}_{Y}](\omega) = \sum_{i=1}^{n}\textbf{E}[X\mid D_{i}]1_{D_{i}}(\omega) \end{align*} where $\textbf{E}[X\mid D_{i}] = \textbf{E}[X 1_{D_{i}}]/\textbf{P}(D_{i})$.

In other words, we approximate $X$ by the constant $\textbf{E}[X\mid D_{i}]$ on each set $D_{i}$. This may be a crude approximation, since $X$ is replaced by a single constant on each $D_{i}$, but it is the best one among approximations of this form, i.e., among $\mathcal{D}_{Y}$-measurable random variables.
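The formula above can be checked numerically. Here is a minimal sketch on a finite sample space — a fair die roll, a hypothetical example not taken from the answer — with $\mathcal{V}$ generated by the partition $\{1,2,3\}$, $\{4,5,6\}$. The conditional expectation is constant on each block, and the second defining property (integrals of $X$ and $E[X\mid\mathcal{V}]$ agree on every $A\in\mathcal{V}$) holds by construction.

```python
from fractions import Fraction as F

# Hypothetical finite example: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: F(1, 6) for w in omega}
X = {w: w for w in omega}              # X(w) = face value

# Partition generating the sub-sigma-algebra V.
partition = [{1, 2, 3}, {4, 5, 6}]

def cond_exp(X, partition, P):
    """E[X | V](w) = E[X | D_i] on the block D_i containing w."""
    Y = {}
    for D in partition:
        pD = sum(P[w] for w in D)                      # P(D_i)
        eXD = sum(X[w] * P[w] for w in D) / pD         # E[X | D_i]
        for w in D:
            Y[w] = eXD
    return Y

Y = cond_exp(X, partition, P)
print(Y[1], Y[4])   # 2 5  (block averages of the low and high faces)

# Defining property 2: integrals of X and Y agree on each block.
for D in partition:
    assert sum(X[w] * P[w] for w in D) == sum(Y[w] * P[w] for w in D)
```

Using exact `Fraction` arithmetic keeps the equality checks exact rather than approximate.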

More generally, given a probability space $(\Omega,\mathcal{U},\textbf{P})$, an $\mathcal{U}$-measurable random variable $X$, and a random variable $Y$, we can define the conditional expectation as follows: \begin{align*} \textbf{E}[X\mid Y] = \textbf{E}[X\mid\sigma(Y)] \end{align*} where $\sigma(Y)\subseteq\mathcal{U}$ is the $\sigma$-algebra generated by $Y$. From this definition, you can recover the usual notion of conditional expectation that you are acquainted with.
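To see what $\sigma(Y)$ looks like concretely for a simple $Y$, here is a quick sketch (again a hypothetical example: $Y$ is the parity of a fair die roll). The $\sigma$-algebra $\sigma(Y)$ is generated by the level sets $\{Y = y\}$, which are exactly the blocks of the partition $\mathcal{D}_{Y}$ used above.

```python
# Hypothetical example: Y = parity of a fair die roll.
omega = [1, 2, 3, 4, 5, 6]
Y = {w: w % 2 for w in omega}

# Group outcomes by the value of Y: these level sets are the
# atoms generating sigma(Y).
atoms = {}
for w in omega:
    atoms.setdefault(Y[w], []).append(w)

print(sorted(atoms.values()))   # [[1, 3, 5], [2, 4, 6]]
```

Conditioning on $Y$ thus means conditioning on which atom the outcome falls into; $E[X\mid Y]$ is constant on each atom.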

Finally, as @OliverDíaz has mentioned, the discussion above can be formalized in terms of best approximation in the mean-square ($L^2$) sense: among $\mathcal{V}$-measurable random variables $Z$ with finite second moment, $Z = \textbf{E}[X\mid\mathcal{V}]$ minimizes $\textbf{E}[(X-Z)^{2}]$.
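This $L^2$ characterization can be checked by brute force on the same hypothetical die example: among functions constant on the blocks $\{1,2,3\}$ and $\{4,5,6\}$, the mean-square error $\textbf{E}[(X-Z)^{2}]$ is minimized exactly at the block averages $2$ and $5$ — the values of the conditional expectation.

```python
# Hypothetical finite example: fair die, blocks {1,2,3} and {4,5,6}.
p = 1 / 6
low, high = [1, 2, 3], [4, 5, 6]

def mse(c1, c2):
    # E[(X - Z)^2] for Z = c1 on the low block, c2 on the high block.
    return p * (sum((w - c1) ** 2 for w in low)
                + sum((w - c2) ** 2 for w in high))

# Grid search over candidate constant values on each block.
grid = [i / 2 for i in range(13)]          # 0.0, 0.5, ..., 6.0
best = min(((c1, c2) for c1 in grid for c2 in grid),
           key=lambda pair: mse(*pair))
print(best)   # (2.0, 5.0): the block averages, i.e. E[X | D_i]
```

The grid search is only a sketch; the exact statement is that $E[X\mid\mathcal{V}]$ is the orthogonal projection of $X$ onto the subspace of $\mathcal{V}$-measurable random variables in $L^2$.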