Property of conditional expectation $E(X | \mathcal{V})$ where $\mathcal{V}$ is $\sigma$-algebra.

Tags: conditional-probability, conditional-expectation, measure-theory, probability-theory, random-variables

I'm self-studying probability theory, and I'm stuck on the definition given below and on some consequences that follow from it.

Let $(\Omega, \mathcal{U}, P)$ be a probability space. Suppose $\mathcal{V}$ is a $\sigma$-algebra, $\mathcal{V} \subseteq \mathcal{U}$. Then
$$E(X|\mathcal{V})$$
is defined to be any random variable on $\Omega$ such that

  1. $E(X|\mathcal{V})$ is $\mathcal{V}$-measurable and
  2. $\int_A X dP = \int_A E(X|\mathcal{V})dP$ for all $A \in \mathcal{V}$.

It is somehow "obvious" that if $X$ is $\mathcal{V}$-measurable, then $E(X|\mathcal{V}) = X$.

I would like to get help on the following challenges that I'm facing:

  1. I don't understand what $E(X|\mathcal{V})$ means on an intuitive level.
  2. I don't see how to get this property: If $X$ is $\mathcal{V}$-measurable, then $E(X|\mathcal{V}) = X$.

So far, I'm familiar with usual conditional expectation stuff like $E(X|Y=y)$ or $E(X|Y)$, but $E(X|\mathcal{V})$ is something new to me.

P.S. Given that I'm stuck on somehow "obvious" concepts, I would appreciate simple explanations.

Best Answer

To begin with, it is worth making some comments. Given a probability space $(\Omega,\mathcal{U},\textbf{P})$, we can think of $\mathcal{U}$ as the information we have at hand about the random phenomenon we are interested in. More precisely, the $\sigma$-algebra $\mathcal{U}$ tells us which events we can observe the occurrence of. So, when we consider a sub-$\sigma$-algebra $\mathcal{V}\subseteq\mathcal{U}$, we are restricting the information available about the random phenomenon we are studying.

Based on this interpretation, we can view the conditional expectation $\textbf{E}[X\mid\mathcal{V}]$ as the random variable that best approximates $X$ using only the information in $\mathcal{V}\subseteq\mathcal{U}$. This means that $Y := \textbf{E}[X\mid\mathcal{V}]$ should be $\mathcal{V}$-measurable, and $Y$ and $X$ should coincide on average over every measurable set $A\in\mathcal{V}$. This also answers your second question: if $X$ is already $\mathcal{V}$-measurable, then $X$ itself satisfies both defining properties — property 1 holds by assumption, and property 2 holds trivially, since $\int_A X\,dP = \int_A X\,dP$ — so $\textbf{E}[X\mid\mathcal{V}] = X$ (up to almost-sure equality). Intuitively, the best approximation of $X$, given enough information to determine $X$, is $X$ itself.

To make this clearer, let us consider the particular case where $Y$ is a simple random variable. This means that we can express $Y$ as a linear combination of indicator functions of measurable sets $D_{1},\ldots,D_{n}$ that partition the sample space $\Omega$: \begin{align*} Y(\omega) = \sum_{i=1}^{n}y_{i}1_{D_{i}}(\omega) \end{align*}

In this context, if we let $\mathcal{D}_{Y} = \{D_{1},D_{2},\ldots,D_{n}\}$ and assume each $\textbf{P}(D_{i}) > 0$, then the conditional expectation is given by: \begin{align*} \textbf{E}[X\mid Y](\omega) = \textbf{E}[X \mid \mathcal{D}_{Y}](\omega) = \sum_{i=1}^{n}\textbf{E}[X\mid D_{i}]1_{D_{i}}(\omega) \end{align*} where $\textbf{E}[X\mid D_{i}] = \textbf{E}[X 1_{D_{i}}]/\textbf{P}(D_{i})$.

In other words, we approximate $X$ by the constant $\textbf{E}[X\mid D_{i}]$ on each set $D_{i}$. This may be a crude approximation, since $X$ is replaced by a single constant on each $D_{i}$, but it is the best one among approximations of this form, i.e., among $\mathcal{D}_{Y}$-measurable random variables.
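The formula above can be checked numerically. Here is a minimal sketch on a finite sample space — a fair die roll, a hypothetical example not taken from the answer — with $\mathcal{V}$ generated by the partition $\{1,2,3\}$, $\{4,5,6\}$. The conditional expectation is constant on each block, and the second defining property (integrals of $X$ and $E[X\mid\mathcal{V}]$ agree on every $A\in\mathcal{V}$) holds by construction.

```python
from fractions import Fraction as F

# Hypothetical finite example: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: F(1, 6) for w in omega}
X = {w: w for w in omega}              # X(w) = face value

# Partition generating the sub-sigma-algebra V.
partition = [{1, 2, 3}, {4, 5, 6}]

def cond_exp(X, partition, P):
    """E[X | V](w) = E[X | D_i] on the block D_i containing w."""
    Y = {}
    for D in partition:
        pD = sum(P[w] for w in D)                      # P(D_i)
        eXD = sum(X[w] * P[w] for w in D) / pD         # E[X | D_i]
        for w in D:
            Y[w] = eXD
    return Y

Y = cond_exp(X, partition, P)
print(Y[1], Y[4])   # 2 5  (block averages of the low and high faces)

# Defining property 2: integrals of X and Y agree on each block.
for D in partition:
    assert sum(X[w] * P[w] for w in D) == sum(Y[w] * P[w] for w in D)
```

Using exact `Fraction` arithmetic keeps the equality checks exact rather than approximate.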

More generally, given a probability space $(\Omega,\mathcal{U},\textbf{P})$, an $\mathcal{U}$-measurable random variable $X$, and a random variable $Y$, we can define the conditional expectation as follows: \begin{align*} \textbf{E}[X\mid Y] = \textbf{E}[X\mid\sigma(Y)] \end{align*} where $\sigma(Y)\subseteq\mathcal{U}$ is the $\sigma$-algebra generated by $Y$. From this definition, you can recover the usual notion of conditional expectation that you are acquainted with.
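To see what $\sigma(Y)$ looks like concretely for a simple $Y$, here is a quick sketch (again a hypothetical example: $Y$ is the parity of a fair die roll). The $\sigma$-algebra $\sigma(Y)$ is generated by the level sets $\{Y = y\}$, which are exactly the blocks of the partition $\mathcal{D}_{Y}$ used above.

```python
# Hypothetical example: Y = parity of a fair die roll.
omega = [1, 2, 3, 4, 5, 6]
Y = {w: w % 2 for w in omega}

# Group outcomes by the value of Y: these level sets are the
# atoms generating sigma(Y).
atoms = {}
for w in omega:
    atoms.setdefault(Y[w], []).append(w)

print(sorted(atoms.values()))   # [[1, 3, 5], [2, 4, 6]]
```

Conditioning on $Y$ thus means conditioning on which atom the outcome falls into; $E[X\mid Y]$ is constant on each atom.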

Finally, as @OliverDíaz has mentioned, the discussion above can be formalized in terms of best approximation in the mean-square ($L^2$) sense: among $\mathcal{V}$-measurable random variables $Z$ with finite second moment, $Z = \textbf{E}[X\mid\mathcal{V}]$ minimizes $\textbf{E}[(X-Z)^{2}]$.
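This $L^2$ characterization can be checked by brute force on the same hypothetical die example: among functions constant on the blocks $\{1,2,3\}$ and $\{4,5,6\}$, the mean-square error $\textbf{E}[(X-Z)^{2}]$ is minimized exactly at the block averages $2$ and $5$ — the values of the conditional expectation.

```python
# Hypothetical finite example: fair die, blocks {1,2,3} and {4,5,6}.
p = 1 / 6
low, high = [1, 2, 3], [4, 5, 6]

def mse(c1, c2):
    # E[(X - Z)^2] for Z = c1 on the low block, c2 on the high block.
    return p * (sum((w - c1) ** 2 for w in low)
                + sum((w - c2) ** 2 for w in high))

# Grid search over candidate constant values on each block.
grid = [i / 2 for i in range(13)]          # 0.0, 0.5, ..., 6.0
best = min(((c1, c2) for c1 in grid for c2 in grid),
           key=lambda pair: mse(*pair))
print(best)   # (2.0, 5.0): the block averages, i.e. E[X | D_i]
```

The grid search is only a sketch; the exact statement is that $E[X\mid\mathcal{V}]$ is the orthogonal projection of $X$ onto the subspace of $\mathcal{V}$-measurable random variables in $L^2$.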