Why $E[X\mid Y] = E[X\mid\sigma(Y)]$

Tags: conditional-probability, conditional-expectation, probability-theory, statistics

Let $X$ be an integrable random variable on $(\Omega, F, P)$. Let $Y$ be measurable from $(\Omega, F, P)$ to $(A, G)$. The conditional expectation of $X$ given $Y$ is defined to be

$E[X\mid Y] = E[X\mid\sigma(Y)]$

I cannot understand the reason for this equality. Why is the conditional expectation of a random variable ($X$) given another random variable ($Y$) equal to the conditional expectation of that same random variable ($X$) given the $\sigma$-algebra generated by the other one ($Y$)?

Best Answer

For questions of this nature, I generally like to gain intuition by assuming $X$ and $Y$ are discrete random variables. So let us assume $X$ and $Y$ take values in $\{1,\dots,n\}$, and keep in mind that they are not necessarily independent. For simplicity, assume $P(Y=y) > 0$ for all $y$.

If you think back to undergraduate level probability classes on conditioning, you can probably guess that $E\left[X|Y=y\right]$ should intuitively be the expected value of the random variable $X$ if you know that $Y=y$. So the computation should go something like this:

\begin{align*} E[X|Y=y] &= \sum_{x=1}^n xP(X=x|Y=y) = \sum_{x=1}^n\frac{xP(X=x,Y=y)}{P(Y=y)} \end{align*}
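To make this concrete, here is a minimal Python sketch of that formula, assuming a small made-up joint pmf with $X, Y \in \{1,2,3\}$; the table `joint` and the helper name `cond_exp_elementary` are illustrative assumptions, not part of the original setup.

```python
# A minimal numerical sketch of the elementary formula above.
# The joint pmf `joint` (X, Y both valued in {1, 2, 3}) is made up purely
# for illustration.

joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.15,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.05,
    (3, 1): 0.05, (3, 2): 0.10, (3, 3): 0.20,
}
values = [1, 2, 3]

def cond_exp_elementary(y):
    """E[X | Y = y] via the elementary formula: sum_x x * P(X=x, Y=y) / P(Y=y)."""
    p_y = sum(joint[(x, y)] for x in values)          # P(Y = y)
    return sum(x * joint[(x, y)] for x in values) / p_y

for y in values:
    print(f"E[X | Y={y}] = {cond_exp_elementary(y):.4f}")
```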

Now let's recall the actual definition of conditional expectation. First, $\sigma(X,Y)$ is the sigma algebra generated by $(X,Y)$. In our discrete case, this is given by $\sigma(X,Y) = \left\{\{(X,Y) \in A\}: A \subseteq \{1,\dots,n\}^2\right\}$. This is just the set of all possible things you can say about $X$ and $Y$ (for example, the set $\{X=2\} = \{(X,Y) \in A\}$ with $A := \{(2,i): i \in \{1,\dots,n\}\}$). Fix any sigma algebra $\mathcal{F} \subseteq \sigma(X,Y)$. Then $E[X|\mathcal{F}]$ is an $\mathcal{F}$-measurable random variable satisfying certain conditions that we'll get to later.

What does it mean for a random variable $Z$ to be $\mathcal{F}$-measurable? Well, recall that $Z$ is a function from the probability space to the real numbers; in our discrete setting we can write $Z = Z(x,y) \in \mathbb{R}$ for $x,y \in \{1,\dots,n\}$. Saying that $Z$ is $\mathcal{F}$-measurable means $\mathcal{F} \supseteq \{Z^{-1}(A): A \in \mathcal{B}(\mathbb{R})\}$. In other words, every possible outcome of $Z$ can be completely described in terms of a set in $\mathcal{F}$.

Here are some examples. Let $\mathcal{F} = \sigma(\{X=1\}) = \left\{\emptyset,\ \{X=1\},\ \{X\neq 1\},\ \{(X,Y)\in\{1,\dots,n\}^2\}\right\}$. Then $Z(x,y) = \mathbb{I}_{X=1}$ is $\mathcal{F}$-measurable, because no matter what you tell me about the output of $Z$, I can find a set in $\mathcal{F}$ that describes what's going on. For example, $Z < 1$ corresponds to $\{X\neq 1\}$, and $Z = 2$ corresponds to the empty set. Both are in $\mathcal{F}$. However, $Z(x,y) = y$ is not $\mathcal{F}$-measurable. If you tell me $Z=3$, that could be part of $\{X=1\}$ or $\{X\neq 1\}$, but it's not equal to either set. Nor is it equal to $\{(X,Y) \in \{1,\dots,n\}^2\}$ or $\emptyset$.
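On a finite outcome space, this measurability check has a very concrete form: $Z$ is measurable with respect to the sigma algebra generated by a partition exactly when $Z$ is constant on each atom of that partition. Here is a hedged sketch of that test for the two examples above; the outcome space (with $n=3$) and the functions `Z1`, `Z2` are illustrative choices.

```python
# Sketch of the measurability examples above on the finite outcome space
# {1,...,n}^2 with n = 3. sigma({X = 1}) is generated by the partition
# {X = 1} / {X != 1}, and a random variable is measurable with respect to
# it exactly when it is constant on each of those atoms.

values = [1, 2, 3]
omega = [(x, y) for x in values for y in values]     # all outcomes (X, Y) = (x, y)

atoms = [
    [(x, y) for (x, y) in omega if x == 1],          # the event {X = 1}
    [(x, y) for (x, y) in omega if x != 1],          # the event {X != 1}
]

def is_measurable(Z, atoms):
    """True iff Z is constant on every atom, i.e. measurable w.r.t. sigma(atoms)."""
    return all(len({Z(x, y) for (x, y) in atom}) == 1 for atom in atoms)

Z1 = lambda x, y: 1 if x == 1 else 0                 # indicator of {X = 1}
Z2 = lambda x, y: y                                  # the random variable Y itself

print(is_measurable(Z1, atoms))   # True:  knowing whether X = 1 determines Z1
print(is_measurable(Z2, atoms))   # False: {Z2 = 3} is not a union of the atoms
```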

$E[X|\mathcal{F}]$ is the unique $\mathcal{F}$-measurable random variable whose expectation over any set in $\mathcal{F}$ equals the expectation of $X$ over that set. Intuitively, since $\mathcal{F}$ is coarser than $\sigma(X,Y)$, you can think of $E[X|\mathcal{F}]$ as the best estimate of $X$ when you only have the information about $(X,Y)$ described by $\mathcal{F}$. If $\mathcal{F}$ is the trivial sigma algebra, then $E[X|\mathcal{F}] = E[X]$, as that's your best guess if you only know the distribution of $(X,Y)$ and nothing else.
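One way to back up this "best estimate" intuition numerically: among all predictors of $X$ that are functions of $Y$, the conditional expectation minimizes the mean squared error $E[(X-g(Y))^2]$. Below is a rough sketch of that comparison, reusing the made-up joint pmf from the earlier sketch; the alternative predictors are arbitrary choices added only for contrast.

```python
# Sketch: E[X|Y] as the best mean-square predictor of X among functions of Y.
# The joint pmf is the same illustrative table as before.

joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.15,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.05,
    (3, 1): 0.05, (3, 2): 0.10, (3, 3): 0.20,
}
values = [1, 2, 3]

def cond_exp(y):
    """E[X | Y = y] from the elementary formula."""
    p_y = sum(joint[(x, y)] for x in values)
    return sum(x * joint[(x, y)] for x in values) / p_y

e_x = sum(x * p for (x, _), p in joint.items())      # E[X], the "trivial sigma algebra" guess

def mse(g):
    """Mean squared error E[(X - g(Y))^2] under the joint pmf."""
    return sum(p * (x - g(y)) ** 2 for (x, y), p in joint.items())

print("g(y) = E[X|Y=y]:", round(mse(cond_exp), 4))       # smallest of the three
print("g(y) = E[X]    :", round(mse(lambda y: e_x), 4))
print("g(y) = y       :", round(mse(lambda y: y), 4))
```

In particular, if only constants are allowed (the trivial sigma algebra), the best constant in this mean-square sense is $E[X]$, which matches the last remark above.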

We can write this defining property out explicitly. For any $A \in \mathcal{F}$,

\begin{align*} E\left[\mathbb{I}_A\, E[X|\mathcal{F}]\right] &= E\left[\mathbb{I}_A X\right], \end{align*} which in our discrete setting reads \begin{align*} \sum_{(x,y) \in A} P(X=x,Y=y)\,E[X|\mathcal{F}](x,y) &= \sum_{(x,y) \in A} x\,P(X=x,Y=y). \end{align*}
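As a quick sanity check on this defining property, here is a sketch that verifies it for the earlier example $\mathcal{F} = \sigma(\{X=1\})$, whose events are $\emptyset$, $\{X=1\}$, $\{X\neq 1\}$, and the whole space; the joint pmf is again the made-up table from the sketches above.

```python
# Verifying E[1_A * E[X|F]] = E[1_A * X] for every A in F = sigma({X = 1}).
# On each atom, E[X|F] equals the probability-weighted average of X there.

joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.15,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.05,
    (3, 1): 0.05, (3, 2): 0.10, (3, 3): 0.20,
}
omega = list(joint)                                   # all outcomes (x, y)

atom1 = [(x, y) for (x, y) in omega if x == 1]        # {X = 1}
atom2 = [(x, y) for (x, y) in omega if x != 1]        # {X != 1}

def avg_on(atom):
    """Probability-weighted average of X over an atom: the value of E[X|F] there."""
    mass = sum(joint[w] for w in atom)
    return sum(x * joint[(x, y)] for (x, y) in atom) / mass

cond = {w: avg_on(atom1 if w in atom1 else atom2) for w in omega}   # E[X|F], outcome by outcome

for A in [[], atom1, atom2, omega]:                   # all four events in F
    lhs = sum(joint[w] * cond[w] for w in A)          # E[ 1_A * E[X|F] ]
    rhs = sum(joint[(x, y)] * x for (x, y) in A)      # E[ 1_A * X ]
    print(len(A), abs(lhs - rhs) < 1e-12)             # True each time
```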

So now let's consider your original question. Let $\mathcal{F} = \sigma(Y)$. Then $\mathcal{F} = \{\{Y\in A\}: A \subseteq \{1,\dots,n\}\}$. Since $E[X|\sigma(Y)]$ is $\sigma(Y)$-measurable, it cannot depend on $x$ (it must be constant on each event $\{Y=y\}$). So we can write $E[X|\sigma(Y)](x,y) = E[X|\sigma(Y)](y)$ for all $(x,y) \in\{1,\dots,n\}^2$. Now take $A = \{y\}$ for some $y \in \{1,\dots,n\}$, so that the corresponding event in $\mathcal{F}$ is $\{Y=y\}$:

\begin{align*} \sum_{x=1}^n\sum_{y\in A}P(X=x,Y=y)E[X|\sigma(Y)](y) &= P(Y=y)E[X|\sigma(Y)](y)\\ & = \sum_{x=1}^n\sum_{y \in A} xP(X=x,Y=y) = \sum_{x=1}^n xP(X=x,Y=y) \end{align*}

Then,

$$E[X|\sigma(Y)](y) = \frac{\sum_{x=1}^n xP(X=x,Y=y)}{P(Y=y)} = E[X|Y=y]$$

So $E[X|\sigma(Y)](y) = E[X|Y=y]$. That's why we use this shorthand notation. You can extend this argument to more general random variables and even stochastic processes, but that involves many other tedious details. The main gist is the same. For any two arbitrary random variables, you can write $E[X|Y] = E[X|\sigma(Y)]$ with the intuitive idea that when $Y=y$, $E[X|\sigma(Y)] = E[X|Y=y]$. Hope that makes sense!
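As a final, purely illustrative check (not part of the argument above), here is a rough Monte Carlo sketch of the identity: sampling from the made-up joint pmf and averaging $X$ over the draws with $Y = y$ approximately reproduces the closed-form $E[X\mid Y=y]$ for each $y$.

```python
# Monte Carlo sketch of E[X | sigma(Y)](y) = E[X | Y = y]: the sample average
# of X over draws with Y = y should approximate the closed-form value.

import random

random.seed(0)

joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.15,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.05,
    (3, 1): 0.05, (3, 2): 0.10, (3, 3): 0.20,
}
values = [1, 2, 3]
outcomes, probs = zip(*joint.items())

def cond_exp(y):
    """E[X | Y = y] from the elementary formula."""
    p_y = sum(joint[(x, y)] for x in values)
    return sum(x * joint[(x, y)] for x in values) / p_y

samples = random.choices(outcomes, weights=probs, k=100_000)
for y in values:
    xs = [x for (x, yy) in samples if yy == y]
    print(f"y={y}: simulated {sum(xs) / len(xs):.3f}  vs  formula {cond_exp(y):.3f}")
```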