What am I writing when I write $\mathbf X \mid \mathbf Y$?

conditional-expectation, definition, notation, probability-theory, random-variables

Suppose $\mathbf X$ is a random variable and $A$ is an event in the same probability space $(\Omega, \mathcal F, \Pr)$. (Formally, $\mathbf X$ is a function on $\Omega$, say $\Omega \to \mathbb R$; $A$ is a subset of $\Omega$.)

I am comfortable writing $\mathbf X \mid A$ to condition $\mathbf X$ on $A$. This can be defined as another random variable on a different probability space: replace $\Omega$ by $A$, $\mathcal F$ by $\{S \cap A : S \in \mathcal F\}$, and $\Pr$ by the measure $\Pr[\,{\bullet} \mid A]$. Then just let $\mathbf X \mid A$ have the same value as $\mathbf X$ on every $\omega \in A$. With this definition, the conditional expectation $\mathbb E[\mathbf X \mid A]$ is just the ordinary expectation of this new random variable $\mathbf X \mid A$.
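
For a finite sample space this construction can be carried out very concretely. Below is a minimal sketch in Python (the sample space, event, and random variable are made up purely for illustration): restrict to $A$, renormalize to get $\Pr[\,{\bullet} \mid A]$, and take the ordinary expectation of the restricted variable.

```python
# A minimal sketch for a finite probability space (illustrative values only).
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]                  # two fair coin flips
pr = {w: Fraction(1, 4) for w in omega}           # uniform measure on Omega
X = {w: w.count("H") for w in omega}              # X = number of heads

A = {w for w in omega if w[0] == "H"}             # event: first flip is heads

# Conditioning: restrict to A and renormalize, i.e. use Pr[. | A].
pr_A = {w: pr[w] / sum(pr[v] for v in A) for w in A}

# X | A has the same values as X, but lives on the smaller space (A, Pr[. | A]).
E_X_given_A = sum(X[w] * pr_A[w] for w in A)
print(E_X_given_A)                                # 3/2
```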

I am less comfortable with a different notation, which is what this question is about:

Suppose $\mathbf X, \mathbf Y$ are two random variables in the same probability space. It might be convenient to describe their joint distribution as "choose $\mathbf Y$, then choose $\mathbf X$ in a way that depends on $\mathbf Y$". For example, we flip $10$ coins and let $\mathbf Y$ be the number of heads; we flip those $\mathbf Y$ coins again and let $\mathbf X$ be the number of heads. We can write this distribution as
$$
\mathbf Y \sim \textit{Binomial}(10, \tfrac12) \qquad \mathbf X \mid \mathbf Y \sim \textit{Binomial}(\mathbf Y, \tfrac12).
$$

This notation has some nice features. If $\mathbf Z \sim \textit{Binomial}(n,\frac12)$ for a constant $n$, then $\mathbb E[\mathbf Z] = \frac12 n$. Here, we can pretend that we're in the same boat and write $\mathbb E[\mathbf X \mid \mathbf Y] = \frac12\mathbf Y$, which is correct as a description of the random variable $\mathbb E[\mathbf X \mid \mathbf Y]$.
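
A quick simulation makes both statements concrete. The sketch below (variable names are my own, not from the question) samples the two-stage experiment and checks that the average of $\mathbf X$ over the samples with $\mathbf Y = y$ is close to $\tfrac12 y$; averaging over all samples also illustrates the tower property $\mathbb E[\mathbf X] = \mathbb E[\mathbb E[\mathbf X \mid \mathbf Y]] = \mathbb E[\tfrac12 \mathbf Y] = 2.5$.

```python
# Simulation sketch of the two-stage coin experiment (names are illustrative).
import random
from collections import defaultdict

random.seed(0)
N = 200_000

sums = defaultdict(float)   # sum of X over samples with a given Y = y
counts = defaultdict(int)   # number of samples with a given Y = y

for _ in range(N):
    y = sum(random.random() < 0.5 for _ in range(10))   # Y ~ Binomial(10, 1/2)
    x = sum(random.random() < 0.5 for _ in range(y))    # X | Y ~ Binomial(Y, 1/2)
    sums[y] += x
    counts[y] += 1

# E[X | Y = y] should be close to y / 2 for each observed y.
for y in sorted(counts):
    print(y, sums[y] / counts[y], y / 2)

# Tower property: E[X] = E[E[X | Y]] = E[Y / 2] = 2.5.
print(sum(sums.values()) / N)
```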

But is $\mathbf X \mid \mathbf Y$ really any kind of random variable (or other object) on its own, or is this just abuse of notation?

Note: I will deal with $\int$ symbols if I must, but if I get an answer just for discrete random variables where these don't show up, that's fine by me.

Best Answer

Let me post my comment here (with a few references added).

First, some notation. If we have two measurable spaces $(E_1,\mathcal E_1),(E_2,\mathcal E_2)$ and I say that $f:E_1 \to E_2$ is a random variable (or measurable function) without stating precisely with respect to which $\sigma$-fields, then I mean that it is $\mathcal E_2 / \mathcal E_1$-measurable (i.e. $f^{-1}[B] \in \mathcal E_1$ for every $B \in \mathcal E_2$).


Definition. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(E,\mathcal E)$ be a measurable space. Consider a random variable $X:\Omega \to E$ and a $\sigma$-field $\mathcal G \subset \mathcal F$. We say that $\eta:\Omega \times \mathcal E \to \mathbb R$ is a regular conditional distribution of $X$ with respect to $\mathcal G$ iff:

  1. For all $\omega \in \Omega$, the function $\eta(\omega,\cdot):\mathcal E \to \mathbb R$ is a probability measure on $(E,\mathcal E)$.

  2. For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is $\mathcal G$-measurable.

  3. For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is (a.s.) equal to $\mathbb E[1_B(X) \mid \mathcal G]$.


We're interested in the case $\mathcal G = \sigma(Y)$ for some random variable $Y:\Omega \to S$, where $(S,\mathcal S)$ is another measurable space. Note that in that case $\eta$ is a good candidate for actually making sense of something like $X|Y$ (as far as I know, this is rather uncommon notation). Indeed, we can then identify $\eta:\Omega \times \mathcal E \to \mathbb R$ with a map $\xi:S\times \mathcal E \to \mathbb R$ in such a way that $\eta(\omega,B) = \xi(Y(\omega),B)$. In other words, $\xi$ works as $\xi(y,B) = \mathbb P(X \in B \mid Y=y)$ when $y = Y(\omega)$ (if we know how to make sense of the latter; see (*) below).
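
To see these conditions and the identification $\eta(\omega,B) = \xi(Y(\omega),B)$ in the simplest possible setting, here is a discrete sketch in Python, using a scaled-down version of the question's experiment (2 coins instead of 10; all names are illustrative). Here $\xi(y,\cdot)$ is a probability measure for every $y$ (condition 1), and $\eta(\cdot,B)$ is constant on each event $\{Y=y\}$, hence $\sigma(Y)$-measurable (condition 2).

```python
# A discrete sketch of a regular conditional distribution (illustrative names).
from fractions import Fraction
from itertools import product

omega = []   # outcomes w = (first_round, second_round)
pr = {}
for first in product("HT", repeat=2):
    y = first.count("H")
    for second in product("HT", repeat=y):
        w = (first, second)
        omega.append(w)
        pr[w] = Fraction(1, 2) ** (2 + y)

Y = {w: w[0].count("H") for w in omega}   # heads in the first round
X = {w: w[1].count("H") for w in omega}   # heads in the second round

def xi(y, B):
    """xi(y, B) = P(X in B | Y = y): a probability measure in B for each fixed y."""
    p_y = sum(pr[w] for w in omega if Y[w] == y)
    return sum(pr[w] for w in omega if Y[w] == y and X[w] in B) / p_y

def eta(w, B):
    """eta(w, B) = xi(Y(w), B): constant on {Y = y}, hence sigma(Y)-measurable in w."""
    return xi(Y[w], B)

print(xi(2, {0, 1, 2}))   # 1   (a probability measure, condition 1)
print(xi(2, {1}))         # 1/2 = P(X = 1 | Y = 2)
```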

But one may ask whether it always exists (we have just defined an object by three conditions, so existence is not automatic) and, if the answer is negative, which assumptions on our spaces/$\sigma$-fields actually let us prove the existence of a regular conditional distribution.

Here I will state the theorem in a somewhat "weird" way, but it will make the references easier to give.


Theorem. Assume that $(\Omega,\mathcal F,\mathbb P)$ is a probability space, $(E,\mathcal E)$ is a measurable space, $X:\Omega \to E$ is a random variable, and $\mathcal G \subset \mathcal F$ is any $\sigma$-field. If any of the following holds:

  1. $(E,\mathcal E) = (\mathbb R, \mathcal B(\mathbb R))$

  2. $E$ is a separable, complete metric space (a Polish space) and $\mathcal E=\mathcal B(E)$ (the Borel $\sigma$-field)

  3. $(E,\mathcal E)$ is Borel isomorphic (Borel equivalent) to $(\mathbb R,\mathcal B(\mathbb R))$ (see (**) below)

then a regular conditional distribution of $X$ with respect to $\mathcal G$ exists.


Obviously, case $2$ covers case $1$ (since $\mathbb R$ is a Polish space). In fact, case $3$ covers case $2$ as well, because every Polish space with its Borel $\sigma$-field is Borel isomorphic to $(\mathbb R,\mathcal B(\mathbb R))$ (in my opinion this is really non-trivial); but let us first define what we mean by a Borel isomorphism.

$(**)$ Definition. We say that a measurable space $(E,\mathcal E)$ is Borel isomorphic to $\mathbb R$ if there is an injective map $f:E \to \mathbb R$ such that

  1. $f(E) \in \mathcal B(\mathbb R)$

  2. $f^{-1}[C] \in \mathcal E$ for any $C \in f(E) \cap \mathcal B(\mathbb R) := \{f(E) \cap B : B \in \mathcal B(\mathbb R)\}$

  3. $f(A) \in f(E) \cap \mathcal B(\mathbb R)$ for any $A \in \mathcal E$.
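
For instance (a simple sanity check of this definition, not needed in what follows): any countable set $E$ with its power set as $\mathcal E$ is Borel isomorphic to $\mathbb R$ in this sense. Enumerate $E = \{e_1, e_2, \dots\}$ and let $f(e_n) = n$; then $f(E) \subseteq \mathbb N$ is Borel, every preimage lies in $\mathcal E$, and $f(A)$ is a countable subset of $f(E)$, hence belongs to $f(E) \cap \mathcal B(\mathbb R)$.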


Having said that, finally some references.

A proof of case 1) can be found (for example) in A.N. Shiryaev's book "Probability" (second edition), Chapter II, §7. The author also shows how to deduce case 3) once case 1) is proved, and mentions that a Polish space with its Borel $\sigma$-field is in fact Borel isomorphic to $(\mathbb R,\mathcal B(\mathbb R))$, so Shiryaev's book is, in this sense, complete with respect to the theorem I stated.

It is worth mentioning that proofs (at least of case 1)) can be found in many books on Markov processes.

If anyone is interested in having a regular conditional distribution on a Polish space with its Borel $\sigma$-field (i.e. case 2)) without using this fact about Borel isomorphism, then R.M. Dudley, in his book "Real Analysis and Probability", Chapter 10, Section 10.2, proves case 2) without referring to Borel isomorphism.

$(*)$ I've written something like $\mathbb E[1_B(X)\mid Y=y]$, but what does it actually mean? Let's look at $\mathbb E[1_B(X)\mid Y]$ first. There is the following fact.

Fact. If $Z$ is a random variable with values in $(\mathbb R,\mathcal B(\mathbb R))$ (or even in a Polish space $(E,\mathcal B(E))$), $W$ is a random variable with values in any metric space $(S,\mathcal B(S))$, and moreover $Z$ is $\sigma(W)$-measurable, then there is a Borel function $h:S \to \mathbb R$ (respectively $h:S \to E$) such that $Z=h(W)$. (This is often called the Doob–Dynkin lemma.)

The proof (at least in the $\mathbb R$ case, which is sufficient for us) goes in the standard way: first assume $Z=1_A$ for some $A \in \sigma(W)$; since $\sigma(W) = \{W^{-1}[C] : C \in \mathcal B(S)\}$, we have $A = W^{-1}[C]$ for some Borel $C$, and we can take $h=1_C$. Then use linearity to pass to simple $Z$, then a limiting procedure to pass to non-negative $Z$ (any non-negative random variable is an increasing limit of simple random variables), and lastly write $Z=Z^+ - Z^-$.

Using this with $Z=\mathbb E[1_B(X)\mid Y]$ and $W=Y$, we see that $\mathbb E[1_B(X)\mid Y] = h_B(Y)$ for some Borel function $h_B$ (depending on the set $B$, of course), and the notation $\mathbb E[1_B(X)\mid Y=y]$ means exactly $h_B(y)$.
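
To make $h_B$ concrete in the simplest situation (a sketch, assuming $Y$ is discrete): whenever $\Pr(Y=y)>0$, one admissible choice is the elementary conditional probability
$$
h_B(y) = \frac{\Pr(X \in B,\; Y = y)}{\Pr(Y = y)},
$$
with $h_B$ defined arbitrarily (say, as $0$) at points where $\Pr(Y=y)=0$; this is a Borel function of $y$ and satisfies $h_B(Y) = \mathbb E[1_B(X)\mid Y]$ almost surely.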

In other words, when such a regular conditional distribution exists, we can treat $X|Y$ as a random measure such that $(X|Y)(\omega)(B) = h_B(Y(\omega))$, where $h_B$ is a Borel function with $h_B(Y) = \mathbb E[1_B(X)\mid Y]$.
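
Tying this back to the question's example: there, for each $\omega$, the measure $X|Y(\omega)$ is just the $\textit{Binomial}(Y(\omega), \tfrac12)$ distribution,
$$
(X|Y)(\omega)(B) = \sum_{k \in B} \binom{Y(\omega)}{k}\, 2^{-Y(\omega)},
$$
so the display $\mathbf X \mid \mathbf Y \sim \textit{Binomial}(\mathbf Y, \tfrac12)$ can be read literally as a statement about this random measure, and $\mathbb E[\mathbf X \mid \mathbf Y] = \tfrac12\mathbf Y$ is its mean.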
