Probability Theory – Confusion About Existence Proof of Regular Conditional Probabilities


I'm reading the proof of Theorem 2.29 below from this note. First, recall the following definition and lemma.

Definition 2.28. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $(T, \mathcal{B})$ a measurable space, $F: \Omega \rightarrow T$ a measurable map, and $\mathcal{C} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. A (regular) conditional probability distribution for $F$ given $\mathcal{C}$ is a function $P_{F \mid \mathcal{C}}: \mathcal{B} \times \Omega \rightarrow[0,1]$ such that

(a) $B \mapsto P_{F \mid \mathcal{C}}(B, \omega)$ is a probability measure on $\mathcal{B}$ for $\mathbb{P}$-almost every $\omega \in \Omega$
(b) For every $B \in \mathcal{B}$ the map $\omega \mapsto P_{F \mid \mathcal{C}}(B, \omega)$ is $\mathcal{C}$-measurable, and $P_{F \mid \mathcal{C}}(B, \omega)=\mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right](\omega)$ for $\mathbb{P}$-almost every $\omega \in \Omega$.

Theorem 2.29 (Existence of regular conditional probabilities). If $T$ is a separable complete metric space, $\mathcal{B}$ its Borel $\sigma$-algebra, $(\Omega, \mathcal{F}, \mathbb{P})$ a probability space, $F: \Omega \rightarrow T$ a measurable map, and $\mathcal{C}$ a sub-$\sigma$-algebra of $\mathcal{F}$, then there exists a regular conditional probability distribution $P_{F \mid \mathcal{C}}$ for $F$ given $\mathcal{C}$ on $\mathcal{B} \times \Omega$. It is unique in the following sense: if $P^{\prime}$ is another regular conditional probability distribution for $F$ given $\mathcal{C}$, then for $\mathbb{P}$-a.e. $\omega \in \Omega$ we have
$$
P^{\prime}(B, \omega)=P_{F \mid \mathcal{C}}(B, \omega) \text { for all } B \in \mathcal{B} .
$$

Lemma 2.30. If $\mathcal{V}$ and $\mathcal{D}$ are two algebras of subsets of a separable complete metric space $T$, $\mathcal{V} \subseteq \mathcal{D}$, and $\mu: \mathcal{D} \rightarrow[0, \infty)$ is (finitely) additive and for every $B \in \mathcal{V}$ we have
$$
\mu(B)=\sup \{\mu(K): K \subseteq B, K \in \mathcal{D}, K \text { compact }\},
$$
then $\mu$ is $\sigma$-additive on $\mathcal{V}$.

My question:

> For a proof, notice that $\emptyset \in \mathcal{V} \subseteq \mathcal{E}$ and that for $B \in \mathcal{E}$ we have that $\color{blue}{\omega \mapsto \mu_\omega(T \backslash B)=1-\mu_\omega(B)}$…

We know that $\mu_\omega(T \backslash B)=1-\mu_\omega(B)$ holds whenever $\mu_\omega$ is a probability measure on $T$. However, $\mu_\omega$ is defined only for $\omega \in \Omega \setminus W$, not on the whole of $\Omega$. Could you elaborate on how the author obtains $\color{blue}{\omega \mapsto \mu_\omega(T \backslash B)=1-\mu_\omega(B)}$ as a map defined on all of $\Omega$?

In the proof that $\mathcal E$ is a $\sigma$-algebra, the author writes

> Further, if $B_1, B_2, \ldots \in \mathcal{E}$ are $\color{blue}{\text{disjoint}}$…

It seems to me that we have to prove closure of $\mathcal E$ under countable unions of arbitrary sequences in $\mathcal E$, not just of disjoint ones.

Proof of Theorem 2.29. Choose a countable dense subset $\left\{t_1, t_2, \ldots\right\}$ of $T$. Let
$$
\mathcal{U}:=\left\{B_r\left(t_k\right): k \in \mathbb{N}, r \in \mathbb{Q}, r \geq 0\right\},
$$
where $B_r(t)$ denotes the open ball in $T$ with center $t$ and radius $r$. The collection $\mathcal{U}$ is countable and generates the $\sigma$-algebra $\mathcal{B}$. Let $\mathcal{V}$ be the algebra generated by $\mathcal{U}$. Then $\mathcal{V}$ is also countable. (Indeed, there are finite $\mathcal{U}_1 \subseteq \mathcal{U}_2 \subseteq \cdots$ such that $\mathcal{U}=\bigcup_{k=1}^{\infty} \mathcal{U}_k$. The algebra $\mathcal{V}_k$ generated by $\mathcal{U}_k$ is also finite, and $\bigcup_{k=1}^{\infty} \mathcal{V}_k$ is an algebra and equals $\mathcal{V}$. Hence $\mathcal{V}$ is countable.) The image measure $\mu_F=F_{\#} \mathbb{P}$ of $\mathbb{P}$ under $F$ is a Borel probability measure on $T$, and since $T$ is separable and complete, $\mu_F$ is tight. Hence for every $B \in \mathcal{V}$ we can choose a sequence $B_1 \subseteq B_2 \subseteq B_3 \subseteq \cdots$ of compact sets in $T$ with $B_k \subseteq B$ for every $k$ such that $\mu_F(B)=\lim _{k \rightarrow \infty} \mu_F\left(B_k\right)$. Then the functions $1_{\left\{F \in B_k\right\}}$ increase in $k$ and converge $\mathbb{P}$-a.e. to $1_{\{F \in B\}}$. Due to monotone convergence for conditional expectations,
$$
\mathbb{E}\left[1_{\left\{F \in B_k\right\}} | \mathcal{C}\right] \rightarrow \mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right] \quad \mathbb{P}\text{-a.e.}
$$

Let $\mathcal{D}$ be the algebra of subsets of $T$ generated by $\mathcal{V}$ and by the compact sets of each of the sequences $B_1 \subseteq B_2 \subseteq \cdots$ that have been chosen above for each $B \in \mathcal{V}$. Then $\mathcal{D}$ is countable. For each set $D \in \mathcal{D}$ the conditional expectation $\mathbb{E}\left[1_{\{F \in D\}} | \mathcal{C}\right]$ is determined up to $\mathbb{P}$-a.e. equality. Let us fix for each $D \in \mathcal{D}$ a particular choice of $\mathbb{E}\left[1_{\{F \in D\}} | \mathcal{C}\right]$ on $\Omega$ and define
$$
P_{F \mid \mathcal{C}}(D, \omega):=\mathbb{E}\left[1_{\{F \in D\}} | \mathcal{C}\right](\omega), \quad \omega \in \Omega .
$$
We claim that there exists a subset $W \in \mathcal{F}$ with $\mathbb{P}(W)=0$ such that

(1) For every $D \in \mathcal{D}, \omega \mapsto P_{F \mid \mathcal{C}}(D, \omega)$ is $\mathcal{C}$-measurable;
(2) For every $D \in \mathcal{D}, P_{F \mid \mathcal{C}}(D, \omega) \geq 0$ for all $\omega \in \Omega \backslash W$;
(3) $P_{F \mid \mathcal{C}}(T, \omega)=1$ and $P_{F \mid \mathcal{C}}(\emptyset, \omega)=0$ for all $\omega \in \Omega \backslash W$;
(4) For every $D_1, \ldots, D_n$ in $\mathcal{D}$ disjoint,
$$
P_{F \mid \mathcal{C}}\left(D_1 \cup \cdots \cup D_n, \omega\right)=\sum_{j=1}^n P_{F \mid \mathcal{C}}\left(D_j, \omega\right) \text { for all } \omega \in \Omega \backslash W
$$
(5) For each $B \in \mathcal{V}$ with the sequence $\left(B_j\right)$ corresponding to $B$ as chosen above,
$$
P_{F \mid \mathcal{C}}(B, \omega)=\lim _{j \rightarrow \infty} \mathbb{E}\left[1_{\left\{F \in B_j\right\}} | \mathcal{C}\right](\omega) \text { for all } \omega \in \Omega \backslash W.
$$

Indeed, the functions $\omega \mapsto P_{F \mid \mathcal{C}}(D, \omega), D \in \mathcal{D}$, defined above satisfy all these properties if "for all $\omega \in \Omega \backslash W$" is replaced by "for $\mathbb{P}$-almost every $\omega$ in $\Omega$". Since there are only countably many sets $D$ in $\mathcal{D}$, we can take $W$ to be the union of all the exception sets in the "$\mathbb{P}$-almost everywhere" relations. Then $\mathbb{P}(W)=0$ and (1)-(5) hold.

Next we extend the definition of $P_{F \mid \mathcal{C}}(B, \omega)$ to all $B \in \mathcal{B}$ and show it to be a probability measure. Because of (2), (3), and (4), the map $D \mapsto P_{F \mid \mathcal{C}}(D, \omega)$ is additive and positive. Because of Lemma 2.30 and property (5), for every $\color{blue}{\omega \in \Omega \backslash W}$ the map $D \mapsto P_{F \mid \mathcal{C}}(D, \omega)$ is $\sigma$-additive on the algebra $\mathcal{V}$. By the Carathéodory extension theorem, it extends to a $\sigma$-additive measure $\mu_\omega$ on the $\sigma$-algebra generated by $\mathcal{V}$, which is $\mathcal{B}$ as $\mathcal{B} \supseteq \mathcal{V} \supseteq \mathcal{U}$ and $\mathcal{U}$ generates $\mathcal{B}$. Clearly $\mu_\omega(T)=1$, by (3). We show that $(B, \omega) \mapsto \mu_\omega(B)$ has the desired properties of a regular conditional probability distribution. Let
$$
\mathcal{E}:= \left \{B \in \mathcal{B} \,\middle\vert\,
\begin{aligned}
&\omega \mapsto \mu_\omega(B) \text{ is }\mathcal{C}\text{-measurable, and}\\
&\mu_\omega(B)=\mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right](\omega) \text { for } \mathbb{P} \text {-a.e. } \omega \in \Omega
\end{aligned}
\right \}.
$$

Then $\mathcal{E} \supseteq \mathcal{V}$ since $\mu_\omega(B)=P_{F \mid \mathcal{C}}(B, \omega)=\mathbb{E}\left[1_{\{F \in B\}} \mid \mathcal{C}\right]$ for $B \in \mathcal{V}$ and we have (1). Also, $\mathcal{E}$ is a $\sigma$-algebra. For a proof, notice that $\emptyset \in \mathcal{V} \subseteq \mathcal{E}$ and that for $B \in \mathcal{E}$ we have that $\color{blue}{\omega \mapsto \mu_\omega(T \backslash B)=1-\mu_\omega(B)}$ is $\mathcal{C}$-measurable and
$$
\begin{aligned}
\mu_\omega(T \backslash B) &=1-\mu_\omega(B)=1-\mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right](\omega) \\
&=\mathbb{E}\left[1_{\Omega}-1_{\{F \in B\}} | \mathcal{C}\right](\omega)=\mathbb{E}\left[1_{\Omega \backslash\{F \in B\}} | \mathcal{C}\right](\omega) \\
&=\mathbb{E}\left[1_{\{F \in T \backslash B\}} | \mathcal{C}\right](\omega) \text { a.e. } \omega \in \Omega,
\end{aligned}
$$
so that $T \backslash B \in \mathcal{E}$. Further, if $B_1, B_2, \ldots \in \mathcal{E}$ are $\color{blue}{\text{disjoint}}$, $B=\bigcup_{k=1}^{\infty} B_k$, then $\mu_\omega(B)=\sum_{k=1}^{\infty} \mu_\omega\left(B_k\right)$ for almost every $\omega \in \Omega$, so $\omega \mapsto \mu_\omega(B)$ is $\mathcal{C}$-measurable, and by the monotone convergence theorem for conditional expectations,
$$
\begin{aligned}
\mu_\omega(B) &=\sum_{k=1}^{\infty} \mathbb{E}\left[1_{\left\{F \in B_k\right\}} | \mathcal{C}\right](\omega)=\mathbb{E}\left[\sum_{k=1}^{\infty} 1_{\left\{F \in B_k\right\}} | \mathcal{C}\right](\omega) \\
&=\mathbb{E}\left[1_{\bigcup_{k=1}^{\infty}\left\{F \in B_k\right\}} | \mathcal{C}\right](\omega)=\mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right](\omega) \text { a.e. } \omega \in \Omega,
\end{aligned}
$$
so $B \in \mathcal{E}$. Hence $\mathcal{E}$ is a $\sigma$-algebra. Since $\mathcal{B}$ is the smallest $\sigma$-algebra containing $\mathcal{U}$ and $\mathcal{E}$ is a $\sigma$-algebra containing $\mathcal{V} \supseteq \mathcal{U}$, we conclude that $\mathcal{E} \supseteq \mathcal{B}$. Hence every $B \in \mathcal{B}$ satisfies the two properties in the definition of $\mathcal{E}$, which means that
$$
(B, \omega) \mapsto P_{F \mid \mathcal{C}}(B, \omega):=\mu_\omega(B)
$$
is a regular conditional probability distribution for $F$ given $\mathcal{C}$.

To see the uniqueness, we use that $P^{\prime}(B, \omega)=\mathbb{E}\left[1_{\{F \in B\}} | \mathcal{C}\right](\omega)$ for almost every $\omega \in \Omega$ and every $B \in \mathcal{B}$. Since $\mathcal{V}$ is countable we can combine the exception sets for $B \in \mathcal{V}$ and obtain a $W^{\prime} \in \mathcal{F}$ with $\mathbb{P}\left(W^{\prime}\right)=0$ such that $P^{\prime}(B, \omega)=P_{F \mid \mathcal{C}}(B, \omega)$ for every $\omega \in \Omega \backslash\left(W \cup W^{\prime}\right)$ for every $B \in \mathcal{V}$. Now fix $\omega \in \Omega \backslash\left(W \cup W^{\prime}\right)$. Since $P^{\prime}(\cdot, \omega)$ and $P_{F \mid \mathcal{C}}(\cdot, \omega)$ are both probability measures on $\mathcal{B}$ and $\mathcal{V}$ is an algebra generating $\mathcal{B}$ on which they coincide, they must be equal on $\mathcal{B}$ (by a uniqueness theorem related to Carathéodory's extension). Hence for $\mathbb{P}$-almost every $\omega \in \Omega$ we have
$$
P^{\prime}(B, \omega)=P_{F \mid \mathcal{C}}(B, \omega) \text { for all } B \in \mathcal{B} \text {. }
$$

Best Answer

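My confusions arise because the author implicitly extends the map $(B, \omega) \mapsto \mu_\omega(B)$ from $\mathcal B \times (\Omega \setminus W)$ to all of $\mathcal B \times \Omega$, and implicitly uses Dynkin's $\pi$-$\lambda$ theorem: checking closure under complements and countable disjoint unions only shows that $\mathcal E$ is a $\lambda$-system. Below is my rework of the author's proof that makes both points explicit.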

Existence.

We adopt the convention that $\mathbb N := \{1, 2, \ldots\}$. Let $\{t_1, t_2, \ldots\}$ be a countable dense subset of $T$. Let $\mathcal U := \{B(t_k, r) \mid k \in \mathbb N, r \in \mathbb Q_{>0}\}$. Then $\mathcal U$ is countable. Because $T$ is separable, every open subset of $T$ is a countable union of balls in $\mathcal U$, so $\sigma (\mathcal U) = \mathcal B$. Let $\mathcal V$ be the algebra generated by $\mathcal U$. Then $\mathcal V$ is countable. Let $\mu_F := F_\sharp \mathbb P$. Then $\mu_F$ is a Borel probability measure on $T$. Because $T$ is Polish, $\mu_F$ is tight. For each $B \in \mathcal V$, there is an increasing sequence $(B_k)$ of compact subsets of $B$ such that $\mu_F (B) = \lim_k \mu_F (B_k)$. Then the sequence $(1_{\{F \in B_k\}})_k$ is increasing (pointwise) and converges $\mathbb P$-a.e. to $1_{\{F \in B\}}$. By monotone convergence for conditional expectation, we have $$ \mathbb E [1_{\{F \in B_k\}} | \mathcal C] \to \mathbb E [1_{\{F \in B\}} | \mathcal C] \quad \mathbb P \text{-a.e.} $$
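The countability of $\mathcal V$ rests on the fact that finitely many generators $U_1, \dots, U_n$ produce a finite algebra: every element is a union of atoms $U_1^{\varepsilon_1} \cap \cdots \cap U_n^{\varepsilon_n}$ (with $U^1 = U$ and $U^0 = T \setminus U$), so there are at most $2^{2^n}$ elements, and $\mathcal V = \bigcup_k \mathcal V_k$ is a countable union of finite sets. The following Python sketch illustrates the atom construction on a finite universe; the finite universe and the helper name `generated_algebra` are illustrative assumptions, not part of the proof.

```python
from itertools import combinations


def generated_algebra(universe, generators):
    """Return the algebra of subsets of a finite `universe` generated by `generators`.

    Points with the same membership pattern (x in U_1, ..., x in U_n) form one atom;
    the generated algebra consists exactly of all unions of atoms.
    """
    atoms = {}
    for x in universe:
        pattern = tuple(x in g for g in generators)
        atoms.setdefault(pattern, set()).add(x)
    cells = [frozenset(a) for a in atoms.values()]

    algebra = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            algebra.add(frozenset().union(*combo))
    return algebra


# Two generators on {0, ..., 9}: at most 2**2 = 4 atoms, hence at most 2**4 = 16 sets.
U1, U2 = set(range(5)), set(range(3, 8))
V2 = generated_algebra(range(10), [U1, U2])
print(len(V2))  # 16
```

Here the four atoms are $\{0,1,2\}$, $\{3,4\}$, $\{5,6,7\}$ and $\{8,9\}$, so the bound $2^{2^n}$ is attained.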

Let $\mathcal D$ be the algebra generated by $\mathcal V$ and by the compact sets in each of the sequences $(B_k)$ chosen above for each $B \in \mathcal V$. Then $\mathcal D$ is countable, since it is generated by countably many sets. For each $D \in \mathcal D$, we fix a particular version of $\mathbb E [1_{\{F \in D\}} | \mathcal C]$ on $\Omega$ and denote it by $\mathbb E_c [1_{\{F \in D\}} | \mathcal C]$. We define $$ P_{F|\mathcal C} (D, \omega) := \mathbb E_c [1_{\{F \in D\}} | \mathcal C] (\omega) \quad \forall \omega \in \Omega. $$

Clearly, $\omega \mapsto P_{F|\mathcal C} (D, \omega)$ is $\mathcal C$-measurable for every $D \in \mathcal D$. Because $\mathcal D$ is countable, there is a $\mathbb P$-null set $N \in \mathcal C$ (each exceptional set is defined through $\mathcal C$-measurable functions, so their countable union lies in $\mathcal C$) such that

1. $P_{F|\mathcal C} (D, \omega) \ge 0$ for every $(D, \omega) \in \mathcal D \times N^c$.
2. $P_{F|\mathcal C} (T, \omega) = 1$ and $P_{F|\mathcal C} (\emptyset, \omega) = 0$ for every $\omega \in N^c$.
3. For every pairwise disjoint finite sequence $(D_i)_{i=1}^n \subset \mathcal D$, $$ P_{F|\mathcal C} \bigg ( \bigcup_{i=1}^n D_i, \omega \bigg) = \sum_{i=1}^n P_{F|\mathcal C} (D_i, \omega) \quad \forall \omega \in N^c. $$
4. For each $B \in \mathcal V$ with the sequence $(B_k)$ corresponding to $B$ chosen above, $$ P_{F|\mathcal C} (B, \omega) = \lim_{k \to \infty} P_{F|\mathcal C} (B_k, \omega) \quad \forall \omega \in N^c. $$

Here $N^c :=\Omega \setminus N$. In particular, for (3.) we use the fact that the set of all finite sequences in the countable set $\mathcal D$ is countable.
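One way to make the choice of $N$ explicit is to list the exceptional sets; this also shows that $N$ can be chosen in $\mathcal C$, not merely in $\mathcal F$. Writing $P := P_{F|\mathcal C}$ for brevity and $(B_k)$ for the compact sets attached to $B \in \mathcal V$, a sketch:

$$
\begin{aligned}
N_D &:= \{\omega : P(D,\omega) < 0\}, && D \in \mathcal D,\\
N_0 &:= \{\omega : P(T,\omega) \neq 1\} \cup \{\omega : P(\emptyset,\omega) \neq 0\},\\
N_{D_1,\ldots,D_n} &:= \Big\{\omega : P\Big(\textstyle\bigcup_{i=1}^n D_i,\omega\Big) \neq \textstyle\sum_{i=1}^n P(D_i,\omega)\Big\}, && D_1,\ldots,D_n \in \mathcal D \text{ pairwise disjoint},\\
N_B &:= \Big\{\omega : \liminf_{k} P(B_k,\omega) \neq P(B,\omega) \text{ or } \limsup_{k} P(B_k,\omega) \neq P(B,\omega)\Big\}, && B \in \mathcal V.
\end{aligned}
$$

Each of these sets belongs to $\mathcal C$ because it is defined through countably many $\mathcal C$-measurable functions, and each is $\mathbb P$-null because the corresponding identity holds $\mathbb P$-a.e. for conditional expectations (positivity, $\mathbb E[1_\Omega | \mathcal C] = 1$, linearity, and monotone convergence). There are only countably many of them, so their union $N$ satisfies

$$
N \in \mathcal C, \qquad \mathbb P(N) \le \sum_{D} \mathbb P(N_D) + \mathbb P(N_0) + \sum_{(D_1,\ldots,D_n)} \mathbb P(N_{D_1,\ldots,D_n}) + \sum_{B} \mathbb P(N_B) = 0.
$$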

For each $\omega \in N^c$, the map $D \mapsto P_{F|\mathcal C} (D, \omega)$ is finitely additive, finite, and positive on the algebra $\mathcal D$. By our Lemma, it is $\sigma$-additive on the algebra $\mathcal V$. By Carathéodory's extension theorem, it extends to a $\sigma$-additive measure $\mu_\omega$ on $\sigma(\mathcal V) =\mathcal B$. It follows from $\mu_\omega (T)=1$ that $\mu_\omega$ is a Borel probability measure on $T$.
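The hypothesis of Lemma 2.30 (inner approximation by compact sets in $\mathcal D$, measured by $P_{F|\mathcal C}(\cdot,\omega)$ itself) can be checked as follows for $\omega \in N^c$ and $B \in \mathcal V$ with its compact sets $B_k \in \mathcal D$, $B_k \subseteq B$. Positivity and finite additivity give monotonicity, and the limit property gives the approximation:

$$
\begin{aligned}
K \subseteq B,\ K \in \mathcal D &\implies P_{F|\mathcal C}(B,\omega) = P_{F|\mathcal C}(K,\omega) + P_{F|\mathcal C}(B\setminus K,\omega) \ge P_{F|\mathcal C}(K,\omega),\\
P_{F|\mathcal C}(B,\omega) = \lim_{k\to\infty} P_{F|\mathcal C}(B_k,\omega) &\le \sup\{P_{F|\mathcal C}(K,\omega) : K\subseteq B,\ K\in\mathcal D,\ K \text{ compact}\} \le P_{F|\mathcal C}(B,\omega).
\end{aligned}
$$

So equality holds throughout, which is exactly the hypothesis of the Lemma with $\mu = P_{F|\mathcal C}(\cdot,\omega)$ on $\mathcal D$.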

Let's fix some $\omega_0 \in N^c$. We define a map $\nu:\mathcal B \times \Omega \to [0, 1]$ by $$ \nu (B, \omega) := \begin{cases} \mu_\omega (B) & \text{if} \quad \omega \in N^c\\ \mu_{\omega_0} (B) & \text{if} \quad \omega \in N. \end{cases} $$
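To see the patching step in a setting where everything can be computed, here is a hypothetical finite toy model: $\Omega$ has six points, $\mathcal C$ is generated by a partition of $\Omega$ into three atoms (one of them $\mathbb P$-null), and $T = \{a, b\}$. On a finite space, conditional expectation given a partition is the average over the atom; on the null atom the chosen version is deliberately bad (value $-1$), so those points form $N$, and $\nu$ copies $\mu_{\omega_0}$ there. All names (`P`, `ATOMS`, `cond_exp_version`, `nu`, ...) are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical toy model: finite Omega, C generated by a partition into atoms,
# T = {"a", "b"} finite, so a probability measure on T is a vector of point masses.
OMEGA = range(6)
P = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4),
     3: Fraction(1, 4), 4: Fraction(0), 5: Fraction(0)}
ATOMS = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]  # {4, 5} is P-null
F = {0: "a", 1: "b", 2: "a", 3: "a", 4: "b", 5: "a"}
T = ("a", "b")


def atom_of(omega):
    """The atom of the partition generating C that contains omega."""
    return next(A for A in ATOMS if omega in A)


def cond_exp_version(t, omega):
    """One admissible version of E[1_{F = t} | C](omega).

    On an atom of positive mass this is the average over the atom; on a P-null
    atom any constant is admissible, and we deliberately choose -1.
    """
    A = atom_of(omega)
    mass = sum(P[w] for w in A)
    if mass == 0:
        return Fraction(-1)
    return sum(P[w] for w in A if F[w] == t) / mass


def is_probability_vector(omega):
    masses = [cond_exp_version(t, omega) for t in T]
    return all(m >= 0 for m in masses) and sum(masses) == 1


# Exceptional set: a union of atoms (so it lies in C) with P(N) = 0.
N = {w for w in OMEGA if not is_probability_vector(w)}
OMEGA_0 = min(w for w in OMEGA if w not in N)  # the fixed omega_0 in N^c


def nu(B, omega):
    """Patched kernel: mu_omega(B) for omega in N^c, mu_{omega_0}(B) for omega in N."""
    base = omega if omega not in N else OMEGA_0
    return sum((cond_exp_version(t, base) for t in B), Fraction(0))


print(sorted(N))      # [4, 5]
print(nu(("a",), 4))  # 1/2, copied from omega_0 = 0
print(nu(T, 5))       # 1
```

Because $N$ is a union of atoms, $1_N$ is $\mathcal C$-measurable, so each $\nu(B, \cdot)$ is constant on atoms, i.e. $\mathcal C$-measurable, and it agrees with the chosen version of the conditional expectation off the null set $N$.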

Let's show that $\nu$ satisfies our requirement. Clearly, $B \mapsto \nu (B, \omega)$ is a Borel probability measure on $T$ for every $\omega \in \Omega$, in particular for $\mathbb P$-a.e. $\omega$. Let $$ \mathcal E := \left \{B \in \mathcal B \,\middle\vert\, \begin{aligned} & \nu (B, \cdot) \text{ is }\mathcal C \text{-measurable, and} \\ &\nu(B, \cdot) = \mathbb E [1_{\{F \in B\}} | \mathcal C] \quad\mathbb P \text{-a.e.} \end{aligned} \right\}. $$

Notice that $\nu(B, \cdot) = \mathbb E [1_{\{F \in B\}} | \mathcal C]$ $\mathbb P$-a.e. does not necessarily imply $\nu (B, \cdot)$ is $\mathcal C$-measurable. Let's prove that $\mathcal V \subset \mathcal E$. Fix $B \in \mathcal V$. Then $$ \nu (B, \omega) = \begin{cases} P_{F|\mathcal C} (B, \omega) & \text{if} \quad \omega \in N^c\\ P_{F|\mathcal C} (B, \omega_0) & \text{if} \quad \omega \in N. \end{cases} $$

This means $$ \nu (B, \cdot) = 1_{N^c} (\cdot) P_{F|\mathcal C} (B, \cdot) + 1_{N} (\cdot) P_{F|\mathcal C} (B, \omega_0). $$

Because $P_{F|\mathcal C} (B, \cdot)$, $1_{N^c} (\cdot)$, and $1_{N} (\cdot)$ are $\mathcal C$-measurable, so is $\nu (B, \cdot)$. Because $P_{F|\mathcal C} (B, \cdot) = \mathbb E [1_{\{F \in B\}} | \mathcal C]$ $\mathbb P$-a.e. and $N$ is a $\mathbb P$-null set, $\nu(B, \cdot) = \mathbb E [1_{\{F \in B\}} | \mathcal C]$ $\mathbb P$-a.e.

Let's prove that $\mathcal E$ is a $\lambda$-system. Clearly, $\emptyset, T \in \mathcal V \subset \mathcal E$. Fix $B \in \mathcal E$. Then $$ \nu (B^c, \omega) = \begin{cases} \mu_\omega (B^c) & \text{if} \quad \omega \in N^c\\ \mu_{\omega_0} (B^c) & \text{if} \quad \omega \in N \end{cases} = \begin{cases} 1-\mu_\omega (B) & \text{if} \quad \omega \in N^c\\ 1-\mu_{\omega_0} (B) & \text{if} \quad \omega \in N \end{cases} =1-\nu (B, \omega). $$

It follows that $\nu(B^c, \cdot)$ is $\mathcal C$-measurable. Also, $$ \nu(B^c, \cdot) = 1- \nu(B, \cdot) = 1- \mathbb E [1_{\{F \in B\}} | \mathcal C] = \mathbb E [1_\Omega -1_{\{F \in B\}} | \mathcal C] = \mathbb E [1_{\{F \in B^c\}} | \mathcal C] \quad\mathbb P \text{-a.e.} $$

This implies $B^c \in \mathcal E$. Let $(B_k) \subset \mathcal E$ be a sequence of pairwise disjoint sets. We will prove that $B := \bigcup_k B_k \in \mathcal E$. We have $$ \nu (B, \omega) = \begin{cases} \mu_\omega (\bigcup_k B_k) & \text{if} \quad \omega \in N^c\\ \mu_{\omega_0} (\bigcup_k B_k) & \text{if} \quad \omega \in N \end{cases} = \begin{cases} \sum_k \mu_\omega (B_k) & \text{if} \quad \omega \in N^c\\ \sum_k \mu_{\omega_0} (B_k) & \text{if} \quad \omega \in N \end{cases} = \sum_k \nu (B_k, \omega). $$

It follows that $\nu(B, \cdot)$ is $\mathcal C$-measurable, being a countable sum of nonnegative $\mathcal C$-measurable functions. By monotone convergence for conditional expectation, we get $$ \nu(B, \cdot) = \sum_k \nu (B_k, \cdot) = \sum_k \mathbb E [1_{\{F \in B_k\}} | \mathcal C] = \mathbb E \left [ \sum_k 1_{\{F \in B_k\}} \,\middle\vert\, \mathcal C \right ] = \mathbb E \left [ 1_{\{F \in B\}} \,\middle\vert\, \mathcal C \right ] \quad\mathbb P \text{-a.e.} $$

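Hence $B \in \mathcal E$, and $\mathcal E$ is a $\lambda$-system. Since $\mathcal V$ is an algebra, it is in particular a $\pi$-system (closed under finite intersections), and $\mathcal V \subset \mathcal E$. By Dynkin's $\pi$-$\lambda$ theorem, $\sigma(\mathcal V) \subset \mathcal E$. As such, $\mathcal E = \mathcal B$.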
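This is also where the question about disjoint unions is resolved. A $\lambda$-system need not be closed under arbitrary countable unions, but one that is also closed under finite intersections is a $\sigma$-algebra, because any countable union can be rewritten as a disjoint one. The $\pi$-$\lambda$ theorem shows that $\lambda(\mathcal V)$, the smallest $\lambda$-system containing the $\pi$-system $\mathcal V$, is closed under finite intersections; inside it, the standard disjointification reads

$$
\bigcup_{k=1}^{\infty} A_k=\bigcup_{k=1}^{\infty} \Big(A_k \setminus \bigcup_{j=1}^{k-1} A_j\Big) = \bigcup_{k=1}^{\infty} \Big(A_k \cap A_1^c \cap \cdots \cap A_{k-1}^c\Big),
$$

where the sets on the right are pairwise disjoint and each is a finite intersection of sets and complements in $\lambda(\mathcal V)$. So checking only disjoint unions for $\mathcal E$ is enough, provided it is combined with the $\pi$-$\lambda$ theorem rather than read as a direct proof that $\mathcal E$ is a $\sigma$-algebra.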

Uniqueness.

Assume $\nu':\mathcal B \times \Omega \to [0, 1]$ is another map that satisfies the requirement. Then for all $B \in \mathcal V$, $$ \nu(B, \cdot) = \mathbb E [1_{\{F \in B\}} | \mathcal C] = \nu'(B, \cdot) \quad \mathbb P \text{-a.e.} $$

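Because $\mathcal V$ is countable and $\nu'(\cdot, \omega)$ is a probability measure for $\mathbb P$-a.e. $\omega$, there is a $\mathbb P$-null set $M \in \mathcal F$ such that $\nu(\cdot, \omega)$ and $\nu'(\cdot, \omega)$ are probability measures on $\mathcal B$ for every $\omega \in M^c$, and $\nu(B, \omega) = \nu' (B, \omega)$ for all $(B, \omega) \in \mathcal V \times M^c$. Since $\mathcal V$ is an algebra generating $\mathcal B$, the uniqueness part of Carathéodory's extension theorem gives $\nu(B, \omega) = \nu' (B, \omega)$ for all $(B, \omega) \in \mathcal B \times M^c$. This completes the proof.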
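The uniqueness step can also be phrased with the $\pi$-$\lambda$ theorem directly, which is essentially how the uniqueness part of Carathéodory's theorem is proved. Fix $\omega$ outside a $\mathbb P$-null set off which $\nu(\cdot,\omega)$ and $\nu'(\cdot,\omega)$ are both probability measures that agree on $\mathcal V$, and let

$$
\mathcal L_\omega := \{B \in \mathcal B : \nu(B,\omega) = \nu'(B,\omega)\}.
$$

Then

$$
\begin{aligned}
&T \in \mathcal V \subseteq \mathcal L_\omega,\\
&B \in \mathcal L_\omega \implies \nu(B^c,\omega) = 1-\nu(B,\omega) = 1-\nu'(B,\omega) = \nu'(B^c,\omega),\\
&B_1, B_2, \ldots \in \mathcal L_\omega \text{ pairwise disjoint} \implies \nu\Big(\bigcup_k B_k,\omega\Big) = \sum_k \nu(B_k,\omega) = \sum_k \nu'(B_k,\omega) = \nu'\Big(\bigcup_k B_k,\omega\Big),
\end{aligned}
$$

so $\mathcal L_\omega$ is a $\lambda$-system containing the $\pi$-system $\mathcal V$, and hence $\mathcal L_\omega \supseteq \sigma(\mathcal V) = \mathcal B$.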
