Question on the use of the Markov Kernel for conditional probability

Tags: conditional-expectation, markov-chains, probability-theory, stochastic-processes

We define a Markov kernel:

Let $(\Omega_{1},\mathcal{A}_{1})$ and $(\Omega_{2},\mathcal{A}_{2})$ be measurable spaces. A map $K : \Omega_{1}\times \mathcal{A}_{2}\to [0,\infty]$ is called a Markov kernel if:

$1.$ For all $A\in \mathcal{A}_{2}$, the map $$K(\cdot,A):\Omega_{1}\to [0,\infty],\; \omega_{1}\mapsto K(\omega_{1},A)$$ is $\mathcal{A}_{1}$-measurable.

$2.$ For all $\omega_{1} \in \Omega_{1}$, the map $$K(\omega_{1},\cdot): \mathcal{A}_{2} \to [0,\infty], \; A \mapsto K(\omega_{1},A)$$ is a probability measure.
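For a concrete illustration (this example is mine, not from the lecture), take $\Omega_{1}=\Omega_{2}=\mathbb{R}$ with the Borel $\sigma$-field and the Gaussian transition kernel
$$K(x,A) = \int_{A} \frac{1}{\sqrt{2\pi}}\, e^{-(y-x)^{2}/2}\, dy .$$
For fixed $A$, the map $x\mapsto K(x,A)$ is Borel measurable, and for fixed $x$, the map $A\mapsto K(x,A)$ is the $\mathcal{N}(x,1)$ probability measure, so both conditions hold.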

In our lecture, the Markov kernel was introduced in order to find a "satisfying" expression for the conditional probability $P(A\mid \mathcal{F})$, where $\mathcal{F}$ is some sub-$\sigma$-field in a probability space $(\Omega,\mathcal{A},P)$. I have yet to truly understand what "satisfying" means here. The reason given in the lecture is that, viewing the conditional probability given $\mathcal{F}$ as a map on $\Omega\times \mathcal{A}$, a continuous model can have uncountably many events $A \in \mathcal{A}$, and it is not clear whether the choice of null sets remains plausible enough to obtain a probability measure in the event argument, since an uncountable union of null sets need not be a null set.
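To make "satisfying" precise (my own paraphrase of the goal, not a quote from the lecture): what one would like is a regular conditional probability, i.e. a Markov kernel $K : \Omega \times \mathcal{A} \to [0,1]$ with $K(\cdot,A)$ $\mathcal{F}$-measurable, such that for every $A \in \mathcal{A}$,
$$K(\omega,A) = P(A \mid \mathcal{F})(\omega) \quad \text{for } P\text{-almost every } \omega ,$$
so that $A \mapsto K(\omega,A)$ is an honest probability measure for every fixed $\omega$, and not merely outside a null set that depends on $A$.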

My issue: I do not understand the problem here. Surely, we have the exact same issue in the unconditional case: the uncountable union of null sets need not be a null set. What am I misunderstanding?

Best Answer

There are some problems with the other answer, so let me offer something different.

First of all, contrary to what the other answer says, conditional probabilities do satisfy the measurability constraint (1) on Markov kernels. This is just part of their definition.

The key thing to realize is that, for conditional probabilities, there can be a set $F$ of positive measure such that, for every $\omega \in F$, the map $A \mapsto P(A \mid \mathcal F)(\omega)$ is not a probability measure.

How can this happen? Well, what does it mean to be a probability measure? The key property is:

Additivity: a probability measure $P$ must in particular satisfy $P(\cup_{j=1}^n A_j) = \sum_{j=1}^n P(A_j)$ for every finite collection $A_1,\dots,A_n$ of pairwise disjoint sets in $\mathcal A$. (The full definition requires countable additivity, but this weaker property is already enough to see the problem.)

Now, using the defining property of conditional probability, it is easy to show that for every finite pairwise disjoint collection $\mathscr A = (A_1,\dots,A_n)$, the set $F_{\mathscr A}$ of those $\omega$ for which $P(\cup_{j=1}^n A_j \mid \mathcal F)(\omega) \neq \sum_{j=1}^n P(A_j \mid \mathcal F)(\omega)$ is a null set; in other words, additivity for $\mathscr A$ holds whenever $\omega \notin F_{\mathscr A}$.
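Here is a quick sketch of that step, using the defining property $\int_G P(A \mid \mathcal F)\, dP = P(A \cap G)$ for all $G \in \mathcal F$: for every $G \in \mathcal F$,
$$\int_G \sum_{j=1}^n P(A_j \mid \mathcal F)\, dP = \sum_{j=1}^n P(A_j \cap G) = P\Big(\big(\textstyle\cup_{j=1}^n A_j\big) \cap G\Big) = \int_G P\big(\textstyle\cup_{j=1}^n A_j \,\big|\, \mathcal F\big)\, dP ,$$
and since both integrands are $\mathcal F$-measurable and integrable, they agree outside an $\mathcal F$-measurable null set, which is exactly $F_{\mathscr A}$.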

Now ask yourself: How many sequences $\mathscr A$ of pairwise disjoint events are there in a large probability space? It's easy to convince yourself (consider Lebesgue measure, for instance) that in general there must be uncountably many such $\mathscr A$. And that means there are uncountably many sets $F_{\mathscr A}$ on which the conditional probability fails to be a probability measure.

Thus, the set $F$ on which the conditional probability fails to be a probability measure includes $$\bigcup_{\mathscr A} F_{\mathscr A},$$ which, although each $F_{\mathscr{A}}$ is null, need not be a null set, precisely because it is an uncountable union.
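To see how badly an uncountable union of null sets can behave, consider the standard illustration with Lebesgue measure $\lambda$ on $[0,1]$:
$$[0,1] = \bigcup_{x \in [0,1]} \{x\}, \qquad \lambda(\{x\}) = 0 \text{ for every } x, \qquad \lambda([0,1]) = 1 .$$
Countable additivity gives no control over unions like this, which is why the exceptional sets $F_{\mathscr A}$ cannot simply be merged into a single global null set.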

To me, this is the key point that the other answer did not convey.
