Transforming Radon-Nikodym derivatives

lebesgue-measure radon-nikodym

I am currently confused by the (abuse of?) notation for the Radon-Nikodym derivative in many proofs.

What I am currently struggling with in particular is a proof of the classical information processing inequality of f-divergences:

Let $P_{X,Y}(E) = \int_{\mathcal{X}}P_{Y|X}(E^x|x)P_X(dx)$ be a measure on $\mathcal{X}\times\mathcal{Y}$ with $E$ being a measurable subset of $\mathcal{X}\times\mathcal{Y}$ and $E^x = \{y:(x,y)\in E\}$. Likewise, let $Q_{X,Y}(E) = \int_{\mathcal{X}}P_{Y|X}(E^x|x)Q_X(dx)$, i.e., both $P_{X,Y}$ and $Q_{X,Y}$ use the same kernel!

Now I have three questions:

1.) There is this equation:
\begin{align}
\mathbb{E}_{Q_{X,Y}}\biggl[f\biggl(\frac
{dP_{X,Y}}{dQ_{X,Y}}\biggr)\biggr] &= \mathbb{E}_{Q_{X,Y}}\biggl[f\biggl(\frac
{d(\int P_{Y|X}dP_X)}{d(\int P_{Y|X}dQ_X)}\biggr)\biggr]
\end{align}

Here, $f:(0, \infty)\rightarrow \mathbb{R}$ is some convex function.
This notation seems to imply that you can transform the numerator and denominator of the Radon-Nikodym derivative independently. However, writing it as a fraction is only notation: it is not a fraction but a function, so you cannot simply substitute something for the numerator and something else for the denominator. So what is going on here? How can I prove that statement?

2.) The second question is the claim that
$$\int_{\mathcal{Y}}\frac
{d(\int P_{Y|X}dP_X)}{d(\int Q_{Y|X}dQ_X)} dQ_{Y|X}(dy,x) = \frac{dP_X}{dQ_X}$$

For discrete random variables it would be clear what is going on: the $Q_{Y|X}$ cancels and the sum over $P_{Y|X}$ is 1, so only the ratio of the marginals remains. However, with the integrals, and in particular the integrals inside the Radon-Nikodym derivative, I do not understand what is going on.

3.) My third question is, why does the following equation hold?
$$\frac{dP_{X,Y}}{d(P_{X,Y}+Q_{X,Y})}=\frac{dP_{X}}{d(P_{X}+Q_{X})}.$$
Again, since they also use the notation $P_{X,Y}(E) = \int_{\mathcal{X}}P_{Y|X}(E^x|x)P_X(dx):=P_{Y|X}P_X$ (and likewise for $Q_{X, Y}$), they seem to suggest that you can somehow pull out the kernel $P_{Y|X}$ in both the numerator and the denominator and then cancel it. But first of all there are actually integrals involved, and secondly, as in 1.), this is not really a fraction but just notation. So what exactly is going on here, and how can I prove this?

Best Answer

1.) This appears to be just notation. Interpret $\int P_{Y \mid X}\,dP_X$ as $P_{X,Y}$ and $\int P_{Y \mid X}\,dQ_X$ as $Q_{X,Y}$; both sides of the equation are then the same expression.

2.) Here $\frac{dP}{dQ}$ denotes $\frac{dP_{X,Y}}{dQ_{X,Y}}$, which exists when $P_{X,Y} \ll Q_{X,Y}$. For this I'll write $$P((X, Y) \in E) = \int 1_E(x, y)P(X \in dx, Y \in dy) = \int 1_E(x, y)P(Y \in dy \mid X = x)P(X \in dx).$$ From this we have $$P(X \in dx, Y \in dy) = P(Y \in dy \mid X = x)P(X \in dx).$$ Such an equality of differentials should be interpreted as saying that the measures on both sides assign the same value to every measurable set. We want to show that $$\int_{\mathcal{Y}}\frac{dP}{dQ}(x, y)Q(Y \in dy \mid X = x)Q(X \in dx) = P(X \in dx).$$ But this is clear because for any measurable set $A \subseteq \mathcal{X}$, $$\int_{A}\int_{\mathcal{Y}}\frac{dP}{dQ}(x, y)Q(Y \in dy \mid X = x)Q(X \in dx) = \int_{A \times \mathcal{Y}}\frac{dP}{dQ}(x, y)Q(X \in dx, Y \in dy) = P(X \in A, Y \in \mathcal{Y}) = P(X \in A).$$ Hence the inner integral, as a function of $x$, is a version of $\frac{dP_X}{dQ_X}$.

3.) Here we take the same approach. We want to show that \begin{align} \frac{dP_{X}}{d(P_X + Q_X)}(x)(P(X \in dx, Y \in dy) + Q(X \in dx, Y \in dy)) &= P(X \in dx, Y \in dy). \end{align} This follows from the assumption that the conditional distributions of $Y \mid X$ are the same for both $P$ and $Q$: $$P(X \in dx, Y \in dy) + Q(X \in dx, Y \in dy) = P(Y \in dy \mid X = x)(P(X \in dx) + Q(X \in dx)).$$
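The same check can be written out at the level of sets. This is only a sketch, assuming the shared kernel $P_{Y|X}$ from the question and using the shorthand $h = \frac{dP_X}{d(P_X+Q_X)}$, which is introduced here for readability. For every measurable $E \subseteq \mathcal{X}\times\mathcal{Y}$,

$$\begin{aligned}
\int_E h(x)\,(P_{X,Y}+Q_{X,Y})(dx,dy) &= \int_{\mathcal{X}} h(x)\,P_{Y|X}(E^x|x)\,(P_X+Q_X)(dx) \\
&= \int_{\mathcal{X}} P_{Y|X}(E^x|x)\,P_X(dx) \\
&= P_{X,Y}(E).
\end{aligned}$$

The first step uses the kernel representation of both joint measures, the second uses the defining property of $h$ applied to the nonnegative function $x \mapsto P_{Y|X}(E^x|x)$, and the third is the definition of $P_{X,Y}$. Since this holds for every $E$, the function $(x,y) \mapsto h(x)$ is a version of $\frac{dP_{X,Y}}{d(P_{X,Y}+Q_{X,Y})}$. No extra absolute continuity assumption is needed, because $P_X \ll P_X + Q_X$ always holds.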

Actually, you can use things like $\frac{dP_X}{dQ_X}(x) = \frac{P(X \in dx)}{Q(X \in dx)}$ as long as you interpret this as $\frac{dP_X}{dQ_X}(x)Q(X \in dx) = P(X \in dx)$. Then the cancellations you observed in the discrete case are made rigorous, and the same proofs work. For example, when $P$ and $Q$ share the same conditional distribution of $Y \mid X$, $\frac{P(X \in dx, Y \in dy)}{Q(X \in dx, Y \in dy)} = \frac{P(X \in dx)}{Q(X \in dx)}$.
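As a sanity check in the discrete case, here is a small illustrative example with made-up numbers (they are not taken from the proof in question). Let $X \in \{0,1\}$ with $P_X = (1/2, 1/2)$ and $Q_X = (1/4, 3/4)$, and let $W(y|x)$ be any kernel shared by both joint laws. For every $y$ with $W(y|x) > 0$,

$$\frac{P_{X,Y}(x,y)}{Q_{X,Y}(x,y)} = \frac{P_X(x)\,W(y|x)}{Q_X(x)\,W(y|x)} = \frac{P_X(x)}{Q_X(x)} = \begin{cases} 2, & x = 0, \\ 2/3, & x = 1, \end{cases}$$

and

$$\frac{P_{X,Y}(x,y)}{P_{X,Y}(x,y)+Q_{X,Y}(x,y)} = \frac{P_X(x)}{P_X(x)+Q_X(x)} = \begin{cases} 2/3, & x = 0, \\ 2/5, & x = 1. \end{cases}$$

Neither ratio depends on $y$, which is what the general statements assert almost everywhere.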
