Data-processing inequality for Markov chains with non-discrete state space

entropy, information theory, markov chains, reference-request

Is the data-processing inequality true for Markov chains with non-discrete state space?

Other answers to this question on this site argue that it is sufficient to prove that further conditioning reduces differential entropy, or equivalently that the conditional mutual information is always non-negative:
$$
h(X| Z) \geq h(X|Y,Z) \iff I(X;Y|Z) \geq 0.
$$
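
For reference, the equivalence can be spelled out as follows. This is only a sketch: it assumes that joint and conditional densities $p(\cdot)$ exist and that all the differential entropies involved are finite.
$$
I(X;Y|Z) = \mathbb{E}\left[\log \frac{p(X,Y|Z)}{p(X|Z)\,p(Y|Z)}\right] = \mathbb{E}\left[\log \frac{p(X|Y,Z)}{p(X|Z)}\right] = h(X|Z) - h(X|Y,Z).
$$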

I seem to have a proof that the inequality holds, but the absence of this inequality in the literature on information theory makes me question my proof. I have only seen monographs that show this inequality for discrete random variables. The data-processing inequality is likewise only proven for discrete random variables. Furthermore, on the Wikipedia page on conditional mutual information, the non-negativity is stated explicitly for "discrete, jointly distributed random variables $X,Y$ and $Z$".

Does the non-negativity hold, and if so, do you know a reference (for either the data-processing inequality or the non-negativity of conditional mutual information)?

Best Answer

(Turned comment into answer)


Non-negativity of conditional mutual information holds quite generally, because conditional mutual information is an average of mutual informations, and each of those is non-negative due to the non-negativity of KL divergence. Essentially, $$ I(X;Y|Z)= \int I(X;Y|Z=z)P_Z(\mathrm{d}z),$$ where $I(X;Y|Z=z)=D(P_{XY|Z=z} \| P_{X|Z=z} \otimes P_{Y|Z=z})$ is the KL divergence between the conditional joint law and the product of the conditional marginals, and so CMI is the integral of a non-negative function with respect to a non-negative measure.
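
To connect this back to the data-processing inequality, here is a sketch. It assumes that the chain rule for mutual information is available in the same setting (for example, on standard Borel spaces), and it writes the Markov chain as $X \to Y \to Z$, which is my labelling rather than anything fixed by the question. Expanding $I(X;Y,Z)$ in the two possible orders gives
$$
I(X;Y,Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y).
$$
The Markov property says that $X$ and $Z$ are conditionally independent given $Y$, so $I(X;Z|Y)=0$, and therefore
$$
I(X;Y) = I(X;Z) + I(X;Y|Z) \geq I(X;Z),
$$
where the last step uses only $I(X;Y|Z) \geq 0$. No subtraction of possibly infinite quantities is needed.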

Technical note - For this argument to go through, the decomposition I wrote needs to make sense. One sufficient condition is to have enough structure to get 'nice enough' disintegration kernels, i.e. regular conditional distributions $P_{XY|Z=z}$. This works under very weak conditions, e.g. when the measurable spaces are all Polish. Unless you're working on the mathematical foundations of information theory, or in some very rich function space, you don't need to worry about these things. See Chapter 2 of these notes to get a sense of these technical issues and some of their resolutions.
