[Math] Information Theory – Data Processing Inequality

information theory

The data processing inequality states that if $W \rightarrow X \rightarrow Y$ is a Markov chain, then $I(W;X) \ge I(W;Y)$.

Does this also mean that for four variables forming a Markov chain $W \rightarrow X \rightarrow Y \rightarrow Z$, we have $I(W;X) \ge I(W;Y) \ge I(W;Z)$? I was trying to prove it but got stuck, and was hoping to get some help. Below is my expansion of the equations.

$ I(W;X,Y,Z) \\ = I(W;X) + I(W;Y|X) + I(W;Z|X,Y) \\ = I(W;Y) + I(W;X|Y) + I(W;Z|X,Y) \\ = I(W;Z) + I(W;X|Z) + I(W;Y|X,Z) $

I can then cancel out some terms due to conditional independence. It thus looks like

$ \require{cancel} I(W;X,Y,Z) \\ = I(W;X) + \cancel{I(W;Y|X)} + \cancel{I(W;Z|X,Y)} \\ = I(W;Y) + I(W;X|Y) + \cancel{I(W;Z|X,Y)} \\ = I(W;Z) + I(W;X|Z) + \cancel{I(W;Y|X,Z)} \\ $

The second and third lines are clear enough to me: since conditional mutual information is nonnegative, $I(W;X) = I(W;Y) + I(W;X|Y)$ implies $I(W;X) \ge I(W;Y)$. But I'm not sure how to show that $I(W;Y) \ge I(W;Z)$.

Best Answer

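I think the comment under the question suffices. But if you insist on seeing what happens with your expansion: equating its last two lines gives $I(W;Y) + I(W;X|Y) = I(W;Z) + I(W;X|Z)$, so you need to prove $$ I(W;X|Y)\leq I(W;X|Z). $$ Note that $I(W;X|Y)=H(W|Y)-H(W|X,Y)=H(W|Y)-H(W|X)$ (given the Markov relation), and likewise $I(W;X|Z)=H(W|Z)-H(W|X,Z)=H(W|Z)-H(W|X)$. So the inequality boils down to $$ H(W|Y)\leq H(W|Z), $$ which follows from $H(W|Z)\geq H(W|Y,Z)=H(W|Y)$: conditioning cannot increase entropy, and $W \rightarrow Y \rightarrow Z$ is itself a Markov chain.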
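As a numerical sanity check (not a proof), the chain of inequalities can be tested on a randomly generated Markov chain. The sketch below is an illustration only: the helper names (`markov_chain_joint`, `mutual_information`, `conditional_entropy`), the alphabet size of 3, and the random seed are all assumptions made for this example, not part of the question or answer.

```python
import itertools
import math
import random


def random_stochastic_matrix(rows, cols, rng):
    """Return a rows x cols matrix whose rows are probability vectors."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-3 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def markov_chain_joint(n, rng):
    """Joint pmf p(w, x, y, z) of a random chain W -> X -> Y -> Z on n symbols."""
    p_w = random_stochastic_matrix(1, n, rng)[0]
    p_x_w = random_stochastic_matrix(n, n, rng)  # p(x | w)
    p_y_x = random_stochastic_matrix(n, n, rng)  # p(y | x)
    p_z_y = random_stochastic_matrix(n, n, rng)  # p(z | y)
    joint = {}
    for w, x, y, z in itertools.product(range(n), repeat=4):
        joint[(w, x, y, z)] = p_w[w] * p_x_w[w][x] * p_y_x[x][y] * p_z_y[y][z]
    return joint


def marginal(joint, axes):
    """Marginalize the joint pmf onto the given tuple of coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(joint, axes):
    """Shannon entropy (in bits) of the variables selected by axes."""
    return -sum(p * math.log2(p) for p in marginal(joint, axes).values() if p > 0)


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), with a and b given as tuples of axes."""
    return entropy(joint, a) + entropy(joint, b) - entropy(joint, a + b)


def conditional_entropy(joint, a, b):
    """H(A|B) = H(A,B) - H(B)."""
    return entropy(joint, a + b) - entropy(joint, b)


W, X, Y, Z = (0,), (1,), (2,), (3,)
joint = markov_chain_joint(3, random.Random(0))

i_wx = mutual_information(joint, W, X)
i_wy = mutual_information(joint, W, Y)
i_wz = mutual_information(joint, W, Z)
h_w_given_y = conditional_entropy(joint, W, Y)
h_w_given_z = conditional_entropy(joint, W, Z)
h_w_given_yz = conditional_entropy(joint, W, Y + Z)

print(f"I(W;X) = {i_wx:.6f}, I(W;Y) = {i_wy:.6f}, I(W;Z) = {i_wz:.6f}")
print(f"H(W|Y) = {h_w_given_y:.6f}, H(W|Y,Z) = {h_w_given_yz:.6f}, H(W|Z) = {h_w_given_z:.6f}")
```

The printed values should satisfy $I(W;X) \ge I(W;Y) \ge I(W;Z)$ and $H(W|Y) = H(W|Y,Z) \le H(W|Z)$ up to floating-point error, which mirrors the two steps used in the argument above.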
