[Math] Mutual Information Chain Rule

entropy, information theory

I have a (perhaps simple) question about the chain rule for mutual information. The formula is given by

$$I(X_1, X_2, …, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y| X_{i-1}, X_{i-2}, …, X_1)$$

My question is how to use it on the following equation:

$$I(X; Y_1, Y_2) = I(X; Y_1) + I(X; Y_2| Y_1) \hspace{3cm} (1)$$

I just get
$I(X; Y_1, Y_2) = I(Y_1, Y_2; X) = I(X; Y_1) + I(Y_2; X|Y_1)$

With the definition of conditional mutual information and the chain rule for entropy it follows that

$I(X;Y_2|Y_1) = H(X|Y_1) - H(Y_2,X|Y_1) + H(Y_2|Y_1)$

and

$I(Y_2;X|Y_1) = H(Y_2|Y_1) - H(X,Y_2|Y_1) + H(X|Y_1)$

This leads to the equation $H(Y_2, X|Y_1) = H(X, Y_2| Y_1)$, which is obviously not true. Where am I making a mistake? Also, how can I derive equation (1) directly from the chain rule?
I also followed the proof on Wikipedia, which I understood, but I still don't see how it follows from the chain rule for mutual information.

Thank you very much for your help and patience 🙂

Best Answer

The equation $H(Y_2, X|Y_1) = H(X, Y_2|Y_1)$ is true. More generally, you can change around the order of the random variables under the entropy functional without changing its value, which follows informally from the fact that $p_{(X,Y,Z)}(x,y,z) = p_{(Y,X,Z)}(y,x,z)$ and the definition of entropy: $$H(X,Y|Z) = -\sum p_{(X,Y,Z)}(x,y,z) \log \frac{p_{(X,Y,Z)}(x,y,z)}{p_Z(z)} \\ = -\sum p_{(Y,X,Z)}(y,x,z) \log \frac {p_{(Y,X,Z)}(y,x,z)}{p_Z(z)} = H(Y,X|Z).$$
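If it helps to see this concretely, here is a minimal numerical sketch (using NumPy; the random joint pmf and the choice of three outcomes per variable are purely illustrative, not part of the proof) that computes $H(X,Y_2|Y_1)$ and $H(Y_2,X|Y_1)$ directly from the definition and confirms they coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf over (X, Y1, Y2); each variable takes 3 values (arbitrary choice).
p = rng.random((3, 3, 3))   # axes: X, Y1, Y2
p /= p.sum()

def cond_joint_entropy(p_abc, p_c):
    """H(A, B | C) = -sum_{a,b,c} p(a,b,c) * log2( p(a,b,c) / p(c) ),
    where the last axis of p_abc indexes the conditioning variable C."""
    return -np.sum(p_abc * np.log2(p_abc / p_c))

p_y1 = p.sum(axis=(0, 2))          # marginal pmf of Y1

# H(X, Y2 | Y1): reorder axes to (X, Y2, Y1) so Y1 is the conditioning (last) axis.
h_x_y2 = cond_joint_entropy(p.transpose(0, 2, 1), p_y1)
# H(Y2, X | Y1): the same joint pmf with X and Y2 swapped.
h_y2_x = cond_joint_entropy(p.transpose(2, 0, 1), p_y1)

print(h_x_y2, h_y2_x)              # identical up to floating-point rounding
```

The two sums range over exactly the same set of terms, just in a different order, which is the informal argument above in numerical form.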

By a very similar argument $I(X;Y_2|Y_1) = I(Y_2;X|Y_1)$ so your application of the chain rule is correct and (1) follows from the work you've already done: $$ I(X;Y_1,Y_2)=I(Y_1,Y_2;X)=I(X;Y_1)+I(Y_2;X|Y_1) = I(X;Y_1)+I(X;Y_2|Y_1).$$
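As a quick sanity check of (1) itself, the same kind of numerical experiment (again an illustrative sketch with a random pmf, not a substitute for the derivation) shows $I(X;Y_1,Y_2) = I(X;Y_1) + I(X;Y_2|Y_1)$, with all quantities expressed through joint and marginal entropies:

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((3, 3, 3))   # axes: X, Y1, Y2
p /= p.sum()

def entropy(q):
    """Shannon entropy in bits of a pmf given as an array of any shape."""
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

# I(X; Y1, Y2) = H(X) + H(Y1, Y2) - H(X, Y1, Y2)
i_x_y1y2 = entropy(p.sum(axis=(1, 2))) + entropy(p.sum(axis=0)) - entropy(p)

# I(X; Y1) = H(X) + H(Y1) - H(X, Y1)
i_x_y1 = (entropy(p.sum(axis=(1, 2))) + entropy(p.sum(axis=(0, 2)))
          - entropy(p.sum(axis=2)))

# I(X; Y2 | Y1) = H(X, Y1) + H(Y1, Y2) - H(X, Y1, Y2) - H(Y1)
i_x_y2_given_y1 = (entropy(p.sum(axis=2)) + entropy(p.sum(axis=0))
                   - entropy(p) - entropy(p.sum(axis=(0, 2))))

print(i_x_y1y2, i_x_y1 + i_x_y2_given_y1)   # agree up to floating-point rounding
```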
