[Math] Proof of conditional entropy $H(X,Y \mid Z) = H(X \mid Z) + H(Y \mid X,Z)$

entropy, information-theory, probability, proof-verification

In information theory, the joint entropy $H(X,Y)$ of a pair of discrete random variables $(X,Y)$ is defined as:

$$
H(X,Y) = -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}p(x,y)\log_2 p(x,y)\tag{1}\label{eq1}
$$

And the conditional entropy $H(Y \mid X)$ of the same pair $(X,Y)$ as:

$$
H(Y \mid X) = -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}p(x,y)\log_2 p(y\mid x)\tag{2}\label{eq2}
$$
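
To make the definitions concrete, here is a minimal numeric sketch of $(1)$ and $(2)$ in Python; the joint pmf below is an arbitrary illustrative example, not part of the question.

```python
import math

# Arbitrary illustrative joint pmf p(x, y) over X in {0, 1}, Y in {0, 1}
p = [[0.3, 0.2],
     [0.1, 0.4]]

def joint_entropy(p):
    """H(X,Y) per definition (1)."""
    return -sum(pxy * math.log2(pxy) for row in p for pxy in row if pxy > 0)

def cond_entropy(p):
    """H(Y|X) per definition (2), using p(y|x) = p(x,y) / p(x)."""
    h = 0.0
    for row in p:                       # one row per value of x
        px = sum(row)                   # marginal p(x)
        h -= sum(pxy * math.log2(pxy / px) for pxy in row if pxy > 0)
    return h

print(joint_entropy(p), cond_entropy(p))
```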

Using the chain rule of probability, one can derive the chain rule for entropy:

$$
H(X,Y) = H(X)+H(Y \mid X)\tag{3}\label{eq3}
$$

which expresses the joint entropy of $(X,Y)$ as the entropy of $X$ plus the conditional entropy of $Y$ given $X$.
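
Continuing the sketch above, the chain rule $(3)$ can be checked numerically on the same illustrative pmf (a sanity check, not a proof; all entries are positive, so no zero-guard is needed):

```python
import math

p = [[0.3, 0.2],
     [0.1, 0.4]]                        # same illustrative joint pmf as above

H_X   = -sum(sum(row) * math.log2(sum(row)) for row in p)             # H(X)
H_XY  = -sum(q * math.log2(q) for row in p for q in row)              # H(X,Y)
H_YgX = -sum(q * math.log2(q / sum(row)) for row in p for q in row)   # H(Y|X)

assert abs(H_XY - (H_X + H_YgX)) < 1e-12   # chain rule (3)
print(f"H(X,Y) = {H_XY:.6f}, H(X) + H(Y|X) = {H_X + H_YgX:.6f}")
```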

A corollary of this theorem is:
$$
H(X,Y \mid Z) = H(X \mid Z)+H(Y \mid X,Z)
$$

Proof:

\begin{align}
H(X,Y \mid Z) &= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(x,y \mid z) \\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2\bigl(p(y \mid x,z)\,p(x \mid z)\bigr) \\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(x \mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(y \mid x,z)\,p(x,z)\log_2 p(x \mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= -\sum_{x\in \mathcal X}\sum_{z\in \mathcal Z}p(x,z)\log_2 p(x \mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= H(X \mid Z)+H(Y \mid X,Z)
\end{align}

where the second-to-last step uses $\sum_{y\in \mathcal Y}p(y \mid x,z)=1$.
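
As an extra sanity check, the corollary holds numerically on a random joint pmf; here is a short sketch assuming numpy, with arbitrarily chosen alphabet sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4, 2))          # p[x, y, z], all entries positive
p /= p.sum()                       # normalize to a joint pmf

pz  = p.sum(axis=(0, 1))                                  # p(z)
pxz = p.sum(axis=1)                                       # p(x, z)

H_XY_given_Z = -(p * np.log2(p / pz)).sum()               # H(X,Y | Z)
H_X_given_Z  = -(pxz * np.log2(pxz / pz)).sum()           # H(X | Z)
H_Y_given_XZ = -(p * np.log2(p / pxz[:, None, :])).sum()  # H(Y | X,Z)

assert np.isclose(H_XY_given_Z, H_X_given_Z + H_Y_given_XZ)
print("H(X,Y|Z) =", H_XY_given_Z)
```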

I would like to know whether this proof is correct.

Best Answer

Without having looked at the details of your proof, I suggest the following alternative argument.

Instead of seeing $X$ and $Z$ as two distinct discrete random variables, we pair them up and interpret $A := (X,Z)$ as a single random variable. Clearly, all information that is contained in $X$ and $Z$ is also contained in $A$, and vice versa. Similarly, we pair up $X$ and $Y$ by defining $B := (X,Y)$. We then get, by repeatedly applying rule $(3)$ in the rearranged form $H(U \mid V) = H(U,V) - H(V)$:

\begin{align}
H(X \mid Z) + H(Y \mid X,Z) &= H(X \mid Z) + H(Y \mid A) \\
&= H(X,Z) - H(Z) + H(Y,A) - H(A) \\
&= H(X,Z) - H(Z) + H(Y,A) - H(X,Z) \\
&= -H(Z) + H(Y,A) \\
&= -H(Z) + H(Y,X,Z) \\
&= -H(Z) + H(B,Z) \\
&= H(B \mid Z) \\
&= H(X,Y \mid Z).
\end{align}
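
The pairing step itself, namely that $H(Y \mid A) = H(Y \mid X,Z)$ for $A := (X,Z)$, can also be checked numerically; a hedged sketch assuming numpy, where pairing is realized by flattening the $(x,z)$ axes into a single index:

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((3, 4, 2)); p /= p.sum()   # p[x, y, z], all entries positive

# H(Y | X, Z) computed directly from the joint pmf
pxz = p.sum(axis=1)                                       # p(x, z)
H_Y_given_XZ = -(p * np.log2(p / pxz[:, None, :])).sum()

# Pair X and Z into one variable A: reorder axes to (x, z, y), flatten (x, z)
pa_y = p.transpose(0, 2, 1).reshape(-1, p.shape[1])       # p[a, y], a = (x, z)
pa = pa_y.sum(axis=1)                                     # p(a)
H_Y_given_A = -(pa_y * np.log2(pa_y / pa[:, None])).sum()

assert np.isclose(H_Y_given_XZ, H_Y_given_A)
```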
