For an information theory class, I am studying the proof for Fano's inequality, i.e.:
$H(P_e) + P_e\log(|\mathcal{X}|) \geq H(X|\hat{X}) \geq H(X|Y)$
where $H(\cdot)$ denotes entropy, $\mathcal{X}$ is the alphabet of the random variable $X$, $\hat{X}$ is an estimate of $X$, and $P_e = \Pr(\hat{X} \neq X)$ is the probability of error. Also,
$X \rightarrow Y \rightarrow \hat{X} $ is a Markov chain.
I am following the proof from Cover and Thomas' Elements of Information Theory.
The first step is to define a random variable E:
$$E = \begin{cases} 1 & \text{if } \hat{X} \neq X, \\ 0 & \text{if } \hat{X} = X. \end{cases}$$
Then the following two equations, which I cannot explain, are written:
$H(E,X|\hat{X}) = H(X|\hat{X}) + H(E|X, \hat{X})$;
$H(E,X|\hat{X}) = H(E|\hat{X}) + H(X|E,\hat{X})$.
The rest of the proof relies on these equations, but I do not see how they are obtained. I imagine it is a simple application of the entropy chain rule and of the rule for conditional entropy, but I cannot derive them from the general identities.
Could I get the step-by-step derivation of both, starting from $H(E,X|\hat{X})$? Thank you kindly!
Best Answer
Yes, it is just the chain rule for entropy.
The chain rule for entropy is Theorem 2.2.1 in Cover and Thomas: $$H(X,Y) = H(X) + H(Y|X)$$ The chain rule still holds when you condition everything on another random variable $Z$: $$H(X,Y|Z) = H(X|Z) + H(Y|X,Z)$$ The proof is word for word the same as the proof of the original chain rule; simply write each probability conditioned on $z$, e.g. $p(x,y|z)$ instead of $p(x,y)$.
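If you want to convince yourself numerically, here is a small sketch (the alphabet sizes $3, 4, 2$ are arbitrary choices) that checks the conditional chain rule on a random joint pmf:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf p(x, y, z) over small finite alphabets (sizes 3, 4, 2).
# All entries are strictly positive, so no log(0) issues below.
p = rng.random((3, 4, 2))
p /= p.sum()

p_z = p.sum(axis=(0, 1))   # marginal p(z),    shape (2,)
p_xz = p.sum(axis=1)       # marginal p(x, z), shape (3, 2)

# H(X,Y|Z) = -sum_{x,y,z} p(x,y,z) log2 p(x,y|z), with p(x,y|z) = p(x,y,z)/p(z)
H_xy_given_z = -np.sum(p * np.log2(p / p_z[None, None, :]))

# H(X|Z) = -sum_{x,z} p(x,z) log2 p(x|z)
H_x_given_z = -np.sum(p_xz * np.log2(p_xz / p_z[None, :]))

# H(Y|X,Z) = -sum_{x,y,z} p(x,y,z) log2 p(y|x,z)
H_y_given_xz = -np.sum(p * np.log2(p / p_xz[:, None, :]))

# Chain rule for conditional entropy: H(X,Y|Z) = H(X|Z) + H(Y|X,Z)
assert np.isclose(H_xy_given_z, H_x_given_z + H_y_given_xz)
print(H_xy_given_z, "=", H_x_given_z, "+", H_y_given_xz)
```

Each conditional entropy is computed directly from its definition, so the final equality is a genuine check of the identity rather than algebra built into the code.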
Or, alternatively, you can write something like this (as they suggest in Eqn. (2.20)): \begin{align*} H(X,Y|Z) &= -E[\log p(X,Y|Z)] \\ &= -E[\log p(X|Z)] - E[\log p(Y|X,Z)] \\ &= H(X|Z) + H(Y|X,Z) \end{align*}
With this version of the chain rule for conditional entropies, you get the two equations that you wanted by applying it to the random variables $X, \hat{X}, E$.
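Spelled out, condition on $Z = \hat{X}$ and expand the pair $(X, E)$ in either order. Taking $X$ first uses the factorization $p(E,X|\hat{X}) = p(X|\hat{X})\,p(E|X,\hat{X})$: \begin{align*} H(E,X|\hat{X}) &= -E[\log p(E,X|\hat{X})] \\ &= -E[\log p(X|\hat{X})] - E[\log p(E|X,\hat{X})] \\ &= H(X|\hat{X}) + H(E|X,\hat{X}) \end{align*} Taking $E$ first uses $p(E,X|\hat{X}) = p(E|\hat{X})\,p(X|E,\hat{X})$ and gives, by the same steps, $$H(E,X|\hat{X}) = H(E|\hat{X}) + H(X|E,\hat{X})$$ which are exactly the two equations in the proof.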