[Math] Fano’s Inequality Proof


For an information theory class, I am studying the proof for Fano's inequality, i.e.:

$H(P_e) + P_e \log|\mathcal{X}| \geq H(X|\hat{X}) \geq H(X|Y)$

where $H(X)$ is the entropy of the random variable $X$ (which takes values in the alphabet $\mathcal{X}$), $\hat{X}$ is an estimate of $X$, and $P_e = \Pr(\hat{X} \neq X)$ is the probability of error. Also,

$X \rightarrow Y \rightarrow \hat{X} $ is a Markov chain.

I am following the proof from Cover and Thomas' Elements of Information Theory.

The first step is to define a random variable E:

$ E = 1 $ if $\hat{X} \neq X$

$ E = 0 $ if $\hat{X} = X$

Then these two equations I cannot explain are written:

$H(E,X|\hat{X}) = H(X|\hat{X}) + H(E|X, \hat{X})$;

$H(E,X|\hat{X}) = H(E|\hat{X}) + H(X|E,\hat{X})$.

The rest of the proof relies on these equations, but I do not understand how they are obtained. I imagine it is a simple application of the chain rule for entropy and of the rule for conditional entropy, but I cannot derive them from the general identities.

Could I get the step-by-step derivation of both, starting from $H(E,X|\hat{X})$? Thank you kindly!

Best Answer

Yes, it is just the chain rule for entropy.

The chain rule for entropy is Theorem 2.2.1 in Cover and Thomas:
$$H(X,Y) = H(X) + H(Y|X)$$
You can still use the chain rule when you condition on another random variable $Z$; in that case you get
$$H(X,Y|Z) = H(X|Z) + H(Y|X,Z)$$
The proof is word for word the same as the proof of the original chain rule: just write each of the probabilities conditioned on $z$, e.g. $p(x,y|z)$ instead of $p(x,y)$.
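To spell that out, here is a sketch of the conditioned derivation (the same steps as in the unconditioned proof, with every probability conditioned on $z$):

\begin{align*} H(X,Y|Z) &= -\sum_{x,y,z} p(x,y,z) \log p(x,y|z) \\ &= -\sum_{x,y,z} p(x,y,z) \log \bigl( p(x|z)\, p(y|x,z) \bigr) \\ &= -\sum_{x,y,z} p(x,y,z) \log p(x|z) - \sum_{x,y,z} p(x,y,z) \log p(y|x,z) \\ &= -\sum_{x,z} p(x,z) \log p(x|z) - \sum_{x,y,z} p(x,y,z) \log p(y|x,z) \\ &= H(X|Z) + H(Y|X,Z), \end{align*}

where the second line uses $p(x,y|z) = p(x|z)\, p(y|x,z)$ and the fourth line sums out $y$ in the first term.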

Or, alternatively, you can write something like this (as they suggest in Eqn. (2.20)), again using $p(X,Y|Z) = p(X|Z)\,p(Y|X,Z)$ inside the expectation: \begin{align*} H(X,Y|Z) &= -E[\log p(X,Y|Z)] \\ &= -E[\log \bigl( p(X|Z)\, p(Y|X,Z) \bigr)] \\ &= -E[\log p(X|Z)] - E[\log p(Y|X,Z)] \\ &= H(X|Z) + H(Y|X,Z) \end{align*}

With this conditional version of the chain rule, you get the two equations you wanted by applying it to the pair $(X, E)$ conditioned on $Z = \hat{X}$, once in each order.
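Explicitly, conditioning on $Z = \hat{X}$:

\begin{align*} H(E,X|\hat{X}) &= H(X|\hat{X}) + H(E|X,\hat{X}) && \text{(take $X$ as the first variable, $E$ as the second)} \\ H(E,X|\hat{X}) &= H(E|\hat{X}) + H(X|E,\hat{X}) && \text{(take $E$ as the first variable, $X$ as the second)} \end{align*}

These are exactly the two equations in the question. The rest of the proof then uses the fact that $H(E|X,\hat{X}) = 0$, since $E$ is a deterministic function of $X$ and $\hat{X}$.

If it helps to see the identities concretely, here is a minimal numerical sanity check (a sketch, assuming NumPy, a hypothetical alphabet of size 3, a randomly generated channel $p(y|x)$, and the estimator $\hat{X} = Y$; none of these specifics come from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (hypothetical): X uniform on {0, 1, 2}, Y a noisy observation of X
# through a random channel p(y|x), and the estimator Xhat = Y.
nx = 3
p_x = np.full(nx, 1 / nx)
p_y_given_x = rng.random((nx, nx))
p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True)

# Joint pmf over (E, X, Xhat), where E = 1 if Xhat != X and E = 0 otherwise.
p = np.zeros((2, nx, nx))  # axes: (E, X, Xhat)
for x in range(nx):
    for y in range(nx):
        p[int(y != x), x, y] += p_x[x] * p_y_given_x[x, y]

def H(pmf):
    """Shannon entropy in bits of a pmf given as an array."""
    q = pmf[pmf > 0]
    return -np.sum(q * np.log2(q))

def H_cond(p_joint, cond_axes):
    """H(A | B) = H(A, B) - H(B), where B lives on cond_axes of p_joint."""
    other = tuple(a for a in range(p_joint.ndim) if a not in cond_axes)
    return H(p_joint) - H(p_joint.sum(axis=other))

lhs  = H_cond(p, (2,))                                  # H(E, X | Xhat)
rhs1 = H_cond(p.sum(axis=0), (1,)) + H_cond(p, (1, 2))  # H(X|Xhat) + H(E|X,Xhat)
rhs2 = H_cond(p.sum(axis=1), (1,)) + H_cond(p, (0, 2))  # H(E|Xhat) + H(X|E,Xhat)

print(lhs, rhs1, rhs2)  # all three agree
assert np.isclose(lhs, rhs1) and np.isclose(lhs, rhs2)
```

As a bonus, the `H_cond(p, (1, 2))` piece, i.e. $H(E|X,\hat{X})$, comes out as $0$, which is exactly the simplification made in the next step of the proof.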
