Confusion about when conditioning on an event can increase entropy

entropy, probability-theory

In a discussion, we were trying to figure out whether conditioning on an event can increase entropy. My friend came up with the following scenario:

Consider an indicator random variable $Z$ that is $1$ if and only if two independent coin flips both come up heads. Suppose the first coin comes up heads with some very small probability $\delta$, while the second coin is fair. Since $\Pr[ Z = 1] = \delta/2$, the entropy $H[ Z ]$ is also very small (how small depends on $\delta$).

On the other hand, if we define $A$ to be the event that the biased coin comes up heads, then $H[ Z | A ] = 1$, so the entropy of $Z$ increases when we condition on $A$.
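
For concreteness, here is a quick numerical sanity check of my friend's numbers (a Python sketch; the binary-entropy helper `h` and the value $\delta = 0.01$ are my own illustrative choices):

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

delta = 0.01         # heads-probability of the biased coin (illustrative)
p_z = delta / 2      # Pr[Z = 1] = Pr[X1 = 1] * Pr[X2 = 1]

print(h(p_z))        # H[Z]     ~ 0.0454 bits: very small
print(h(0.5))        # H[Z | A] = 1.0 bit: Z | A is Bernoulli(1/2)
```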

However, I'm confused by the implications of this and seem to be getting contradictory results: Let $X_1$ be the outcome of the biased coin and $X_2$ the outcome of the fair coin, where $X_i=1$ means the $i$-th coin comes up heads. Then
$$Z = X_1 X_2.$$

According to my friend's claim above, it holds that:

$$
H[Z] = H[ X_1 X_2 ] < H[ X_1 X_2 | X_1=1 ] = H[ X_2 ] = 1.
$$

However, using the chain rule, we get
\begin{align*}
H[Z] = H[ X_1 X_2 ] &= H[ X_2 | X_1 ] + H[ X_1 ] \\
&= H[ X_2 ] + H[ X_1 ] \quad \text{(since the coin flips are indep.)} \\
&= 1 + H[ X_1 ] \\
&> 1,
\end{align*}

which contradicts my friend's claim.

What am I missing?

Best Answer

As explained, you cannot use the chain rule here: it applies to the joint entropy $H(X_1, X_2)$, not to the entropy of the product $H(X_1 X_2)$. The chain rule says $H(X_1, X_2) = H(X_2 \mid X_1) + H(X_1)$; since $Z = X_1 X_2$ is a function of the pair $(X_1, X_2)$, all you can conclude is $H(Z) \leq H(X_1, X_2)$.
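
To see the gap concretely, here is a small Python sketch (the value $\delta = 0.01$ is an illustrative choice of mine, not from the question) contrasting the joint entropy $H(X_1, X_2)$, to which the chain rule applies, with the entropy of the product $H(X_1 X_2)$:

```python
import math
from itertools import product

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

delta = 0.01                      # illustrative heads-probability of the biased coin
px1 = {0: 1 - delta, 1: delta}    # X1: biased coin
px2 = {0: 0.5, 1: 0.5}            # X2: fair coin

# Joint distribution of the pair (X1, X2): this is what the chain rule talks about.
joint = {(a, b): px1[a] * px2[b] for a, b in product(px1, px2)}

# Distribution of the product Z = X1 * X2: a (non-injective) function of the pair.
pz = {}
for (a, b), p in joint.items():
    pz[a * b] = pz.get(a * b, 0.0) + p

print(H(joint))   # H(X1, X2) = H(X1) + H(X2) ~ 1.0808 > 1
print(H(pz))      # H(X1 X2)  = h(delta/2)    ~ 0.0454 < 1
```

The first number is $1 + H(X_1) > 1$: the chain-rule computation is correct for the pair. The second is $H(Z)$ itself, which is much smaller.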

The fact that the entropy increases once you know the outcome of the biased coin is not surprising: Shannon entropy is maximized by the uniform distribution. $Z$ is a Bernoulli variable with parameter $\delta/2$, while $Z\mid A$ is Bernoulli with parameter $1/2$, i.e., uniform on $\{0,1\}$. Therefore $H(Z\mid A)\geq H(Z)$.
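
To see the maximization claim directly, here is a minimal sketch (the grid of parameters is an arbitrary choice of mine) evaluating the binary entropy function $h(p)$ at a few points:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# h(p) peaks at p = 1/2 (the uniform case), so a Bernoulli(1/2)
# variable has more entropy than a Bernoulli(delta/2) one.
for p in (0.005, 0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"h({p}) = {h(p):.4f}")   # maximum of 1.0000 bit at p = 0.5
```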

Intuitively, entropy measures the average information that one outcome gives you. Initially, since $Z$ is $0$ with very high probability, its entropy must be very small: you will most likely learn nothing from observing one outcome. Once you know the result of the biased coin, however, the outcome of the fair coin carries much more information.