Confusion about when conditioning on an event can increase entropy

entropy, probability-theory

In a discussion, we were trying to figure out whether conditioning on an event can increase entropy. My friend came up with the following scenario:

Consider an indicator random variable $Z$ that is $1$ if and only if two independent coin flips both come up heads. Suppose the first coin comes up heads with some very small probability $\delta$, while the second coin is fair. Since $\Pr[ Z = 1] = \delta/2$, the entropy $H[ Z ]$ is also very small (how small depends on $\delta$).

On the other hand, if we define $A$ to be the event that the biased coin comes up heads, then $H[ Z | A ] = 1$, so the entropy of $Z$ increases when we condition on $A$.
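
For concreteness, here is a quick numerical sanity check of my friend's numbers (a Python sketch; the binary-entropy helper `h` and the value $\delta = 0.01$ are my own illustrative choices):

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

delta = 0.01         # heads-probability of the biased coin (illustrative)
p_z = delta / 2      # Pr[Z = 1] = Pr[X1 = 1] * Pr[X2 = 1]

print(h(p_z))        # H[Z]     ~ 0.0454 bits: very small
print(h(0.5))        # H[Z | A] = 1.0 bit: Z | A is Bernoulli(1/2)
```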

However, I'm confused by the implications of this and seem to be getting contradictory results: Let $X_1$ be the outcome of the biased coin and $X_2$ the outcome of the fair coin, where $X_i=1$ means the $i$-th coin comes up heads. Then
$$Z = X_1 X_2.$$

According to my friend's claim above, it holds that:

$$
H[Z] = H[ X_1 X_2 ] < H[ X_1 X_2 | X_1=1 ] = H[ X_2 ] = 1.
$$

However, using the chain rule, we get
\begin{align*}
H[Z] = H[ X_1 X_2 ] &= H[ X_2 | X_1 ] + H[ X_1 ] \\
&= H[ X_2 ] + H[ X_1 ] \quad \text{(since the coin flips are indep.)} \\
&= 1 + H[ X_1 ] \\
&> 1,
\end{align*}

which contradicts my friend's claim.

What am I missing?

Best Answer

As explained, you cannot use the chain rule here: it applies to the joint entropy $H(X_1, X_2)$, not to the entropy of the product $H(X_1 X_2)$. The chain rule says $H(X_1, X_2) = H(X_2 \mid X_1) + H(X_1)$; since $Z = X_1 X_2$ is a function of the pair $(X_1, X_2)$, all you can conclude is $H(Z) \leq H(X_1, X_2)$.
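
To see the gap concretely, here is a small Python sketch (the value $\delta = 0.01$ is an illustrative choice of mine, not from the question) contrasting the joint entropy $H(X_1, X_2)$, to which the chain rule applies, with the entropy of the product $H(X_1 X_2)$:

```python
import math
from itertools import product

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

delta = 0.01                      # illustrative heads-probability of the biased coin
px1 = {0: 1 - delta, 1: delta}    # X1: biased coin
px2 = {0: 0.5, 1: 0.5}            # X2: fair coin

# Joint distribution of the pair (X1, X2): this is what the chain rule talks about.
joint = {(a, b): px1[a] * px2[b] for a, b in product(px1, px2)}

# Distribution of the product Z = X1 * X2: a (non-injective) function of the pair.
pz = {}
for (a, b), p in joint.items():
    pz[a * b] = pz.get(a * b, 0.0) + p

print(H(joint))   # H(X1, X2) = H(X1) + H(X2) ~ 1.0808 > 1
print(H(pz))      # H(X1 X2)  = h(delta/2)    ~ 0.0454 < 1
```

The first number is $1 + H(X_1) > 1$: the chain-rule computation is correct for the pair. The second is $H(Z)$ itself, which is much smaller.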

The fact that the entropy increases once you know the outcome of the biased coin is not surprising: Shannon entropy is maximized by the uniform distribution. $Z$ is a Bernoulli variable with parameter $\delta/2$, while $Z\mid A$ is Bernoulli with parameter $1/2$, i.e., uniform on $\{0,1\}$. Therefore $H(Z\mid A)\geq H(Z)$.
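
To see the maximization claim directly, here is a minimal sketch (the grid of parameters is an arbitrary choice of mine) evaluating the binary entropy function $h(p)$ at a few points:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# h(p) peaks at p = 1/2 (the uniform case), so a Bernoulli(1/2)
# variable has more entropy than a Bernoulli(delta/2) one.
for p in (0.005, 0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"h({p}) = {h(p):.4f}")   # maximum of 1.0000 bit at p = 0.5
```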

Intuitively, entropy measures the average information that one outcome gives you. Initially, since $Z$ is $0$ with very high probability, its entropy must be very small: you will most likely learn nothing from observing one outcome. Once you know the result of the biased coin, however, the outcome of the fair coin carries much more information.