Solved – Entropy of (Sum of Gaussians) versus Sum of (Entropy of Gaussians)

entropy, information theory, maximum-entropy, mutual information

Short version: How can the joint entropy of two independent variables be less than the entropy of the sum of those variables? The joint entropy should encode all the information that any scalar function of them can, right?

Long version: Assume there are 2 independent normal random variables $X, Y$ both with mean $0$ and variance $\sigma^2$.

  1. We know that the entropy of $X$ and of $Y$ is $H(X) = H(Y) = \ln(2\pi e \sigma^2)/2$ (the standard differential entropy of a Gaussian).
  2. The variance of the random variable $SUM = X + Y$ is $2 \sigma^2$
  3. 1 and 2 mean that $H(SUM)= \ln(2 \pi e (2 \sigma^2))/2$
  4. The sum of the entropies of two independent random variables is the entropy of their joint distribution, i.e. $H(X, Y) = H(X) + H(Y)$. In this particular case this gives
    $$H(X, Y) = (\ln(2\pi e \sigma^2)/2) \cdot 2 = \ln(2\pi e \sigma^2).$$
  5. Now note that if $\sigma^2=(\pi e)^{-1}$, then from 3 and 4
    $$H(X,Y) = (\ln(2)/2) \cdot 2 = \ln(2) = \ln(2 \cdot 2)/2 = H(SUM).$$

And if you decrease $\sigma$ below this value, then $H(SUM) > H(X,Y)$ (while increasing it makes $H(X,Y)$ the larger of the two). It seems quite fantastic that $(\pi e)^{-1}$ is an entropy tipping point for Gaussians. Do you know of any papers or books that make this observation? Is $N(0, (\pi e)^{-1})$ discussed as an alternative to $N(0,1)$ because of its neutrality in this context? And why is this happening at all? Shouldn't the joint entropy be greater than the entropy of any scalar function of the variables, since it seems more general?
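
A quick numeric check of steps 1–5 (a minimal sketch using only the closed-form Gaussian differential entropy; the helper `h_gauss` is illustrative, not from any particular library):

```python
import numpy as np

def h_gauss(var):
    """Differential entropy (in nats) of a univariate Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Tipping point sigma^2 = 1/(pi*e), then a smaller and a larger variance.
for var in [1 / (np.pi * np.e), 0.05, 1.0]:
    h_joint = 2 * h_gauss(var)   # H(X, Y) for independent X, Y (step 4)
    h_sum = h_gauss(2 * var)     # H(SUM); variances add for independent Gaussians (steps 2-3)
    print(f"sigma^2 = {var:.4f}:  H(X,Y) = {h_joint:+.4f},  H(SUM) = {h_sum:+.4f}")
```

At the tipping point both values equal $\ln 2$; for smaller variances $H(SUM)$ is the larger of the two, for larger variances $H(X,Y)$ is.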

Best Answer

The issue is that you are working with a differential entropy for continuous random variables, which doesn't share all the nice properties of Shannon's entropy for discrete random variables and can behave counter to intuition. In particular, differential entropy can be negative!

The following might help to get a feel for what's going on. First, a little derivation. We have that

\begin{align} H[X + Y, Y] &= H[X + Y \mid Y] + H[Y] = H[X \mid Y] + H[Y] = H[X, Y], \\ H[X + Y, Y] &= H[Y \mid X + Y] + H[X + Y], \end{align}

so that,

$$H[X + Y] = H[X, Y] - H[Y \mid X + Y].$$

Since Shannon's conditional entropy is always non-negative in the discrete case, the entropy of $X + Y$ there is always less than or equal to the entropy of $(X, Y)$, in line with your intuition. For your Gaussian example to tip the other way, $H[Y \mid X + Y]$ must become negative, which is only possible because it is a differential entropy.
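
To see this concretely in the Gaussian example: $(Y, X + Y)$ is jointly Gaussian, and the usual conditioning formula gives $\mathrm{Var}[Y \mid X + Y] = \sigma^2 - \sigma^4/(2\sigma^2) = \sigma^2/2$, so $H[Y \mid X + Y] = \ln(\pi e \sigma^2)/2$. This is exactly zero at $\sigma^2 = (\pi e)^{-1}$ and negative below it, which is precisely where $H[X + Y]$ overtakes $H[X, Y]$. A small numeric sketch of the identity above (the helper `h_gauss` is again just for illustration):

```python
import numpy as np

def h_gauss(var):
    """Differential entropy (in nats) of a univariate Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

for var in [1 / (np.pi * np.e), 0.05, 1.0]:
    h_joint = 2 * h_gauss(var)     # H[X, Y]
    h_sum = h_gauss(2 * var)       # H[X + Y]
    h_cond = h_gauss(var / 2)      # H[Y | X + Y]: conditional variance is var/2
    # The identity H[X + Y] = H[X, Y] - H[Y | X + Y] holds exactly.
    print(f"sigma^2 = {var:.4f}:  H[Y|X+Y] = {h_cond:+.4f},  "
          f"H[X,Y] - H[Y|X+Y] = {h_joint - h_cond:+.4f},  H[X+Y] = {h_sum:+.4f}")
```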

If you want a more well-behaved measure for continuous random variables, use relative entropy.
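
For instance, the relative entropy between two zero-mean Gaussians has the standard closed form $D\!\left(N(0,\sigma_p^2)\,\middle\|\,N(0,\sigma_q^2)\right) = \tfrac12\!\left(\ln\frac{\sigma_q^2}{\sigma_p^2} + \frac{\sigma_p^2}{\sigma_q^2} - 1\right)$; unlike differential entropy it is always non-negative and does not change when both distributions are rescaled by the same factor. A brief illustrative sketch (the helper `kl_gauss` is hypothetical, defined only for this example):

```python
import numpy as np

def kl_gauss(var_p, var_q):
    """Relative entropy D(N(0, var_p) || N(0, var_q)) in nats, zero-mean case."""
    return 0.5 * (np.log(var_q / var_p) + var_p / var_q - 1)

# Comparing each Gaussian in the question to one with twice its variance:
# the value is the same for every sigma^2 and is always >= 0.
for var in [1 / (np.pi * np.e), 0.05, 1.0]:
    print(f"sigma^2 = {var:.4f}:  D(N(0, s^2) || N(0, 2 s^2)) = {kl_gauss(var, 2 * var):.4f}")
```

Every line prints the same value regardless of $\sigma^2$, illustrating the scale-invariance that differential entropy lacks.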
