Why is the entropy of a posterior Gaussian distribution higher than its prior

Tags: entropy, information-theory

I am puzzled with a problem regarding the entropy of a multivariate Gaussian distribution after a Bayesian-like update.

Let $\mathcal{N}(\mu_0, \Sigma_0)$ be a prior $D$-dimensional multivariate Gaussian distribution. After conditioning this distribution on noisy observations, let the resulting posterior Gaussian distribution (of same dimensionality $D$) be $\mathcal{N}(\mu_1, \Sigma_1)$.

The entropy of each probability distribution is calculated with the following formula:

$$H=\frac{1}{2}\ln|\Sigma|+\frac{D}{2}\left(1+\ln(2\pi)\right)$$
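
For concreteness, here is a minimal sketch (not from the original post) of how this formula can be evaluated numerically; the function name and the example matrices are illustrative only:

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy (in nats) of a multivariate Gaussian with covariance Sigma,
    using H = 0.5*ln|Sigma| + (D/2)*(1 + ln(2*pi))."""
    D = Sigma.shape[0]
    # slogdet avoids the overflow/underflow that det() can hit for larger D
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet + 0.5 * D * (1.0 + np.log(2.0 * np.pi))

# Illustrative check with D = 18: shrinking the covariance lowers the entropy
print(gaussian_entropy(np.eye(18)))          # ≈ 25.54 nats
print(gaussian_entropy(0.25 * np.eye(18)))   # ≈ 13.06 nats
```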

My question: Given that the posterior distribution is more confident (carries less uncertainty) than the prior distribution, why would its entropy increase?

I put together a toy example with dimensionality $D=18$. Here are two covariance matrices (of shape 18×18): the prior covariance ($\Sigma_0$) is shown on the left; conditioning on some observations yields the posterior covariance ($\Sigma_1$), shown on the right:

[Figure: heatmaps of the prior covariance $\Sigma_0$ (left) and the posterior covariance $\Sigma_1$ (right)]

The magnitudes of the entries in the posterior covariance matrix are clearly lower than those in the prior covariance matrix, and yet its entropy is higher (128 > 103). How is that possible? I would have expected the entropy to decrease, since the posterior distribution is much less uncertain than the prior distribution.

My understanding of entropy here is probably flawed; any intuitive explanation or clarification would be greatly appreciated. Thanks!

Best Answer

Your example is not entirely clear, and I'm not sure whether this is what is happening in your case, but:

In general (and, more simply, for discrete variables and true Shannon entropies [*]), it is true that "conditioning reduces entropy"... but only on average.

That is, $H(X \mid Y) \le H(X)$ holds. But this does not imply $H(X \mid Y = y) \le H(X)$ for every $y$: the entropy conditioned on a particular observed value can increase.

For example, let $X, Y$ have the following joint distribution:

$$ \begin{array}{c|cc} X \backslash Y & 0 & 1 \\ \hline 0 & \frac{1}{3} & 0 \\ 1 & \frac{1}{3} & \frac{1}{3} \\ \end{array} $$

Then $H(Y \mid X=0) = 0$ bits but $H(Y \mid X=1) = 1$ bit, while $H(Y) = H(2/3,\, 1/3) \approx 0.918$ bits; hence $H(Y \mid X=1) > H(Y)$, even though the average conditional entropy $H(Y \mid X) = 2/3$ bits is indeed smaller than $H(Y)$.
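
As a quick numerical check of this example (a sketch added here, not part of the original answer), the entropies can be computed in bits directly from the joint table:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution P(X, Y) from the table: rows are X in {0, 1}, columns Y in {0, 1}
P = np.array([[1/3, 0.0],
              [1/3, 1/3]])

P_Y = P.sum(axis=0)                 # marginal of Y: (2/3, 1/3)
P_X = P.sum(axis=1)                 # marginal of X: (1/3, 2/3)

print(H(P_Y))                       # H(Y) ≈ 0.918 bits
print(H(P[0] / P[0].sum()))         # H(Y | X=0) = 0 bits
print(H(P[1] / P[1].sum()))         # H(Y | X=1) = 1 bit

# Average conditional entropy: conditioning reduces entropy *on average*
H_Y_given_X = P_X[0] * H(P[0] / P[0].sum()) + P_X[1] * H(P[1] / P[1].sum())
print(H_Y_given_X)                  # H(Y | X) = 2/3 ≈ 0.667 bits <= H(Y)
```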


[*] Remember that differential entropy is not really the Shannon entropy, and not all of the latter's properties carry over to the former. But that does not seem to be your problem here.
