Solved – Prove that the maximum entropy distribution with a fixed covariance matrix is a Gaussian

Tags: entropy, information-theory, maximum-entropy

I'm trying to get my head around the following proof that the Gaussian has maximum entropy.

How does the starred step make sense? A specific covariance only fixes the second moment. What happens to the third, fourth, fifth moments etc?

[Screenshot of the proof omitted. The starred step replaces $\int q(\mathbf{x})\log(p(\mathbf{x}))\,d\mathbf{x}$ with $\int p(\mathbf{x})\log(p(\mathbf{x}))\,d\mathbf{x}$, with the remark that $p$ and $q$ "yield the same moments of the quadratic form."]

Best Answer

The starred step is valid because (a) $p$ and $q$ have the same zeroth and second moments and (b) $\log(p)$ is a polynomial function of the components of $\mathbf{x}$ whose terms have total degrees $0$ or $2$.


You need to know only two things about a multivariate normal distribution with zero mean:

  1. $\log(p)$ is a quadratic function of $\mathbf{x}=(x_1,x_2,\ldots,x_n)$ with no linear terms. Specifically, there are constants $C$ and $p_{ij}$ for which $$\log(p(\mathbf{x}))=C + \sum_{i,j=1}^n p_{ij}\, x_i x_j.$$

    (Of course $C$ and the $p_{ij}$ can be written in terms of $\Sigma$, but this detail does not matter.)

  2. $\Sigma$ gives the second moments of the distribution. That is, $$\Sigma_{ij}=E_p(x_i x_j) = \int p(\mathbf{x})\, x_ix_j\, d\mathbf{x}.$$
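As a numerical sanity check of fact 1, the sketch below (NumPy only; the variable names `P` and `C` are mine, matching the $p_{ij}$ and $C$ above) verifies that the log-density of a zero-mean multivariate normal really is $C + \sum_{i,j} p_{ij}\,x_i x_j$ with $P = -\tfrac{1}{2}\Sigma^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)  # a positive-definite covariance matrix

# Coefficients of the quadratic form in log p
P = -0.5 * np.linalg.inv(Sigma)                                   # the p_ij
C = -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))  # the constant

def log_p(x):
    """Log-density of N(0, Sigma), written out directly."""
    return -0.5 * (n * np.log(2 * np.pi)
                   + np.log(np.linalg.det(Sigma))
                   + x @ np.linalg.inv(Sigma) @ x)

x = rng.standard_normal(n)
quadratic = C + x @ P @ x  # C + sum_ij p_ij * x_i * x_j
assert np.isclose(log_p(x), quadratic)
```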

We may use this information to work out an integral:

$$\begin{aligned} \int(q(\mathbf{x}) - p(\mathbf{x}))\log(p(\mathbf{x}))\,d\mathbf{x} &= \int(q(\mathbf{x}) - p(\mathbf{x}))\left(C + \sum_{i,j=1}^n p_{ij}\, x_i x_j\right)d\mathbf{x}. \end{aligned}$$

It breaks into the sum of two parts:

  • $\int(q(\mathbf{x}) - p(\mathbf{x}))C\, d\mathbf{x} = C\left(\int q(\mathbf{x})\, d\mathbf{x} - \int p(\mathbf{x})\, d\mathbf{x}\right) = C(1 - 1) = 0$, because both $q$ and $p$ are probability density functions.

  • $\int(q(\mathbf{x}) - p(\mathbf{x})) \sum_{i,j=1}^n p_{ij}\, x_i x_j\,d\mathbf{x} = \sum_{i,j=1}^n p_{ij}\int(q(\mathbf{x}) - p(\mathbf{x}))\,x_i x_j\,d\mathbf{x} = 0$, because each pair of integrals on the right hand side, $\int q(\mathbf{x})\, x_i x_j\,d\mathbf{x}$ and $\int p(\mathbf{x})\, x_i x_j\,d\mathbf{x}$, has the same value (to wit, $\Sigma_{ij}$). This is what the remark "yield the same moments of the quadratic form" is intended to say.

The result follows immediately: since $\int(q(\mathbf{x}) - p(\mathbf{x}))\log(p(\mathbf{x}))d\mathbf{x}=0$, we conclude that $\int q(\mathbf{x})\log(p(\mathbf{x}))d\mathbf{x} = \int p(\mathbf{x})\log(p(\mathbf{x}))d\mathbf{x}.$
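This equality can also be checked empirically. The Monte Carlo sketch below (NumPy; my own construction, not from the proof) draws $q$-samples from a deliberately non-Gaussian distribution with the same zero mean and covariance $\Sigma$ (a linearly transformed uniform), and confirms that $E_q[\log p]$ and $E_p[\log p]$ agree up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)          # so that L @ L.T == Sigma
Sigma_inv = np.linalg.inv(Sigma)
C = -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))

def log_p(X):
    """Log-density of N(0, Sigma) evaluated at each row of X."""
    return C - 0.5 * np.einsum('ki,ij,kj->k', X, Sigma_inv, X)

N = 500_000
# p-samples: exactly N(0, Sigma)
Xp = rng.standard_normal((N, n)) @ L.T
# q-samples: non-Gaussian, but zero mean and covariance Sigma
# (uniform on [-sqrt(3), sqrt(3)] has unit variance, then transform by L)
Xq = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(N, n)) @ L.T

# The two averages estimate E_p[log p] and E_q[log p]; they should match.
print(log_p(Xp).mean(), log_p(Xq).mean())
```

Since only the zeroth and second moments of $q$ enter the integral, any choice of non-Gaussian $q$ with matched mean and covariance would do here.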
