I'm trying to get my head around the following proof that the Gaussian has maximum entropy.
How does the starred step make sense? A specific covariance only fixes the second moment. What happens to the third, fourth, fifth moments, etc.?
entropy · information-theory · maximum-entropy
Best Answer
The starred step is valid because (a) $p$ and $q$ have the same zeroth and second moments and (b) $\log(p)$ is a polynomial function of the components of $\mathbf{x}$ whose terms have total degrees $0$ or $2$.
You need to know only two things about a multivariate normal distribution with zero mean:
$\log(p)$ is a quadratic function of $\mathbf{x}=(x_1,x_2,\ldots,x_n)$ with no linear terms. Specifically, there are constants $C$ and $p_{ij}$ for which $$\log(p(\mathbf{x}))=C + \sum_{i,j=1}^n p_{ij}\, x_i x_j.$$
(Of course $C$ and the $p_{ij}$ can be written in terms of $\Sigma$, but this detail does not matter.)
$\Sigma$ gives the second moments of the distribution. That is, $$\Sigma_{ij}=E_p(x_i x_j) = \int p(\mathbf{x})\, x_ix_j\, d\mathbf{x}.$$
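These two facts can be checked numerically. The sketch below (with an arbitrary example covariance `Sigma`) verifies that for a zero-mean multivariate normal, $\log p(\mathbf{x}) - \log p(\mathbf{0})$ equals the quadratic form $\sum_{i,j} p_{ij} x_i x_j$ with coefficients $p_{ij} = -\tfrac{1}{2}(\Sigma^{-1})_{ij}$:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example covariance matrix (an assumption; any positive-definite matrix works)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
p = multivariate_normal(mean=np.zeros(2), cov=Sigma)

# Coefficients p_ij of the quadratic: log p(x) = C + sum_ij P[i,j] x_i x_j
P = -0.5 * np.linalg.inv(Sigma)

x = np.array([0.7, -1.3])                 # arbitrary test point
quadratic = x @ P @ x                     # sum_ij P[i,j] x_i x_j
# Subtracting log p(0) removes the constant C
assert np.isclose(p.logpdf(x) - p.logpdf([0.0, 0.0]), quadratic)
```

Here $C = \log p(\mathbf{0}) = -\tfrac{n}{2}\log(2\pi) - \tfrac{1}{2}\log\det\Sigma$, which is why subtracting it isolates the purely quadratic part.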
We may use this information to work out an integral:
$$\begin{aligned} \int(q(\mathbf{x}) - p(\mathbf{x}))\log(p(\mathbf{x}))\,d\mathbf{x} &= \int(q(\mathbf{x}) - p(\mathbf{x}))\left(C + \sum_{i,j=1}^n p_{ij}\, x_i x_j\right)d\mathbf{x}. \end{aligned}$$
It breaks into the sum of two parts:
$\int(q(\mathbf{x}) - p(\mathbf{x}))C\, d\mathbf{x} = C\left(\int q(\mathbf{x})\, d\mathbf{x} - \int p(\mathbf{x})\, d\mathbf{x}\right) = C(1 - 1) = 0$, because both $q$ and $p$ are probability density functions.
$\int(q(\mathbf{x}) - p(\mathbf{x})) \sum_{i,j=1}^n p_{ij}\, x_i x_j\,d\mathbf{x} = \sum_{i,j=1}^n p_{ij}\int(q(\mathbf{x}) - p(\mathbf{x}))\,x_i x_j\,d\mathbf{x} = 0$, because each of the integrals on the right hand side, $\int q(\mathbf{x})\, x_i x_j\,d\mathbf{x}$ and $\int p(\mathbf{x})\, x_i x_j\,d\mathbf{x}$, has the same value (to wit, $\Sigma_{ij}$). This is what the remark "yield the same moments of the quadratic form" is intended to say.
The result follows immediately: since $\int(q(\mathbf{x}) - p(\mathbf{x}))\log(p(\mathbf{x}))d\mathbf{x}=0$, we conclude that $\int q(\mathbf{x})\log(p(\mathbf{x}))d\mathbf{x} = \int p(\mathbf{x})\log(p(\mathbf{x}))d\mathbf{x}.$
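A Monte Carlo sketch makes this conclusion concrete. Below, $p$ is a zero-mean normal and $q$ is a deliberately different zero-mean distribution (independent uniforms, an assumption chosen for simplicity) scaled to have the *same* covariance; the sample averages of $\log p$ under both agree up to sampling error, even though $q$'s higher moments differ from $p$'s:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 2
Sigma = np.diag([2.0, 1.0])               # example covariance (assumption)
p = multivariate_normal(mean=np.zeros(n), cov=Sigma)

# q: independent uniforms on (-a_i, a_i); Var of U(-a, a) is a^2/3,
# so a_i = sqrt(3 * Sigma_ii) matches the second moments of p.
a = np.sqrt(3.0 * np.diag(Sigma))
xq = rng.uniform(-a, a, size=(1_000_000, n))   # samples from q
xp = p.rvs(size=1_000_000, random_state=rng)   # samples from p

Eq_logp = p.logpdf(xq).mean()   # estimates  E_q[log p]
Ep_logp = p.logpdf(xp).mean()   # estimates  E_p[log p]
print(Eq_logp, Ep_logp)         # close despite q being non-Gaussian
```

Note $\int p \log p\, d\mathbf{x}$ is minus the differential entropy of $p$, so the identity says any same-covariance $q$ has cross-entropy to $p$ equal to $p$'s own entropy, which is exactly what the starred step uses.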