Is the maximum entropy distribution consistent with given marginal distributions the product distribution of the marginals?


There are generally many joint distributions $P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)$ consistent with a known set of marginal distributions $f_i(x_i) = P(X_i = x_i)$.

Of these joint distributions, is the one formed by taking the product of the marginals, $\prod_i f_i(x_i)$, the one with the highest entropy?

I certainly believe this is true, but would really like to see a proof.

I'm most interested in the case where all variables are discrete, but would also be interested in commentary about entropy relative to product measures in the continuous case.
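To make the setup concrete, here is a minimal Python sketch (not part of the original question, with illustrative probabilities chosen only for the example): two different joint distributions of two binary variables share the same margins, and the product of the margins happens to have the larger entropy.

```python
import numpy as np

# Illustrative example (assumed values): binary X1, X2, each with margins (0.5, 0.5).
P_product = np.array([[0.25, 0.25],
                      [0.25, 0.25]])      # product of the margins: X1, X2 independent
P_correlated = np.array([[0.40, 0.10],
                         [0.10, 0.40]])   # same margins, but X1 and X2 dependent

def entropy(P):
    """Shannon entropy -sum P log P, in nats."""
    return -np.sum(P * np.log(P))

# Both joints have margins (0.5, 0.5) for each variable...
print(P_correlated.sum(axis=0), P_correlated.sum(axis=1))  # [0.5 0.5] [0.5 0.5]
# ...but the product distribution has the larger entropy.
print(entropy(P_product))     # log 4 ≈ 1.386 nats
print(entropy(P_correlated))  # ≈ 1.193 nats
```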

Best Answer

One way is to use the properties of the Kullback-Leibler divergence.

Let $\mathfrak{P}$ be the family of distributions with the given margins, and let $Q$ be the product distribution, with mass function $q(x) = \prod_i q_i(x_i)$ where the $q_i$ are the given margins (so obviously $Q \in \mathfrak{P}$).

Now, for any $P \in \mathfrak{P}$, the cross entropy is:

$H(P,Q) = -E_P [\log q(X)] = -E_P \left[ \log \prod_i q_i(X_i) \right] = -\sum_i E_P [\log q_i(X_i)] = \sum_i H(P_i,Q_i)$

that is, the sum of the cross entropies of the margins. Since the margins are fixed across $\mathfrak{P}$, this term is the same for every $P$ in the family.
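As a quick numerical check of this step (a sketch with assumed, illustrative margins, not part of the original answer): any joint with the same margins has the same cross entropy with $Q$, because $-\sum_x P(x)\log q(x)$ depends on $P$ only through its margins.

```python
import numpy as np

# Illustrative margins (assumed values): P(X1) = (0.6, 0.4), P(X2) = (0.7, 0.3).
q1 = np.array([0.6, 0.4])
q2 = np.array([0.7, 0.3])
Q = np.outer(q1, q2)                       # product distribution q(x1, x2)

# Another member of the family: same margins, but X1 and X2 are dependent.
P_dep = Q + 0.05 * np.array([[1.0, -1.0],
                             [-1.0, 1.0]])

def cross_entropy(P, Q):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -np.sum(P * np.log(Q))

# Same margins => same cross entropy with Q (equal up to floating-point error).
print(cross_entropy(Q, Q))
print(cross_entropy(P_dep, Q))
```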

Now we can write the KL divergence as:

$D_{KL}(P \| Q) = E_P \left[ \log \frac{p(X)}{q(X)} \right] = H(P,Q) - H(P)$

and hence:

$\operatorname*{arg\,min}_{P \in \mathfrak{P}} \ D_{KL}(P \| Q) = \operatorname*{arg\,max}_{P \in \mathfrak{P}} \ H(P) $

that is, the distribution $P \in \mathfrak{P}$ which maximises the entropy is the one which minimises the KL divergence from $Q$, and since $D_{KL}(P \| Q) \ge 0$ with equality if and only if $P = Q$, that minimiser is $Q$ itself.
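To see the conclusion concretely, here is a small sketch under illustrative assumptions (again not part of the original answer): for two binary variables, every joint with the given margins can be written as $Q + t \, [[1,-1],[-1,1]]$ for a single free parameter $t$, and scanning $t$ shows the entropy is maximised at $t = 0$, i.e. at the product distribution $Q$ itself.

```python
import numpy as np

# Illustrative margins (assumed values): P(X1) = (0.6, 0.4), P(X2) = (0.7, 0.3).
q1 = np.array([0.6, 0.4])
q2 = np.array([0.7, 0.3])
Q = np.outer(q1, q2)                         # product distribution

def entropy(P):
    """Shannon entropy -sum P log P, in nats."""
    return -np.sum(P * np.log(P))

# Every 2x2 joint with these margins is Q + t * [[1, -1], [-1, 1]] for some t;
# the range below keeps all four entries strictly positive.
shift = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
ts = np.linspace(-0.11, 0.17, 1001)
best_t = max(ts, key=lambda t: entropy(Q + t * shift))

print(best_t)                                # ≈ 0: the maximiser is Q itself
print(entropy(Q), entropy(Q + 0.1 * shift))  # entropy strictly drops away from Q
```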