[Math] Maximum entropy joint distribution from marginals

entropy, information theory, random variables

How does one find the maximum entropy joint distribution of two random variables X and Y given their marginal probability mass functions?

I know:

  • I have the marginals, meaning p(x) and p(y) are fixed.
  • The entropy is maximized when the distribution is uniform (p(x,y) = 1/n for all x, y), but a uniform joint is generally inconsistent with the given marginals.
  • The joint distribution is only the product of the marginals when X and Y are independent.
  • The KL divergence looks handy, but I can't use it to prove independence (zero mutual information) when all I know are the marginals.

Does anyone know what I'm missing?

Best Answer

We know that $H(Y|X) \le H(Y)$, with equality iff $X$ and $Y$ are independent. (This is a consequence of $I(X;Y) \ge 0$, which follows from $D\big(p(X,Y)\,\|\,p(X)\,p(Y)\big) \ge 0$, which in turn follows from Jensen's inequality; see e.g. Cover and Thomas, Theorem 2.6.5.)
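For reference, the chain behind that statement, written out with the standard identities (notation as in Cover and Thomas):

$$
I(X;Y) \;=\; D\big(p(X,Y)\,\|\,p(X)\,p(Y)\big) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \;=\; H(Y) - H(Y\mid X) \;\ge\; 0 .
$$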

This implies $H(X,Y) \le H(X) + H(Y)$, with equality iff $X$ and $Y$ are independent, since $H(X,Y) = H(X) + H(Y|X)$ by the chain rule.

In our problem, $H(X)$ and $H(Y)$ are fixed, so the joint entropy is bounded above by $H(X) + H(Y)$, and that bound is attained only if $X$ and $Y$ are independent, i.e. $P(X,Y) = P(X) P(Y)$. Hence the product of the marginals is the joint distribution that maximizes the joint entropy.
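If it helps to see this concretely, here is a minimal numerical sketch (NumPy; the marginals `px` and `py` are made up for illustration). It compares the product coupling against another coupling with the same marginals and checks that the product one has the larger joint entropy, equal to $H(X) + H(Y)$:

```python
import numpy as np

# Hypothetical marginals: X over {0,1,2}, Y over {0,1}.
px = np.array([0.5, 0.3, 0.2])
py = np.array([0.6, 0.4])

def entropy(p):
    """Shannon entropy in bits, ignoring zero cells."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Independent coupling: the outer product of the marginals.
joint_indep = np.outer(px, py)

# A different coupling with the same marginals: shift mass eps
# around a 2x2 sub-block so row and column sums are unchanged.
eps = 0.05
joint_other = joint_indep.copy()
joint_other[0, 0] += eps
joint_other[0, 1] -= eps
joint_other[1, 0] -= eps
joint_other[1, 1] += eps

# Both joints have the prescribed marginals...
assert np.allclose(joint_other.sum(axis=1), px)
assert np.allclose(joint_other.sum(axis=0), py)

# ...but only the independent one attains H(X) + H(Y).
print(entropy(joint_indep.ravel()))   # ~2.456 bits
print(entropy(px) + entropy(py))      # same value, H(X) + H(Y)
print(entropy(joint_other.ravel()))   # strictly smaller
```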