Mutual Information vs Kullback–Leibler Divergence – Are They Equivalent?

kullback-leibler, mutual-information

From my readings, I understand that:

  1. Mutual information $\mathit{(MI)}$ is a metric, as it meets the triangle inequality, non-negativity, indiscernibility, and symmetry criteria.
  2. The Kullback–Leibler divergence $\mathit{(D_{KL})}$ is not a metric, as it does not obey the triangle inequality.

However, in one answer on Cross Validated (Information gain, mutual information and related measures) [the second answer], it was shown that mutual information and Kullback–Leibler divergence are equivalent. How can this be, given that $\mathit{MI}$ is a metric and $\mathit{D_{KL}}$ is not? I can only assume that I am missing something here.

Best Answer

Mutual information is not a metric. A metric $d$ satisfies the identity of indiscernibles: $d(x, y) = 0$ if and only if $x = y$. This is not true of mutual information, which behaves in the opposite manner: zero mutual information implies that the two random variables are independent (as far from identical as you can get), and if two random variables are identical, they have maximal mutual information (as far from zero as you can get).

You're correct that KL divergence is not a metric. It's not symmetric and doesn't satisfy the triangle inequality.
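
To make the asymmetry concrete, here is a minimal numerical sketch (the distributions $p$ and $q$ below are arbitrary toy choices, not anything from the question; divergences are in nats):

```python
# Minimal sketch of the asymmetry of KL divergence (p and q are arbitrary
# toy distributions chosen only for illustration).
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

print(kl_divergence(p, q))  # ~ 0.41
print(kl_divergence(q, p))  # ~ 0.54, so D_KL(p || q) != D_KL(q || p)
```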

Mutual information and KL divergence are not equivalent. However, the mutual information $I(X, Y)$ between random variables $X$ and $Y$ is given by the KL divergence between the joint distribution $p_{XY}$ and the product of the marginal distributions $p_X \otimes p_Y$ (what the joint distribution would be if $X$ and $Y$ were independent).

$$I(X, Y) = D_{KL}(p_{XY} \parallel p_X \otimes p_Y)$$
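
As an illustration of this identity for discrete variables, here is a short sketch (the joint tables are toy examples made up for illustration). It also shows the behaviour noted above: independent variables give zero mutual information, while identical variables give the maximal value $H(X)$.

```python
# Sketch of I(X, Y) = D_KL(p_XY || p_X ⊗ p_Y) for discrete variables,
# using toy joint tables chosen only for illustration (values in nats).
import numpy as np

def mutual_information(p_xy):
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal distribution of X
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal distribution of Y
    product = p_x * p_y                     # what the joint would be under independence
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / product[mask])))

# Dependent X and Y: positive mutual information
p_dependent = np.array([[0.4, 0.1],
                        [0.1, 0.4]])
print(mutual_information(p_dependent))     # ~ 0.19

# Independent X and Y: joint equals the product of marginals, so I(X, Y) = 0
p_independent = np.outer([0.5, 0.5], [0.3, 0.7])
print(mutual_information(p_independent))   # 0.0

# Identical X and Y (Y = X): mass sits on the diagonal and I(X, Y) = H(X) = ln 2,
# the maximum possible here, matching the behaviour described for identical variables above
p_identical = np.array([[0.5, 0.0],
                        [0.0, 0.5]])
print(mutual_information(p_identical))     # ~ 0.693
```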

Although mutual information is not itself a metric, there are metrics based on it. For example, the variation of information:

$$VI(X, Y) = H(X, Y) - I(X, Y) = H(X) + H(Y) - 2 I(X, Y)$$

where $H(X)$ and $H(Y)$ are the marginal entropies and $H(X, Y)$ is the joint entropy.
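
Continuing the same toy joint distribution (an assumed example, not from the question), here is a short sketch of how $VI$ is computed from the entropies, checking that the two expressions above agree:

```python
# Sketch of the variation of information for the toy joint table used above
# (an assumed example); all entropies are in nats.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

h_x  = entropy(p_xy.sum(axis=1))   # marginal entropy H(X)
h_y  = entropy(p_xy.sum(axis=0))   # marginal entropy H(Y)
h_xy = entropy(p_xy)               # joint entropy H(X, Y)
mi   = h_x + h_y - h_xy            # I(X, Y)

vi = h_xy - mi                     # variation of information VI(X, Y)
print(vi, h_x + h_y - 2 * mi)      # both expressions give ~ 1.00
```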
