Solved – Is the square root of the symmetric Kullback-Leibler divergence a metric

Tags: kullback-leibler, metric

It is well known that the square root of the Jensen-Shannon divergence is a true metric, but what about the square root of the symmetric KL divergence, D(P||Q)+D(Q||P)? I have reasons to believe that it is also a true metric, but I cannot find any references on this beyond anecdotal comments, such as that it behaves more like a metric in practice.

Update 1

Kullback-Leibler divergence: $D(P||Q) = \sum_i p_i\log(p_i/q_i)$

Jensen-Shannon divergence: $J(P,Q) = \big(D(P||(P+Q)/2)+D(Q||(P+Q)/2)\big)/2$

Symmetric KL divergence: $S(P,Q) = D(P||Q)+D(Q||P) = \sum_i (p_i-q_i)\log(p_i/q_i)$

Square root of symmetric KL: $d_{KL}(P,Q) = \sqrt{S(P,Q)}$

Is $d_{KL}$ a metric?

Update 2

I think the following upper and lower bounds hold:

$\sum_i (p_i-q_i)^2 \leq \sum_i (p_i-q_i)\log(p_i/q_i) \leq \sum_i \big(\log(p_i/q_i)\big)^2$

The square roots of both bounds are metrics, I suppose, since the bounds are the squared Euclidean distances in the probability space and the log-probability space respectively.
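As a quick sanity check on the conjectured bounds, here is a minimal Python sketch that tests them on random, strictly positive distributions using natural logarithms. The helper names `random_dist` and `bounds` are made up for this illustration, and a spot check like this is evidence rather than a proof.

```python
import math
import random

def random_dist(n, rng):
    """Draw a strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def bounds(p, q):
    """Return (lower, middle, upper) of the conjectured inequality chain."""
    lower = sum((a - b) ** 2 for a, b in zip(p, q))
    middle = sum((a - b) * math.log(a / b) for a, b in zip(p, q))
    upper = sum(math.log(a / b) ** 2 for a, b in zip(p, q))
    return lower, middle, upper

rng = random.Random(0)  # fixed seed so the run is repeatable
violations = 0
for _ in range(10_000):
    n = rng.randint(2, 6)
    lo, mid, hi = bounds(random_dist(n, rng), random_dist(n, rng))
    # Small tolerance to absorb floating-point rounding.
    if not (lo <= mid + 1e-12 and mid <= hi + 1e-12):
        violations += 1

print("violations:", violations)
```

Note that sandwiching $S(P,Q)$ between two squared metrics does not by itself make $\sqrt{S(P,Q)}$ satisfy the triangle inequality.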

Best Answer

No, the square root of the symmetrised KL divergence is not a metric. A counterexample is as follows:

  • Let $P$ be a coin that produces a head 10% of the time.
  • Let $Q$ be a coin that produces a head 20% of the time.
  • Let $R$ be a coin that produces a head 30% of the time.
  • Then, with $d = d_{KL}$ and natural logarithms, $d(P, Q) + d(Q, R) = 0.284... + 0.232... < 0.519... = d(P, R)$.
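The counterexample is easy to reproduce. The following is a minimal sketch, assuming natural logarithms and representing each coin as a (heads, tails) pair; the function names `sym_kl`, `d_kl` and `coin` are chosen here for illustration.

```python
import math

def sym_kl(p, q):
    """Symmetric KL divergence S(P,Q) = sum_i (p_i - q_i) * log(p_i / q_i)."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

def d_kl(p, q):
    """Square root of the symmetric KL divergence."""
    return math.sqrt(sym_kl(p, q))

def coin(heads):
    """A coin as a two-outcome distribution (heads, tails)."""
    return (heads, 1.0 - heads)

P, Q, R = coin(0.1), coin(0.2), coin(0.3)
d_pq, d_qr, d_pr = d_kl(P, Q), d_kl(Q, R), d_kl(P, R)

print(f"d(P,Q) = {d_pq:.4f}")
print(f"d(Q,R) = {d_qr:.4f}")
print(f"d(P,R) = {d_pr:.4f}")
print("triangle inequality holds:", d_pq + d_qr >= d_pr)
```

The three distances agree with the values quoted above (0.284..., 0.232..., 0.519...), and the final check reports that the triangle inequality fails for this triple.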

However, for $P$ and $Q$ very close together, $D(P||Q)$, $J(P, Q)$ and $S(P, Q)$ are essentially the same: they are proportional to one another up to terms of $O((P-Q)^3)$, and their square roots are metrics to the same order. We can take this local metric and integrate it up over the whole space of probability distributions to obtain a global metric. The result is:

$$A(P, Q) = \cos^{-1}\left(\sum_x \sqrt{P(x)Q(x)} \right)$$

I worked this out myself, so I'm afraid I do not know what it is called. I will use A for Alistair until I find out. ;-)
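The local claim can be checked with a second-order expansion. As a sketch, write $Q(x) = P(x) + \epsilon(x)$ with $\sum_x \epsilon(x) = 0$ and assume $P(x) > 0$ everywhere (the symbol $\epsilon$ is introduced here for illustration and is not part of the original answer). Expanding the logarithms and the square root gives:

$$D(P||Q) = \frac{1}{2}\sum_x \frac{\epsilon(x)^2}{P(x)} + O(\epsilon^3), \qquad S(P, Q) = \sum_x \frac{\epsilon(x)^2}{P(x)} + O(\epsilon^3), \qquad J(P, Q) = \frac{1}{8}\sum_x \frac{\epsilon(x)^2}{P(x)} + O(\epsilon^3)$$

$$\sum_x \sqrt{P(x)Q(x)} = 1 - \frac{1}{8}\sum_x \frac{\epsilon(x)^2}{P(x)} + O(\epsilon^3), \qquad \text{so} \qquad A(P, Q)^2 = \frac{1}{4}\sum_x \frac{\epsilon(x)^2}{P(x)} + O(\epsilon^3)$$

The last step uses $\cos^{-1}(1-u)^2 = 2u + O(u^2)$. All four quantities are therefore constant multiples of the same quadratic form near $P = Q$, which is why the square roots of the divergences behave like a metric locally.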

By construction, the triangle inequality in this metric is tight. You can actually find a unique shortest path through the space of probability distributions from $P$ to $Q$ that has the right length. In that respect it is preferable to the otherwise similar Hellinger distance:

$$H(P, Q) = \sqrt{1 - \sum_x \sqrt{P(x)Q(x)} }$$
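To illustrate the difference, the sketch below evaluates both distances on three coins with head probabilities 0.1, 0.2 and 0.3. It assumes the standard form of the Hellinger distance, $H(P, Q) = \sqrt{1 - \sum_x \sqrt{P(x)Q(x)}}$; the function names `bc`, `arc_distance` and `hellinger` are chosen here for illustration.

```python
import math

def bc(p, q):
    """Bhattacharyya coefficient: sum_x sqrt(P(x) * Q(x))."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def arc_distance(p, q):
    """A(P,Q) = arccos of the Bhattacharyya coefficient."""
    # Clamp in case rounding pushes the coefficient slightly above 1.
    return math.acos(min(1.0, bc(p, q)))

def hellinger(p, q):
    """Hellinger distance in the form sqrt(1 - BC) (assumed convention)."""
    return math.sqrt(max(0.0, 1.0 - bc(p, q)))

def coin(heads):
    """A coin as a two-outcome distribution (heads, tails)."""
    return (heads, 1.0 - heads)

P, Q, R = coin(0.1), coin(0.2), coin(0.3)

a_pq, a_qr, a_pr = arc_distance(P, Q), arc_distance(Q, R), arc_distance(P, R)
h_pq, h_qr, h_pr = hellinger(P, Q), hellinger(Q, R), hellinger(P, R)

print(f"A: {a_pq:.6f} + {a_qr:.6f} = {a_pq + a_qr:.6f} vs {a_pr:.6f}")
print(f"H: {h_pq:.6f} + {h_qr:.6f} = {h_pq + h_qr:.6f} vs {h_pr:.6f}")
```

For two-outcome distributions the map $p \mapsto (\sqrt{p}, \sqrt{1-p})$ places every coin on a quarter of the unit circle, so $Q$ lies on the shortest path from $P$ to $R$ and $A(P, Q) + A(Q, R)$ equals $A(P, R)$ up to rounding. The Hellinger distance satisfies the triangle inequality too, but strictly: here the sum of the two shorter legs exceeds $H(P, R)$.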

Update 2013-12-05: Apparently this is called the Bhattacharyya arc-cos distance.
