Solved – Symmetrised Kullback–Leibler divergence

computational-statistics, kullback-leibler, metric, probability, r

I have trouble understanding the KL divergence, where $P$ is the probability mass function of the true distribution of the data and $Q$ is an approximation of $P$.

The definition of KL divergence is:

$$D_{\mathrm{kl}}(P || Q) = \sum_i \ln\left( \frac{P_i}{Q_i}\right)P_i$$

If I want a symmetrised KL divergence, should it look like the following?

$$D_{\mathrm{kl}}(P || Q) + D_{\mathrm{kl}}(Q||P) = \sum_i \ln\left( \frac{P_i}{Q_i}\right)P_i + \sum_i \ln\left( \frac{Q_i}{P_i}\right)Q_i$$

Also, which package should I use in R to compute the KL divergence for discrete distributions: flexmix or FNN? Or should I just write my own R function for this?

And what if I don't really know $P$, the true distribution of data?

Best Answer

There are two typical ways to symmetrize the KL divergence:

$$\frac{D(P \| Q) + D(Q \| P)}{2}$$

or

$$\frac{1}{2} D\left(P \; \Big\| \frac{P + Q}{2} \right) + \frac{1}{2} D\left(Q \; \Big\| \frac{P + Q}{2} \right)$$

where $\frac{P+Q}{2}$ denotes the equal-weight mixture of $P$ and $Q$.

The latter is known as the Jensen-Shannon divergence and has some nice properties: it is always finite, bounded above by $\ln 2$, and its square root is a metric. The former is also often useful, and I use it to good effect in some of my own work.

I don't use R, but it's pretty simple to compute this yourself between two discrete distributions defined by vectors with consistent labeling.
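For instance, here is a minimal R sketch, assuming `p` and `q` are numeric probability vectors over the same support, listed in the same order (the names `kl_div`, `sym_kl`, and `js_div` are mine, not from any package):

```r
# Kullback-Leibler divergence D(P || Q) for discrete distributions.
# p, q: probability vectors over the same support, in the same order.
# Terms with p[i] == 0 contribute 0 by the usual 0 * log 0 = 0 convention.
kl_div <- function(p, q) {
  stopifnot(length(p) == length(q))
  nz <- p > 0
  sum(p[nz] * log(p[nz] / q[nz]))
}

# Symmetrised KL: the average of the two directions.
sym_kl <- function(p, q) {
  (kl_div(p, q) + kl_div(q, p)) / 2
}

# Jensen-Shannon divergence: average KL to the equal-weight mixture.
js_div <- function(p, q) {
  m <- (p + q) / 2
  (kl_div(p, m) + kl_div(q, m)) / 2
}

# Example with two small PMFs over the same three outcomes.
p <- c(0.5, 0.3, 0.2)
q <- c(0.4, 0.4, 0.2)
sym_kl(p, q)
js_div(p, q)
```

Note that `kl_div` returns `Inf` whenever $Q$ puts zero mass where $P$ does not, which is the correct mathematical behaviour; the Jensen-Shannon version never has this problem because the mixture is positive wherever either distribution is.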

In terms of estimation: the continuous case is hard, but Wang, Kulkarni, and Verdú (2009) give a pretty good estimator based on nearest-neighbor distances. In the discrete case, I'd think that $D(\hat{P} \| \hat{Q})$, where $\hat{P}$ and $\hat{Q}$ are any estimates of the PMFs (such as the empirical PMFs), would do fine in most simple cases; I don't know of a direct estimator.
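As a rough illustration of that plug-in idea in R (the function name `plug_in_sym_kl`, the pseudo-count `eps`, and the simulated samples are my own assumptions, not part of any package):

```r
# Plug-in estimate of the symmetrised KL divergence from two discrete samples:
# form empirical PMFs over the common support, then evaluate the divergence.
plug_in_sym_kl <- function(x, y, eps = 1e-6) {
  support <- sort(union(unique(x), unique(y)))
  # Counts over the common support, plus a small pseudo-count so no cell is zero.
  p_hat <- as.numeric(table(factor(x, levels = support))) + eps
  q_hat <- as.numeric(table(factor(y, levels = support))) + eps
  p_hat <- p_hat / sum(p_hat)
  q_hat <- q_hat / sum(q_hat)
  kl <- function(a, b) sum(a * log(a / b))  # safe: no zeros after the pseudo-count
  (kl(p_hat, q_hat) + kl(q_hat, p_hat)) / 2
}

# Example with two simulated samples on the support {1, 2, 3, 4}.
set.seed(1)
x <- sample(1:4, 200, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))
y <- sample(1:4, 200, replace = TRUE)
plug_in_sym_kl(x, y)
```

The pseudo-count is just one simple way to keep the estimate finite when a category happens not to appear in one of the samples; how you handle such zeros is a modelling choice, not something the plug-in idea dictates.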