Solved – Kullback-Leibler divergence and probability distribution function in MATLAB

kullback-leiblerMATLAB

I want to compute the Kullback-Leibler divergence (KL) of two Gaussians, the first with mean of 1 and the second -1, where both have the same variance say, 1.

In MATLAB, the distributions are:

y1 = normpdf([-10:0.1:10], -1, 1)
y2 = normpdf([-10:0.1:10],  1, 1)

The code I used to compute the KL is:

KL = sum(Apdf .* (log2(Apdf)-log2(Bpdf)))

Are these the inputs I should use for the KL? The result I got is 28, shouldn't it be 2?

Best Answer

There are two reasons why you did not get the answer 2.

1) The KL divergence being 2 is based on use of the natural log, which in MATLAB is log.

2) If you used log instead of log2 in your code, you would get the result 20. The reason is that in performing the integration, you neglected to multiply by the discretization increment between points, which in your calculation was 0.1.

Here is a correct solution:

>> Apdf = normpdf([-10:0.1:10], -1, 1);
>> Bpdf = normpdf([-10:0.1:10],  1, 1);
>> KL = 0.1 * sum(Apdf .* (log(Apdf)-log(Bpdf)))
KL =
   2.000000000000000

Related Solutions

Solved – Kullback-Leibler divergence of two normal distributions

The KL divergence is an asymmetric measure. It can be symmetrized for two distributions $F$ and $G$ by averaging or summing it, as in

$$KLD(F,G) = KL(F,G) + KL(G,F).$$

Because the formula quoted in the question clearly is symmetric, we might hypothesize that it is such a symmetrized version. Let's check:

$$ \left(\log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}\right) + \left(\log \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2^2 + (\mu_2 - \mu_1)^2}{2 \sigma_1^2} - \frac{1}{2}\right)$$

$$=\left(\log(\sigma_2/\sigma_1) + \log(\sigma_1/\sigma_2)\right) + \frac{\sigma_1^2}{2\sigma_2^2} + \frac{\sigma_2^2}{2\sigma_1^2} + \left(\mu_1-\mu_2\right)^2\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right) - 1$$

The logarithms obviously cancel, which is encouraging. The appearance of the factor $\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right)$ multiplying the term with the means motivates us to introduce a similar sum of fractions in the first part of the expression, too, so we do so perforce, compensating for the two new terms (which are both equal to $1/2$) by subtracting them again and then collecting all multiples of this factor:

$$= 0 + \frac{\sigma_1^2}{2\sigma_2^2}+\left(\frac{\sigma_1^2}{2\sigma_1^2}-\frac{1}{2}\right) + \frac{\sigma_2^2}{2\sigma_1^2} + \left(\frac{\sigma_2^2}{2\sigma_2^2} - \frac{1}{2} \right)+ \cdots$$

$$= \left(\sigma_1^2 + \sigma_2^2\right)\left(\frac{1}{2\sigma_1^2} + \frac{1}{2\sigma_2^2}\right) - 1 + \left(\mu_1-\mu_2\right)^2\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right) - 1$$

$$ = \frac{1}{2}\left(\left(\mu_1-\mu_2\right)^2 + \left(\sigma_1^2 + \sigma_2^2\right)\right)\left(\frac{1}{\sigma_2^2}+\frac{1}{\sigma_1^2}\right) - 2.$$

That's precisely the value found in the reference: it is the sum of the two KL divergences, also known as the symmetrized divergence.

Kullback-Leibler Divergence – Comprehensive Analysis

The Kullback-Leibler Divergence is not a metric proper, since it is not symmetric and also, it does not satisfy the triangle inequality. So the "roles" played by the two distributions are different, and it is important to distribute these roles according to the real-world phenomenon under study.

When we write (the OP has calculated the expression using base-2 logarithms)

$$\mathbb K\left(P||Q\right) = \sum_{i}\log_2 (p_i/q_i)p_i $$

we consider the $P$ distribution to be the "target distribution" (usually considered to be the true distribution), which we approximate by using the $Q$ distribution.

Now,

$$\sum_{i}\log_2 (p_i/q_i)p_i = \sum_{i}\log_2 (p_i)p_i-\sum_{i}\log_2 (q_i)p_i = -H(P) - E_P(\ln(Q))$$

where $H(P)$ is the Shannon entropy of distribution $P$ and $-E_P(\ln(Q))$ is called the "cross-entropy of $P$ and $Q$" -also non-symmetric.

Writing

$$\mathbb K\left(P||Q\right) = H(P,Q) - H(P)$$

(here too, the order in which we write the distributions in the expression of the cross-entropy matters, since it too is not symmetric), permits us to see that KL-Divergence reflects an increase in entropy over the unavoidable entropy of distribution $P$.

So, no, KL-divergence is better not to be interpreted as a "distance measure" between distributions, but rather as a measure of entropy increase due to the use of an approximation to the true distribution rather than the true distribution itself.

So we are in Information Theory land. To hear it from the masters (Cover & Thomas) "

...if we knew the true distribution $P$ of the random variable, we could construct a code with average description length $H(P)$. If, instead, we used the code for a distribution $Q$, we would need $H(P) + \mathbb K (P||Q)$ bits on the average to describe the random variable.

The same wise people say

...it is not a true distance between distributions since it is not symmetric and does not satisfy the triangle inequality. Nonetheless, it is often useful to think of relative entropy as a “distance” between distributions.

But this latter approach is useful mainly when one attempts to minimize KL-divergence in order to optimize some estimation procedure. For the interpretation of its numerical value per se, it is not useful, and one should prefer the "entropy increase" approach.

For the specific distributions of the question (always using base-2 logarithms)

$$ \mathbb K\left(P||Q\right) = 0.49282,\;\;\;\; H(P) = 1.9486$$

In other words, you need 25% more bits to describe the situation if you are going to use $Q$ while the true distribution is $P$. This means longer code lines, more time to write them, more memory, more time to read them, higher probability of mistakes etc... it is no accident that Cover & Thomas say that KL-Divergence (or "relative entropy") "measures the inefficiency caused by the approximation."

Best Answer

Related Solutions

Solved – Kullback-Leibler divergence of two normal distributions

Kullback-Leibler Divergence – Comprehensive Analysis

Related Question