Solved – Kullback-Leibler divergence of two normal distributions

distance-functions, kullback-leibler, mathematical-statistics, normal-distribution, similarities

I was recently trying to find a way to compute the KL divergence between two normally distributed populations, using the mean and variance of each population. But I found several different formulas and don't know which one to use.

The first equation below is from a research paper: Endo A, Nagatani F, et al. (2006). Minimization method for balancing continuous prognostic variables between treatment and control groups using Kullback-Leibler divergence. Contemp Clin Trials 27(5): 420-31.

$$\frac{1}{2}\left(\left(\mu_1-\mu_2\right)^2 + \sigma_1^2 + \sigma_2^2\right)\left(\frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2}\right) - 2$$

The second equation is from another question:

$$\log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
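In code, the second formula is easy to evaluate directly. Here is a minimal Python sketch (the function name `kl_normal` and the example values are mine, for illustration only):

```python
import math

def kl_normal(mu1, sigma1, mu2, sigma2):
    # Directed divergence KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)),
    # using the closed-form expression quoted above.
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

# The measure is asymmetric: swapping the arguments changes the value.
print(kl_normal(0.0, 1.0, 1.0, 2.0))  # KL(F, G)
print(kl_normal(1.0, 2.0, 0.0, 1.0))  # KL(G, F), generally different
```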

Could someone help me clarify which equation is the correct one to use?

Best Answer

The KL divergence is an asymmetric measure. It can be symmetrized for two distributions $F$ and $G$ by averaging or summing the two directed divergences, as in

$$KLD(F,G) = KL(F,G) + KL(G,F).$$

Because the formula quoted in the question is clearly symmetric, we might hypothesize that it is such a symmetrized version. Let's check by adding $KL(F,G)$ and $KL(G,F)$ for $F = N(\mu_1, \sigma_1^2)$ and $G = N(\mu_2, \sigma_2^2)$:

$$ \left(\log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}\right) + \left(\log \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2^2 + (\mu_2 - \mu_1)^2}{2 \sigma_1^2} - \frac{1}{2}\right)$$

$$=\left(\log(\sigma_2/\sigma_1) + \log(\sigma_1/\sigma_2)\right) + \frac{\sigma_1^2}{2\sigma_2^2} + \frac{\sigma_2^2}{2\sigma_1^2} + \left(\mu_1-\mu_2\right)^2\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right) - 1$$

The logarithms obviously cancel, which is encouraging. The factor $\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right)$ multiplying the term with the means motivates us to introduce the same sum of fractions in the variance terms as well: we add the two terms needed to complete it (each equal to $1/2$), subtract them again to compensate, and then collect all multiples of this common factor:

$$= 0 + \frac{\sigma_1^2}{2\sigma_2^2}+\left(\frac{\sigma_1^2}{2\sigma_1^2}-\frac{1}{2}\right) + \frac{\sigma_2^2}{2\sigma_1^2} + \left(\frac{\sigma_2^2}{2\sigma_2^2} - \frac{1}{2} \right)+ \cdots$$

$$= \left(\sigma_1^2 + \sigma_2^2\right)\left(\frac{1}{2\sigma_1^2} + \frac{1}{2\sigma_2^2}\right) - 1 + \left(\mu_1-\mu_2\right)^2\left(\frac{1}{2\sigma_2^2}+\frac{1}{2\sigma_1^2}\right) - 1$$

$$ = \frac{1}{2}\left(\left(\mu_1-\mu_2\right)^2 + \left(\sigma_1^2 + \sigma_2^2\right)\right)\left(\frac{1}{\sigma_2^2}+\frac{1}{\sigma_1^2}\right) - 2.$$

That's precisely the value found in the reference: it is the sum of the two KL divergences, also known as the symmetrized divergence.
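The algebra is easy to double-check symbolically. Here is a quick sketch with sympy (my own code, with variable names chosen for illustration) confirming that the sum of the two directed divergences equals the closed form above:

```python
from sympy import symbols, log, Rational, simplify

mu1, mu2 = symbols("mu1 mu2", real=True)
s1, s2 = symbols("sigma1 sigma2", positive=True)

def kl(m1, sd1, m2, sd2):
    # Directed divergence KL(N(m1, sd1^2) || N(m2, sd2^2)).
    return log(sd2 / sd1) + (sd1**2 + (m1 - m2)**2) / (2 * sd2**2) - Rational(1, 2)

# Sum of the two directed divergences.
symmetrized = kl(mu1, s1, mu2, s2) + kl(mu2, s2, mu1, s1)

# Closed form derived above.
closed = Rational(1, 2) * ((mu1 - mu2)**2 + s1**2 + s2**2) * (1 / s1**2 + 1 / s2**2) - 2

# Prints 0, confirming the identity.
print(simplify(symmetrized - closed))
```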