[Math] Kullback-Leibler divergence and mixture distributions

information theory, probability, statistics

Let's say I have three probability densities, $h$, $g$, and $f$, where $f$ is a weighted mixture of $h$ and $g$, i.e.,

$$ f(x) = w\,h(x) + (1-w)\,g(x) $$

For simplicity, let's assume all densities share the same support.

Now, it seems reasonable to expect that if, say, $h$ is the "true" density, then the Kullback-Leibler (KL) divergence between $f$ and $h$ should be smaller than the KL divergence between $g$ and $h$, i.e.,

$$ KL(f,h) \leq KL(g,h) $$

as $|f(x) - h(x)| = (1-w)\,|g(x) - h(x)| \leq |g(x) - h(x)|$ everywhere.
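For a concrete sanity check, here is a small Python sketch; the two Gaussian densities, the mixture weight, the quadrature range, and the convention $KL(p,q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx$ are all just illustrative choices on my part:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Two arbitrary densities; h plays the role of the "true" one, f is the mixture.
w = 0.3
h = norm(0.0, 1.0).pdf
g = norm(2.0, 1.5).pdf

def f(x):
    return w * h(x) + (1 - w) * g(x)

def kl(p, q):
    # KL(p, q) = int p(x) log(p(x)/q(x)) dx, approximated by quadrature
    value, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), -20, 20)
    return value

print(quad(f, -20, 20)[0])       # f integrates to ~1, so it is again a density
print(kl(f, h), "<=", kl(g, h))  # the conjectured inequality for this example
```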

However, I have no idea how to prove this, and I would be very grateful for any suggestions (or for suggestions as to why my intuition might be wrong).

Thanks in advance!

Best Answer

It just came to my mind that this follows immediately from the joint convexity of the KL divergence in the pair of its arguments. It's hard to find info about it on the web, but page 6 of http://camo.ici.ro/journal/vol11/v11d15.pdf describes everything you need in terms of discrete relative entropy $D_\text{KL}$ (the very same concept).

In your case, \begin{align*} \text{KL}(h,f) &= \text{KL}(w \cdot h + (1-w) \cdot h, \ w \cdot h + (1-w) \cdot g) \\ &\leq w \cdot \text{KL}(h,h) + (1-w) \cdot \text{KL}(h,g) = (1-w) \cdot \text{KL}(h,g), \end{align*} so, in summary, $$\text{KL}(h,f) \leq (1-w) \cdot \text{KL}(h,g),$$

which is an even stronger inequality than what you asked for.
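If you want to see the bound in action, here is a small numerical sketch; the Gaussian densities, the weight, and the quadrature range are arbitrary choices of mine, and I use the convention $\text{KL}(p,q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Arbitrary example: h is the "true" density, g some other density.
w = 0.3
h = norm(0.0, 1.0).pdf
g = norm(2.0, 1.5).pdf

def f(x):
    return w * h(x) + (1 - w) * g(x)

def kl(p, q):
    # KL(p, q) = int p(x) log(p(x)/q(x)) dx, approximated by quadrature
    value, _ = quad(lambda x: p(x) * np.log(p(x) / q(x)), -20, 20)
    return value

# the convexity bound: KL(h, f) <= (1 - w) * KL(h, g)
print(kl(h, f), "<=", (1 - w) * kl(h, g))
```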


My previous text (not a full answer, just my thoughts):

So you want to show $$\text{KL}(h,f) \leq \text{KL}(h,g),$$ or rather $$\int_\mathbb{R} h(x) \log \frac{h(x)}{w\, h(x) + (1-w)g(x)}\, dx \leq \int_\mathbb{R} h(x) \log \frac{h(x)}{g(x)}\, dx,$$ which is equivalent to \begin{align*} 0 &\leq \int_\mathbb{R} h(x) \log \frac{w\, h(x) + (1-w)g(x)}{g(x)}\, dx \\ 0 &\leq \int_\mathbb{R} h(x) \log\left(1-w + w\frac{h(x)}{g(x)}\right) dx. \end{align*}

Now, this is probably not easy to show directly, because even the proof of non-negativity of the KL divergence (a.k.a. relative entropy) between two distributions is already quite long and tricky. You could carefully read proofs of that theorem and try to adapt them to the problem at hand.

Alternatively, you could rewrite the above as \begin{align*} 0 &\leq \int_\mathbb{R} h(x) \log \frac{h(x)}{a_w(x)}\, dx = \text{KL}(h,a_w) \end{align*} with $$a_w(x) := \frac{1}{\frac{w}{g(x)} + \frac{1-w}{h(x)}} \ .$$ If you can show that $a_w$ is a probability density for all $w \in [0,1]$ (maybe it isn't, I don't know), then $0 \leq \text{KL}(h,a_w)$ and you're done.
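For what it's worth, here is a small numerical sketch you could use to probe that open point for concrete choices of $h$, $g$, and $w$; the Gaussian densities and the finite quadrature range are arbitrary assumptions of mine:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Arbitrary example densities; change these to probe other cases.
w = 0.3
h = norm(0.0, 1.0).pdf
g = norm(2.0, 1.5).pdf

def a_w(x):
    # pointwise weighted harmonic mean of g and h, as defined above
    return 1.0 / (w / g(x) + (1 - w) / h(x))

# does a_w integrate to 1 for this choice of h, g, and w?
mass, _ = quad(a_w, -20, 20)
print("integral of a_w:", mass)

# sanity check: KL(h, a_w) coincides with the rearranged integral above
kl_h_aw, _ = quad(lambda x: h(x) * np.log(h(x) / a_w(x)), -20, 20)
rearranged, _ = quad(lambda x: h(x) * np.log(1 - w + w * h(x) / g(x)), -20, 20)
print(kl_h_aw, "==", rearranged)
```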