[Math] Lower bounds on Kullback-Leibler divergence

Tags: inequalities, it.information-theory, pr.probability, probability-distributions, st.statistics

This was originally a question on Cross Validated.

Are there any (nontrivial) lower bounds on the Kullback-Leibler divergence $KL(f\Vert g)$ between two measures / densities?

Informally, I am trying to study problems where $f$ is some target density, and I want to show that if $g$ is chosen "poorly", then $KL(f\Vert g)$ must be large. Examples of "poor" behaviour could include different means, moments, etc.

Example: If $f=\sum_ka_kf_k$ and $g=\sum_jb_jg_j$ are two mixture distributions, is there a lower bound on $KL(f\Vert g)$ in terms of the $KL(f_k\Vert g_j)$ (and also the convex weights $a_k,b_j$)? Intuitively, we'd like to say that if $\inf_{k\ne j} KL(f_k\Vert g_j)$ is "big", then $KL(f\Vert g)$ cannot be small.
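For concreteness, here is a small Monte Carlo sketch of this mixture setup. The Gaussian components, weights, and all parameters below are made-up assumptions for illustration only; they are not part of the question.

```python
# Monte Carlo estimate of KL(f || g) for two one-dimensional Gaussian mixtures.
# All component parameters here are illustrative assumptions.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)

# f = sum_k a_k f_k  and  g = sum_j b_j g_j, with Gaussian components
a, f_means, f_sds = np.array([0.3, 0.7]), np.array([-2.0, 1.0]), np.array([1.0, 0.5])
b, g_means, g_sds = np.array([0.5, 0.5]), np.array([0.0, 3.0]), np.array([1.0, 1.0])

def log_mixture_pdf(x, w, means, sds):
    # log of sum_k w_k * N(x; mean_k, sd_k^2), computed stably via logsumexp
    comp = norm.logpdf(x[:, None], loc=means, scale=sds) + np.log(w)
    return logsumexp(comp, axis=1)

# Sample X ~ f by picking a component and then drawing from it
k = rng.choice(len(a), size=100_000, p=a)
x = rng.normal(f_means[k], f_sds[k])

# KL(f || g) = E_f[log f(X) - log g(X)], estimated by the sample mean
kl_est = np.mean(log_mixture_pdf(x, a, f_means, f_sds)
                 - log_mixture_pdf(x, b, g_means, g_sds))
print(f"Monte Carlo estimate of KL(f||g): {kl_est:.3f}")
```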

Anything along these lines (for mixtures or arbitrary measures) would be useful. Obviously, you can make assumptions about the quantities involved. Alternatively, references to any papers that study these kinds of problems (either directly or indirectly) would be helpful!

Best Answer

Pinsker's inequality states that \begin{equation} \operatorname{KL}(f\Vert g)\ge B_P:=\|f-g\|^2/2, \end{equation} where $\|f-g\|:=\int|f-g|$ is the total variation norm of the difference between the distributions with densities $f$ and $g$.
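For concreteness, here is a minimal numerical sanity check of Pinsker's bound on two made-up discrete densities (an illustrative sketch, not part of the original answer):

```python
# Pinsker's inequality checked on two made-up discrete densities (numpy only).
import numpy as np

f = np.array([0.1, 0.4, 0.5])
g = np.array([0.3, 0.3, 0.4])

kl  = np.sum(f * np.log(f / g))        # KL(f || g) ≈ 0.117 (natural log)
B_P = np.sum(np.abs(f - g))**2 / 2     # Pinsker bound (∫|f - g|)^2 / 2 ≈ 0.080

print(kl >= B_P)   # True
```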

Another lower bound on $\operatorname{KL}(f\Vert g)$ can be given in terms of the Hellinger distance $d_H(f,g):=\frac1{\sqrt2}\|\sqrt f-\sqrt g\|_2$: \begin{equation} \operatorname{KL}(f\Vert g)\ge B_H:=2d_H(f,g)^2=\int(\sqrt f-\sqrt g)^2; \end{equation} see e.g. mathSE.
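The corresponding check for the Hellinger-based bound, on the same toy densities as above (again just an illustrative sketch):

```python
# Hellinger-based lower bound B_H = ∫(√f - √g)^2 on the same toy densities.
import numpy as np

f = np.array([0.1, 0.4, 0.5])
g = np.array([0.3, 0.3, 0.4])

kl  = np.sum(f * np.log(f / g))                # KL(f || g) ≈ 0.117
B_H = np.sum((np.sqrt(f) - np.sqrt(g))**2)     # 2 d_H(f,g)^2 ≈ 0.066

print(kl >= B_H)   # True; here B_P ≈ 0.080 > B_H
```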

One may note that either one of these two lower bounds, $B_P$ and $B_H$, may be better (that is, greater) than the other. E.g., if the densities $f$ and $g$ with respect to the counting measure on the set $\{1,2\}$ are given by the vectors $(1/2,1/2)$ and $(1/2-t,1/2+t)$ for $t\in(0,1/2)$, then $B_P>B_H$ for $t\in(0,t_*)$, and $B_P<B_H$ for $t\in(t_*,1/2)$, where $t_*=0.495\dots$ is a certain algebraic number.
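The crossover point $t_*$ can also be located numerically, for instance with a root finder (a sketch, assuming SciPy's brentq):

```python
# Locate t_* where the Pinsker and Hellinger bounds coincide in the two-point
# example f = (1/2, 1/2), g = (1/2 - t, 1/2 + t).
import numpy as np
from scipy.optimize import brentq

def B_P(t):
    # Pinsker bound: (∫|f - g|)^2 / 2 = (2t)^2 / 2 = 2 t^2
    return 2 * t**2

def B_H(t):
    # Hellinger bound: (√(1/2) - √(1/2 - t))^2 + (√(1/2) - √(1/2 + t))^2
    return (np.sqrt(0.5) - np.sqrt(0.5 - t))**2 + (np.sqrt(0.5) - np.sqrt(0.5 + t))**2

t_star = brentq(lambda t: B_P(t) - B_H(t), 0.01, 0.499)
print(t_star)   # ≈ 0.4951; B_P > B_H for t < t_*, B_P < B_H for t > t_*
```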