Probability Theory – Relationship of $L^1$ Norm to Total Variation Distance and Variance Bound

inequality, information theory, probability theory

I am trying to find a bound on the variance of an arbitrary distribution $f_Y$ given a bound on the Kullback–Leibler divergence from a zero-mean Gaussian to $f_Y$, as I've explained in this related question. From page 10 of this article, it seems to me that:

$$\frac{1}{2}\left(\int_{-\infty}^{\infty}|p_Z(x)-p_Y(x)|dx\right)^2 \leq D(p_Z\|p_Y)$$

I have two questions:

1) How does this come about? The LHS is somehow related to the total variation distance, which is $\sup\left\{\left|\int_A f_X(x)\,dx-\int_A f_Y(x)\,dx\right|:A \subset \mathbb{R}\right\}$ according to the Wikipedia article, but I don't see the connection. Can someone elucidate?

2) Section 6 on page 10 of the same article seems to talk about variation bounds, but I can't understand it. Can someone "translate" it into language that someone with a graduate-level course in probability (but, unfortunately, no measure theory) can understand?

Best Answer

1) This is Pinsker's inequality; check out Lemma 11.6.1 in Elements of Information Theory by Cover and Thomas.
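To make the connection asked about in 1) explicit (this is a standard identity, not something spelled out in the lemma itself): for densities $p$ and $q$, the supremum in the definition of total variation is attained at $A^* = \{x : p(x) > q(x)\}$, and because both densities integrate to $1$, the positive and negative parts of $p - q$ carry equal mass. Hence

$$\sup_{A}\left|\int_A p(x)\,dx-\int_A q(x)\,dx\right| = \int_{A^*}\bigl(p(x)-q(x)\bigr)\,dx = \frac{1}{2}\int_{-\infty}^{\infty}|p(x)-q(x)|\,dx,$$

so the LHS of the displayed inequality equals $2\,d_{TV}(p_Z,p_Y)^2$, and the inequality is exactly Pinsker's inequality (stated in nats).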

2) The integral inside the LHS is the $L^1$ distance between the densities, which is exactly twice the total variation distance between the probability measures with densities $p_Z$ and $p_Y$ (see here). I think "variation bounds" quite literally means bounds on the total variation between the probability measures, as given in the Lemma on p. 11.
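As a quick numerical sanity check (my addition, not part of the original answer), here is a short Python sketch that compares the two sides of the inequality for a pair of example Gaussian densities. The $L^1$ distance is approximated by Riemann summation on a grid, and the KL divergence uses the standard closed form for Gaussians (in nats); the means and standard deviations below are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl_gaussians(mu0, s0, mu1, s1):
    """Closed-form D( N(mu0, s0^2) || N(mu1, s1^2) ) in nats."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

# Example densities: p_Z is a zero-mean Gaussian, p_Y a shifted, wider one.
mu_z, s_z = 0.0, 1.0
mu_y, s_y = 0.5, 1.5

x = np.linspace(-15.0, 15.0, 200_001)      # fine grid for numerical integration
dx = x[1] - x[0]
p_z = gaussian_pdf(x, mu_z, s_z)
p_y = gaussian_pdf(x, mu_y, s_y)

l1  = np.sum(np.abs(p_z - p_y)) * dx       # approximates  ∫ |p_Z - p_Y| dx
lhs = 0.5 * l1**2                          # (1/2) * (L1 distance)^2
rhs = kl_gaussians(mu_z, s_z, mu_y, s_y)   # D(p_Z || p_Y) in nats
tv  = 0.5 * l1                             # total variation distance = L1 / 2

print(f"L1 distance       : {l1:.4f}")
print(f"Total variation   : {tv:.4f}")
print(f"LHS, 0.5 * L1^2   : {lhs:.4f}")
print(f"RHS, D(p_Z||p_Y)  : {rhs:.4f}")
assert lhs <= rhs + 1e-9                   # Pinsker's inequality (nats)
```

With a fine enough grid the final assertion should always pass, since Pinsker's inequality holds for any pair of densities; only numerical integration error could spoil it.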
