A data processing inequality for a non-$f$ divergence

information theory, probability theory, statistics

Consider two probability distributions $P$ and $Q$ on some space $\mathcal X$. Given a convex function $f$ with $f(1) = 0$, the $f$-divergence from $P$ to $Q$ is defined by
$$
D_f(P \| Q) = \int f\left(\frac{dP}{dQ}\right) \ dQ.
$$

A well-known property of $f$-divergences is the data processing inequality $D_f(P \| Q) \ge D_f(T^{-1} P \| T^{-1} Q)$, where $T$ is a measurable map from $\mathcal X$ to another space $\mathcal Y$ and $T^{-1} P$ denotes the pushforward distribution $P \circ T^{-1}$ on $\mathcal Y$. The intuition is that you can't make two distributions easier to distinguish by applying an a priori known transformation to both.

Given this background, I am interested in the following divergence
$$
V(P \| Q) = \int \left(\log \frac{dP}{dQ}\right)^2 \ dQ.
$$

This is not an $f$-divergence, and it is possible to show that the data processing inequality fails (I randomly searched over $\mathcal X = \{1,2,3\}$ and $\mathcal Y = \{1,2\}$ and found a counterexample). My question is whether a weaker version of the data processing inequality holds. For example, could we find a $K$ such that $V(T^{-1} P \| T^{-1} Q) \le K \, V(P \| Q)$? Or a function $\phi(x)$ such that
$$
V(T^{-1} P \| T^{-1} Q) \le \phi\{V(P \| Q)\} \, V(P \| Q)
$$

where $\phi(x)$ is bounded near $0$? Or is there an obstruction to this type of result?
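For concreteness, here is a minimal sketch (not from the original post) of the kind of random search mentioned above, over $\mathcal X = \{1,2,3\}$ and $\mathcal Y = \{1,2\}$ with one fixed merging map $T$; the function names, the choice of $T$, and the tolerance are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def V(p, q):
    """V(P || Q) = sum_x q(x) * log(p(x)/q(x))**2 for finite distributions."""
    return float(np.sum(q * np.log(p / q) ** 2))

def push_forward(dist, T):
    """Push a distribution on {0, ..., n-1} through the map T (a list of outputs)."""
    out = np.zeros(max(T) + 1)
    for x, y in enumerate(T):
        out[y] += dist[x]
    return out

# One fixed choice of T: merge outcomes 0 and 1, keep outcome 2 separate.
T = [0, 0, 1]

for _ in range(100_000):
    p = rng.dirichlet(np.ones(3))
    q = rng.dirichlet(np.ones(3))
    # Report a draw where the plain data processing inequality fails for V.
    if V(push_forward(p, T), push_forward(q, T)) > V(p, q) + 1e-12:
        print("DPI violated for V:", p, q)
        break
```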

Best Answer

By evaluating second derivatives, one can verify that $f(x) = (\ln x)^2$ is convex for $x \le e$. Next, upper bound $f(x)$ by the following convex function: $$g(x) =\begin{cases} f(x) & \text{if }x\le e, \\ 2x/e -1 & \text{otherwise.} \end{cases} $$ The affine branch is the tangent line of $f$ at $x = e$, so $g$ is convex, and since $f$ is concave on $[e, \infty)$ it lies below that tangent line there, giving $g \ge f$ everywhere. Then $D_g(P\Vert Q) = \int g\left(\frac{dP}{dQ}\right) dQ$ is an $f$-divergence, and $V(P\Vert Q) \le D_g(P\Vert Q)$ for any $P$ and $Q$.
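The calculations behind these claims are short (spelled out here for completeness; they are not in the original answer):
$$
f'(x) = \frac{2\ln x}{x}, \qquad f''(x) = \frac{2(1 - \ln x)}{x^2} \ \ge\ 0 \iff x \le e, \qquad f(e) + f'(e)(x - e) = 1 + \frac{2}{e}(x - e) = \frac{2x}{e} - 1.
$$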

Define $\alpha = \operatorname{ess\,sup} \frac{dP}{dQ}$ (equivalently, the supremum of $P(A)/Q(A)$ over measurable sets $A$ with $Q(A) > 0$). Note that $\frac{g(x)}{f(x)}$ is non-decreasing in $x$: it equals $1$ for $x \le e$ and increases for $x > e$ (this can be checked by evaluating the derivative, in Mathematica or by hand as below). Since $\frac{dP}{dQ} \le \alpha$ holds $Q$-almost surely, $$D_g(P\Vert Q) = \int f\Big(\frac{dP}{dQ}\Big)\frac{g\Big(\frac{dP}{dQ}\Big)}{f\Big(\frac{dP}{dQ}\Big)}\, dQ \le \frac{g(\alpha)}{f(\alpha)}\, V(P\Vert Q). $$ Combining this with the data processing inequality for $D_g$ gives the chain $$V(T^{-1}P \Vert T^{-1}Q) \le D_g(T^{-1} P\Vert T^{-1} Q) \le D_g(P\Vert Q) \le \frac{g(\alpha)}{f(\alpha)}\, V(P\Vert Q).$$
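For the record, here is a by-hand version of that derivative check (my own computation, not part of the original answer): for $x > e$,
$$
\frac{d}{dx}\,\frac{g(x)}{f(x)} = \frac{d}{dx}\,\frac{2x/e - 1}{(\ln x)^2} = \frac{\frac{2}{e}\ln x - \frac{2}{x}\big(\frac{2x}{e} - 1\big)}{(\ln x)^3},
$$
and the numerator $N(x) = \frac{2}{e}\ln x - \frac{4}{e} + \frac{2}{x}$ satisfies $N(e) = 0$ and $N'(x) = \frac{2}{ex} - \frac{2}{x^2} > 0$ for $x > e$, so the ratio is strictly increasing on $(e, \infty)$.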

For a fixed finite outcome space, $\alpha \to 1$ as $V(P\Vert Q) \to 0$, and hence $\frac{g(\alpha)}{f(\alpha)} \to 1$ (indeed $\frac{g(\alpha)}{f(\alpha)} = 1$ as soon as $\alpha \le e$), so the constant in the bound tends to $1$.
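As a quick numerical sanity check on the final bound (again a sketch of mine, not part of the answer), one can compare both ends of the chain for random $P$, $Q$ on three outcomes and the same merging map as above; the names and tolerance below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def V(p, q):
    return float(np.sum(q * np.log(p / q) ** 2))

def g_over_f(alpha):
    """g(alpha)/f(alpha): 1 for alpha <= e, (2*alpha/e - 1)/(ln alpha)^2 beyond."""
    return 1.0 if alpha <= np.e else (2 * alpha / np.e - 1) / np.log(alpha) ** 2

def T_merge(dist):
    """Merge outcomes 0 and 1, keep outcome 2 separate."""
    return np.array([dist[0] + dist[1], dist[2]])

for _ in range(10_000):
    p, q = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    alpha = float(np.max(p / q))
    lhs = V(T_merge(p), T_merge(q))
    rhs = g_over_f(alpha) * V(p, q)
    assert lhs <= rhs * (1 + 1e-9), (p, q)  # V(TP || TQ) <= g(alpha)/f(alpha) * V(P || Q)
```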