Entropy and total variation distance

Tags: entropy, inequalities, it.information-theory, pr.probability, probability-distributions

Let $X$, $Y$ be discrete random variables taking values in the same set of $N$ elements, and let $P$, $Q$ denote their respective distributions. Suppose the total variation distance $\|P-Q\|$ (which is half the $L_1$ distance between $P$ and $Q$) is at most $\epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?

(It is easy to see that such a bound must depend not only on $\epsilon$ but also on $N$.)
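For concreteness, here is a minimal numerical illustration of the two quantities involved (a sketch of mine, not part of the question; entropies are in nats and the helper names are arbitrary):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * np.abs(p - q).sum()

def entropy(p):
    """Shannon entropy in nats, with 0 * log(0) taken to be 0."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Two distributions on N = 4 elements that are close in total variation
P = np.array([0.7, 0.1, 0.1, 0.1])
Q = np.array([0.6, 0.2, 0.1, 0.1])
print(tv_distance(P, Q))                 # 0.1
print(abs(entropy(P) - entropy(Q)))      # the quantity to be bounded
```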

Best Answer

Claim. If $\|P-Q\|\leq\varepsilon\leq\frac{1}{2}$, then $|H(P)-H(Q)| \leq H(\varepsilon) + \varepsilon\log N$, where $H(\varepsilon):=-\varepsilon\log\varepsilon-(1-\varepsilon)\log(1-\varepsilon)$ denotes the binary entropy.

Proof. Let $\varepsilon':=\|P-Q\|$. Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that \begin{align} \mathbb{P}(X\neq Y) = \|P-Q\| \;. \end{align} Using a standard construction (made explicit in the edit below), we can assume that $X$ and $Y$ have the particular form \begin{align} X &:= \begin{cases} Z & \text{if $B=0$,} \\ \tilde{X} & \text{if $B=1$,} \end{cases} & Y &:= \begin{cases} Z & \text{if $B=0$,} \\ \tilde{Y} & \text{if $B=1$,} \end{cases} \end{align} where $B$, $Z$ and $(\tilde{X},\tilde{Y})$ are independent and $B\sim\text{Bern}(\varepsilon')$.

Note that \begin{align} H(X|B) \leq H(X) \leq H(B) + H(X|B) \;. \end{align} For $H(X|B)$ we can write \begin{align} H(X|B) &= \varepsilon' H(X|B=1) + (1-\varepsilon') H(X|B=0) \\ &= \varepsilon' H(\tilde{X}) + (1-\varepsilon') H(Z) \;. \end{align} Thus, \begin{align} \varepsilon' H(\tilde{X}) + (1-\varepsilon') H(Z) &\leq H(X) \leq H(B) + \varepsilon' H(\tilde{X}) + (1-\varepsilon') H(Z) \;, \tag{$\clubsuit$} \end{align} and similarly, \begin{align} \varepsilon' H(\tilde{Y}) + (1-\varepsilon') H(Z) &\leq H(Y) \leq H(B) + \varepsilon' H(\tilde{Y}) + (1-\varepsilon') H(Z) \;. \tag{$\spadesuit$} \end{align}
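The two standard inequalities $H(X|B)\leq H(X)\leq H(B)+H(X|B)$ are easy to sanity-check numerically; here is a small sketch of mine (entropies in nats) that verifies them on a random joint distribution of $(X,B)$:

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Random joint distribution of (X, B): X takes 5 values, B takes 2
J = rng.dirichlet(np.ones(10)).reshape(5, 2)
pX, pB = J.sum(axis=1), J.sum(axis=0)

# H(X|B) = sum_b P(B=b) * H(X | B=b)
H_X_given_B = sum(pB[b] * H(J[:, b] / pB[b]) for b in range(2))

assert H_X_given_B <= H(pX) + 1e-12            # conditioning cannot raise entropy
assert H(pX) <= H(pB) + H_X_given_B + 1e-12    # H(X) <= H(X,B) = H(B) + H(X|B)
print("both inequalities hold")
```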

Combining ($\clubsuit$) and ($\spadesuit$) we get \begin{align} |H(X)-H(Y)| &\leq H(B) + \varepsilon' |H(\tilde{X}) - H(\tilde{Y})| \\ &\leq H(\varepsilon') + \varepsilon' \log N \\ &\leq H(\varepsilon) + \varepsilon \log N \;, \end{align} where the last step uses $\varepsilon'\leq\varepsilon\leq\frac{1}{2}$ and the monotonicity of $t\mapsto H(t)$ on $[0,\frac{1}{2}]$. Since $H(X)=H(P)$ and $H(Y)=H(Q)$, the claim follows. QED
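If it helps, the claimed bound can also be stress-tested numerically. The following sketch (my own, entropies in nats, helper names arbitrary) samples random pairs $(P,Q)$ and checks $|H(P)-H(Q)|\leq H(\varepsilon)+\varepsilon\log N$ with $\varepsilon:=\|P-Q\|$ whenever $\varepsilon\leq\frac{1}{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def H2(e):
    """Binary entropy in nats."""
    return 0.0 if e <= 0.0 or e >= 1.0 else -e * np.log(e) - (1 - e) * np.log(1 - e)

N = 10
for _ in range(10_000):
    P, Q = rng.dirichlet(np.ones(N)), rng.dirichlet(np.ones(N))
    eps = 0.5 * np.abs(P - Q).sum()      # total variation distance
    if eps <= 0.5:                       # the claim assumes eps <= 1/2
        assert abs(H(P) - H(Q)) <= H2(eps) + eps * np.log(N) + 1e-12
print("bound held on all sampled pairs")
```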


Edit [2018-09-17] (following Iosif Pinelis's comment).
Refining the above reasoning a little bit, we can get the better bound \begin{align} |H(P)-H(Q)|\leq H(\varepsilon) + \varepsilon\log(N-1) \;. \end{align}

Indeed, let $\Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $\varepsilon':=\|P-Q\|$, and let us discard the trivial cases $\varepsilon'\in\{0,1\}$, so that $0<\varepsilon'<1$.

Recalling the construction of an optimal coupling, define, for $a\in\Sigma$, \begin{align} R_0(a) &:= P(a)\land Q(a) & P_0(a) &:= P(a)-R_0(a) \\ & & Q_0(a) &:= Q(a)-R_0(a) \;. \end{align} Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and \begin{align} \sum_{a\in\Sigma}R_0(a)=1-\varepsilon' \qquad\text{and}\qquad \sum_{a\in\Sigma}P_0(a)=\sum_{a\in\Sigma}Q_0(a)=\varepsilon' \;. \end{align} Thus, $\tilde{R}:=R_0/(1-\varepsilon')$, $\tilde{P}:=P_0/\varepsilon'$ and $\tilde{Q}:=Q_0/\varepsilon'$ are probability distributions on $\Sigma$ satisfying \begin{align} P(a) &= (1-\varepsilon')\tilde{R}(a) + \varepsilon'\tilde{P}(a) \\ Q(a) &= (1-\varepsilon')\tilde{R}(a) + \varepsilon'\tilde{Q}(a) \;. \end{align} If we now choose $Z\sim\tilde{R}$, $\tilde{X}\sim\tilde{P}$, $\tilde{Y}\sim\tilde{Q}$ and $B\sim\text{Bern}(\varepsilon')$ independently, we have a coupling as promised above.
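This construction is straightforward to verify in code. The sketch below (mine, not part of the original argument) builds $R_0$, $P_0$, $Q_0$ from a random pair $P$, $Q$, checks the mixture identities, and confirms by Monte Carlo that the coupling achieves $\mathbb{P}(X\neq Y)=\|P-Q\|$:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
P, Q = rng.dirichlet(np.ones(N)), rng.dirichlet(np.ones(N))

R0 = np.minimum(P, Q)                    # R_0(a) = P(a) ∧ Q(a)
P0, Q0 = P - R0, Q - R0
eps = P0.sum()                           # equals ||P - Q|| = Q0.sum()
assert np.isclose(eps, 0.5 * np.abs(P - Q).sum())
assert np.isclose(R0.sum(), 1 - eps)

Rt, Pt, Qt = R0 / (1 - eps), P0 / eps, Q0 / eps    # the tilde distributions
assert np.allclose(P, (1 - eps) * Rt + eps * Pt)   # mixture identity for P
assert np.allclose(Q, (1 - eps) * Rt + eps * Qt)   # mixture identity for Q

# Monte Carlo check that the coupling achieves P(X != Y) = ||P - Q||
n = 200_000
B = rng.random(n) < eps                  # B ~ Bern(eps)
Z = rng.choice(N, size=n, p=Rt)
X = np.where(B, rng.choice(N, size=n, p=Pt), Z)
Y = np.where(B, rng.choice(N, size=n, p=Qt), Z)
print(np.mean(X != Y), eps)              # empirical mismatch probability vs eps
```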

Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,b\in\Sigma$ such that $P_0(a)=0$ and $Q_0(b)=0$: if, say, $P_0$ were strictly positive everywhere, we would have $P(a)>Q(a)$ for all $a\in\Sigma$, contradicting $\sum_a P(a)=\sum_a Q(a)=1$. This means that each of $\tilde{P}$ and $\tilde{Q}$ is in fact supported on a strict subset of $\Sigma$, hence on at most $N-1$ elements. Therefore, \begin{align} |H(\tilde{X})-H(\tilde{Y})| &\leq \max\{H(\tilde{P}),H(\tilde{Q})\} \leq \log(N-1) \end{align} and the (updated) claim follows as before.

Note.
As the example H A Helfgott gave in the comments ($N=2$, $X\sim\text{Bern}(\varepsilon)$ and $Y\sim\text{Bern}(0)$) shows, this refined bound is sharp at least when $N=2$.
Iosif Pinelis gave the following example in his comment below, which shows that the refined bound is sharp for every $N$ and $\varepsilon$:

Let $\Sigma:=\{1,2,\ldots,N\}$ and \begin{align} P(a) &:= \begin{cases} 1 & \text{if $a=1$,}\\ 0 & \text{otherwise,} \end{cases} & Q(a) &:= \begin{cases} 1-\varepsilon & \text{if $a=1$,}\\ \varepsilon/(N-1) & \text{otherwise.} \end{cases} \end{align} Then, $\|P-Q\|=\varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(\varepsilon)+\varepsilon\log(N-1)$.
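This example can be checked numerically as well; a short sketch of mine (entropies in nats) confirms both equalities for one choice of $N$ and $\varepsilon$:

```python
import numpy as np

def H(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

N, eps = 8, 0.3
P = np.zeros(N); P[0] = 1.0                        # point mass at a = 1
Q = np.full(N, eps / (N - 1)); Q[0] = 1 - eps      # Pinelis's example

tv = 0.5 * np.abs(P - Q).sum()
H2 = -eps * np.log(eps) - (1 - eps) * np.log(1 - eps)   # binary entropy H(eps)
assert np.isclose(tv, eps)                               # ||P - Q|| = eps
assert np.isclose(abs(H(P) - H(Q)), H2 + eps * np.log(N - 1))
print("equality attained: the bound is sharp for this (N, eps)")
```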
