Probability Theory – Total Variation Distance of Probability Measures

measure-theory, probability-theory, signed-measures, total-variation

Let $(E,\mathcal E)$ be a measurable space, $\mu$ and $\nu$ be probability measures on $(E,\mathcal E)$ and $$|\nu-\mu|:=\sup_{B\in\mathcal E}|\nu(B)-\mu(B)|$$ denote the total variation distance of $\mu$ and $\nu$.

How can we show that $$|\nu-\mu|=\frac12\sup_f\left|\int f\:{\rm d}(\nu-\mu)\right|,$$ where the supremum is taken over all bounded $\mathcal E$-measurable $f:E\to\mathbb R$ with $|f|\le1$?

It's clear to me that we've got "$\le$". For the other direction, the trick seems to be to consider the Hahn decomposition $E=E^+\cup E^-$ of $\nu-\mu$, i.e. $E^\pm\in\mathcal E$ and $$\pm(\nu-\mu)(B\cap E^\pm)\ge0\;\;\;\text{for all }B\in\mathcal E.$$ Now let $f$ be as above. We can write $$\int f\:{\rm d}(\nu-\mu)=\int_{E^+}f\:{\rm d}(\nu-\mu)+\int_{E^-}f\:{\rm d}(\nu-\mu),$$ but how do we see that the absolute value of this is at most $(\nu-\mu)(E^+)-(\nu-\mu)(E^-)$?

Best Answer

First, I think that you are missing a factor of $2$ somewhere. The correct result is $$|\nu - \mu| = \frac12 \sup_f \bigg| \int f d(\nu - \mu) \bigg |.$$

The proof I know of this goes through another characterisation of the total variation distance given by Scheffe's lemma.

Scheffe's Lemma: Fix a reference measure $m$ on $(E,\mathcal E)$ such that there are measurable $g,h: E \to [0,\infty)$ with $d\mu = g\, dm$ and $d \nu = h\, dm$. Then $$|\nu - \mu| = \frac12 \int |g-h| dm.$$

Note that requiring the existence of a reference measure is no real imposition. Both $\mu$ and $\nu$ are absolutely continuous with respect to $\frac12 ( \mu + \nu)$, so by the Radon-Nikodym theorem you can always just take $m = \frac12 ( \mu + \nu)$.
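As a sanity check of the lemma, here is a minimal sketch on a finite space, where the supremum over measurable sets can be brute-forced over all subsets. The three-point space, the particular $\mu$ and $\nu$, and the function names `tv_sup_over_sets` and `tv_half_l1` are assumptions made for illustration; the reference measure is taken to be counting measure, so the densities $g,h$ are just the point masses.

```python
from fractions import Fraction
from itertools import combinations

def tv_sup_over_sets(mu, nu):
    """Brute-force sup over all subsets B of |nu(B) - mu(B)| on a finite space."""
    points = list(mu)
    best = Fraction(0)
    for r in range(len(points) + 1):
        for B in combinations(points, r):
            diff = abs(sum(nu[x] - mu[x] for x in B))
            best = max(best, diff)
    return best

def tv_half_l1(mu, nu):
    """Half the L1 distance between the densities w.r.t. counting measure."""
    return sum(abs(nu[x] - mu[x]) for x in mu) / 2

# Illustrative probability measures on the hypothetical space {a, b, c}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

print(tv_sup_over_sets(mu, nu), tv_half_l1(mu, nu))  # prints "1/4 1/4"
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to floating-point tolerance.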

Given this characterisation, the remaining inequality is straightforward. Indeed, since $d(\nu - \mu) = (h-g)\,dm$, \begin{align} \bigg |\int f d(\nu - \mu) \bigg | &= \bigg|\int f (h-g) dm \bigg | \\ &\leq \int |f| |g-h| dm \\ &\leq \int |g-h| dm = 2 |\nu - \mu| \end{align} for any $f$ with $\|f\|_\infty \leq 1$.

Here is a direct proof based on your idea to use the Hahn decomposition that doesn't use Scheffe's lemma. Let $A^+ = \{f \geq 0\}$ and $A^- = \{f < 0\}$. Further, set $\lambda = \nu - \mu$.

We can decompose $\bigg |\int f d\lambda\bigg|$ as \begin{align} \bigg | \int_{E^+ \cap A^+} f d\lambda + \int_{E^+ \cap A^-} f d\lambda + \int_{E^- \cap A^+} f d\lambda + \int_{E^- \cap A^-} f d\lambda \bigg | \end{align}

The advantage of this decomposition is that we know the sign of each term: the integrals over $E^+ \cap A^+$ and $E^- \cap A^-$ are nonnegative, while those over $E^+ \cap A^-$ and $E^- \cap A^+$ are nonpositive. Splitting the terms into groups based on their sign, we get \begin{align} \bigg |\int f d\lambda\bigg| &\leq \bigg | \int_{E^+ \cap A^+} f d\lambda + \int_{E^- \cap A^-} f d\lambda \bigg | \\ &\quad + \bigg |\int_{E^+ \cap A^-} f d\lambda + \int_{E^- \cap A^+} f d\lambda \bigg| \\ &= \int_{E^+ \cap A^+} f d\lambda + \int_{E^- \cap A^-} f d\lambda \\ &\quad - \bigg(\int_{E^+ \cap A^-} f d\lambda + \int_{E^- \cap A^+} f d\lambda \bigg) \end{align}

Now, the worst case for bounding each of these terms occurs when $f$ is $1$ or $-1$ depending on the set we integrate over. For example, $$\int_{E^+ \cap A^+} f d\lambda \leq \int_{E^+ \cap A^+} 1 d\lambda = \lambda(E^+ \cap A^+)$$ Similarly, \begin{align} \int_{E^- \cap A^-} f d\lambda &\leq \int_{E^- \cap A^-} -1 d\lambda = -\lambda(E^- \cap A^-) \\ \int_{E^+ \cap A^-} f d\lambda &\geq \int_{E^+ \cap A^-} -1 d\lambda = -\lambda(E^+ \cap A^-) \\ \int_{E^- \cap A^+} f d\lambda &\geq \int_{E^- \cap A^+} 1 d\lambda = \lambda(E^- \cap A^+) \end{align}

Plugging all of these bounds in and using additivity of the measures to combine terms and get rid of the $A$s, we get that $$\bigg|\int f d\lambda \bigg| \leq \lambda(E^+) -\lambda(E^-) \leq 2 |\nu - \mu|$$ as desired. The last step holds because $\lambda(E^+)$ and $-\lambda(E^-)$ are each at most $\sup_{B\in\mathcal E}|\lambda(B)| = |\nu - \mu|$.
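The bound can also be checked numerically on a finite space. The following is only a sketch under stated assumptions: the three-point space, the measures, and the names `integral`, `f_star` and `random_test_function` are all made up for illustration. On a finite space a Hahn decomposition is obtained by letting $E^+$ be the set of points where $\lambda$ has nonnegative mass, and the function $1_{E^+} - 1_{E^-}$ attains the bound $\lambda(E^+) - \lambda(E^-)$.

```python
from fractions import Fraction
import random

# Illustrative probability measures on the hypothetical space {a, b, c}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
lam = {x: nu[x] - mu[x] for x in mu}  # signed measure lambda = nu - mu

# Hahn decomposition on a finite space: E+ collects the points of nonnegative mass.
E_plus = {x for x in lam if lam[x] >= 0}
E_minus = set(lam) - E_plus

def integral(f, signed):
    """Integral of f against a signed measure on a finite space."""
    return sum(f[x] * signed[x] for x in signed)

# The extremal test function 1_{E+} - 1_{E-}.
f_star = {x: (1 if x in E_plus else -1) for x in lam}

# The claimed upper bound lambda(E+) - lambda(E-).
upper = sum(lam[x] for x in E_plus) - sum(lam[x] for x in E_minus)

rng = random.Random(0)  # fixed seed so the run is reproducible

def random_test_function():
    """A random f with values in [-1, 1] (multiples of 1/100)."""
    return {x: Fraction(rng.randint(-100, 100), 100) for x in lam}

worst = max(abs(integral(random_test_function(), lam)) for _ in range(1000))

print(integral(f_star, lam), upper)  # prints "1/2 1/2"
print(worst <= upper)                # prints "True"
```

Random sampling does not prove the inequality, of course; it only illustrates that no sampled $f$ with $|f|\le1$ beats the extremal choice.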

For completeness, what follows is a proof of the other inequality, with the factor of $\frac{1}{2}$ present, based on the idea of the Hahn decomposition. Note that for an arbitrary measurable set $A$, $|\lambda(A)| = |\lambda(A^c)|$ since $\lambda(E) = \nu(E) - \mu(E) = 0$.

Hence $|\lambda(A)| = \frac12 (|\lambda(A)| + |\lambda(A^c)|)$. Since $|\lambda(B)| \leq \lambda(B \cap E^+) - \lambda(B \cap E^-)$ for every $B \in \mathcal E$, we can then write \begin{align} |\lambda(A)| &\leq \frac12 [ \lambda(A \cap E^+) - \lambda(A \cap E^-) + \lambda(A^c \cap E^+) - \lambda(A^c \cap E^-)] \\ &= \frac12 (\lambda(E^+) - \lambda(E^-)) \\ &= \frac12 \bigg|\int (1_{E^+} - 1_{E^-}) \, d \lambda \bigg| \\ &\leq \frac12 \sup_f \bigg |\int f d\lambda \bigg | \end{align} which proves the other inequality.
