Probability Theory – Definition of Total Variation Distance: $ V(P,Q) = \frac{1}{2} \int |p-q|d\nu$

measure-theory, probability, probability-theory

Let $P,Q$ be two probability measures on $(\Omega, \mathscr {F})$, and let $\nu$ be a $\sigma$-finite measure on the same measurable space such that $P \ll \nu$ and $Q \ll \nu$. Define $\frac{dP}{d\nu}=p$ and $\frac{dQ}{d\nu}=q$.

The total variation distance between P and Q is then:
$$
V(P,Q) = \sup_{A \in \mathscr{F}}|P(A) - Q(A)|= \sup_{A \in \mathscr{F}} \bigg| \int_A(p-q )\,d\nu \bigg|
$$
I'm confused by the following claim. We may also write:
$$
V(P,Q) = \frac{1}{2} \int |p-q|d\nu
$$
First, how can we bring the absolute value inside the integral and get rid of the supremum? Second, what region are we integrating over now? This statement is made frequently in Tsybakov's book Introduction to Nonparametric Estimation.

Best Answer

Let $B = \{p \ge q\}$. Note that
\begin{align*}
\int_\Omega \def\abs#1{\left|#1\right|}\abs{p-q}\, d\nu &= \int_B (p - q) \, d\nu + \int_{\Omega \setminus B} (q- p)\, d\nu\\
&\le 2 \sup_A \abs{\int_A (p-q) \, d\nu}
\end{align*}
since each of the two integrals on the right of the first line has the form $\abs{\int_A (p-q)\, d\nu}$, with $A = B$ and $A = \Omega \setminus B$ respectively.

On the other side, note first that
$$
\int_\Omega (p-q) \,d\nu = P(\Omega) - Q(\Omega) = 0
$$
and hence
$$
\int_B (p-q) \, d\nu = \int_{\Omega \setminus B} (q-p) \, d\nu
$$
Now for any $A \in \mathscr F$, we have
\begin{align*}
\abs{\int_A (p-q)\, d\nu} &= \max\left\{\int_A (p-q)\, d\nu, \int_A (q-p)\, d\nu\right\}\\
&\le\max\left\{ \int_{A\cap B} (p-q)\, d\nu, \int_{A \cap (\Omega \setminus B)} (q-p)\, d\nu\right\}\\
&\le \max\left\{ \int_{B} (p-q)\, d\nu, \int_{\Omega \setminus B} (q-p)\, d\nu\right\}\\
&= \int_B (p-q)\, d\nu\\
&= \frac 12 \int_\Omega \abs{p-q}\,d\nu
\end{align*}
The first inequality drops the part of $A$ on which the integrand is nonpositive ($p - q < 0$ on $A \setminus B$ and $q - p \le 0$ on $A \cap B$), and the second enlarges the domain of a nonnegative integrand. Taking the supremum over $A \in \mathscr F$ gives
$$
\sup_A \abs{\int_A (p-q)\, d\nu} \le \frac 12 \int_\Omega \abs{p-q}\, d\nu
$$
which is the other needed inequality. In particular, the integral in the second formula is over all of $\Omega$, and the supremum is attained at $A = B$.
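As a sanity check, the identity can be verified numerically on a small finite space. The sketch below is only an illustration under assumptions not in the original: $\Omega = \{0,1,2,3\}$, $\nu$ is counting measure, and the densities `p` and `q` are made-up values. It brute-forces the supremum over all $2^4$ events and compares it with half the $L^1$ distance and with the signed mass of $B = \{p \ge q\}$.

```python
import math
from itertools import chain, combinations

# Hypothetical densities w.r.t. counting measure on Omega = {0, 1, 2, 3}.
# These numbers are illustrative assumptions, not taken from the question.
p = [0.1, 0.4, 0.3, 0.2]
q = [0.25, 0.25, 0.25, 0.25]
omega = list(range(len(p)))


def all_events(points):
    """Yield every subset of `points` (the full sigma-algebra on a finite set)."""
    return chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )


def signed_mass(event):
    """Compute P(A) - Q(A), i.e. the integral of (p - q) over the event A."""
    return sum(p[i] - q[i] for i in event)


# Supremum form: sup over all events A of |P(A) - Q(A)|.
sup_form = max(abs(signed_mass(a)) for a in all_events(omega))

# L1 form: half the integral of |p - q| over all of Omega.
l1_form = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# The maximising event B = {p >= q} used in the proof.
b = [i for i in omega if p[i] >= q[i]]
b_form = signed_mass(b)

print("sup over events :", sup_form)
print("half L1 distance:", l1_form)
print("mass on B       :", b_form, "with B =", b)
print("all agree       :", math.isclose(sup_form, l1_form)
      and math.isclose(b_form, l1_form))
```

Here $p - q = (-0.15, 0.15, 0.05, -0.05)$, so $B = \{1, 2\}$ and all three quantities equal $0.2$ up to floating-point rounding. The brute-force enumeration is exponential in $|\Omega|$ and is meant only to mirror the definition, not to be an efficient way to compute the distance.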
