Total variation distance: a relationship between a Polish space $(\mathcal{X}, d)$ and a measurable space $\left(\mathcal{X},\mathcal{A}\right)$

measure-theory, metric-spaces, polish-spaces, probability-theory, total-variation

Introduction (part 1). In the following excerpts from Villani (2008), Optimal transport, old and new, Villani (i) defines the Wasserstein distance between two probability measures $\mu$ and $\nu$ on a ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$, where $d$ is a metric, and (ii) relates the Wasserstein distance to the total variation distance. So, for both the Wasserstein distance and the total variation distance, the two probability measures $\mu$ and $\nu$ live on a ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$:

Page 10:

All measures considered in the text are Borel measures on ${\color{red}{\textbf{Polish spaces}}}$, which are complete, separable metric spaces, equipped with
their Borel $\sigma$-algebra.

Page 105:

Definition 6.1 (Wasserstein distances).

Let $(\mathcal{X}, d)$ be a ${\color{red}{\textbf{Polish metric space}}}$, and let $p \in [1, \infty)$. For any two probability measures $\mu$, $\nu$ on
$\mathcal{X}$, the Wasserstein distance of order $p$ between $\mu$
and $\nu$ is defined by the formula …

Page 115:

Theorem 6.15 (Wasserstein distance is controlled by weighted total
variation).

Let $\mu$ and $\nu$ be two probability measures on a ${\color{red}{\textbf{Polish space}}}$
$(\mathcal{X}, d)$. Let $p \in [1, \infty)$ and $x_0 \in \mathcal{X}$. Then …

Page 115:

Particular Case 6.16.

In the case $p=1$, if the diameter of $\mathcal{X}$ is
bounded by $D$, this bound implies $W_1(\mu,\nu) \leq D \lVert \mu - \nu \rVert_{TV}$ (Limone's note: I take $\lVert \mu - \nu \rVert_{TV}$ to be the total variation distance)

Introduction (part 2). In the following excerpts from Tsybakov (2009), Introduction to nonparametric estimation, Tsybakov defines the total variation distance between two probability measures $P$ and $Q$ on a ${\color{blue}{\textbf{measurable space}}}$ $\left(\mathcal{X},\mathcal{A}\right)$ (with "The sample space $\mathcal{X}$, the $\sigma$-algebra $\mathcal{A}$", on page 121):

Page 83:

2.4 Distances between probability measures

Let $\left(\mathcal{X},\mathcal{A}\right)$ be a ${\color{blue}{\textbf{measurable space}}}$ and let $P$ and $Q$ be two probability measures on $\left(\mathcal{X},\mathcal{A}\right)$.
Suppose that $\nu$ is a $\sigma$-finite measure on $\left(\mathcal{X},\mathcal{A}\right)$ satisfying $P\ll\nu$ and $Q\ll\nu$.
Define $p = \frac{dP}{d\nu}$, $q = \frac{dQ}{d\nu}$. Observe that such a measure $\nu$ always exists since we can take, for example, $\nu = P + Q$.

Page 83:

Definition 2.4

The total variation distance between $P$ and $Q$ is defined as follows: $V(P,Q)=$ …

Question. Since both Villani and Tsybakov introduce the total variation distance in their books, is the ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$ introduced by Villani the same as, or similar to, the ${\color{blue}{\textbf{measurable space}}}$ $\left(\mathcal{X},\mathcal{A}\right)$ introduced by Tsybakov?

Note. My question does not depend on the different notations used by Villani and Tsybakov for the probability measures ($\mu$ and $\nu$ in Villani correspond, respectively, to $P$ and $Q$ in Tsybakov).

Best Answer

In stating his definitions, Villani always assumes that all measures are defined on the Borel $\sigma$-algebra of a Polish space. The topology and/or distance do not play any role in the definition of the total variation, since the latter only depends on the measurable structure.

Thus, Villani's definition is simply a particular case of other standard definitions in the literature. It is particular only in that he assumes the $\sigma$-algebra $\mathcal A$ to be the Borel $\sigma$-algebra of a given Polish topological space $X$, and the measurable space which Villani implicitly uses is the space $(X,\mathcal A)$.

This does not affect the definition of total variation in any other way.
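
To make the point concrete, here is a sketch of the usual definition, written in Tsybakov's notation with densities $p = \frac{dP}{d\nu}$ and $q = \frac{dQ}{d\nu}$. The normalization is an assumption on my part: some authors instead use the total mass $|\mu - \nu|(\mathcal{X})$, which is twice the quantity below, so constants in inequalities such as Particular Case 6.16 may depend on the convention.

$$V(P,Q) \;=\; \sup_{A \in \mathcal{A}} \lvert P(A) - Q(A) \rvert \;=\; \frac{1}{2} \int_{\mathcal{X}} \lvert p - q \rvert \, d\nu .$$

Only sets $A \in \mathcal{A}$ and integrals with respect to $\nu$ appear here; neither the metric $d$ nor the topology enters. Taking $\mathcal{A}$ to be the Borel $\sigma$-algebra of $(\mathcal{X}, d)$ gives Villani's setting as a special case.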