Introduction (part 1). In the following excerpts from Villani (2008), Optimal transport, old and new, Villani (i) defines the Wasserstein distance between two probability measures $\mu$ and $\nu$ on a ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$, where $d$ is a metric, and (ii) shows a relationship between the Wasserstein distance and the total variation distance. Therefore, for both the Wasserstein distance and the total variation distance, the two probability measures $\mu$ and $\nu$ live on a ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$:
Page 10:
All measures considered in the text are Borel measures on ${\color{red}{\textbf{Polish spaces}}}$, which are complete, separable metric spaces, equipped with
their Borel $\sigma$-algebra.
Page 105:
Definition 6.1 (Wasserstein distances).
Let $(\mathcal{X}, d)$ be a ${\color{red}{\textbf{Polish metric space}}}$, and let $p \in [1, \infty)$. For any two probability measures $\mu$, $\nu$ on
$\mathcal{X}$, the Wasserstein distance of order $p$ between $\mu$
and $\nu$ is defined by the formula …
Page 115:
Theorem 6.15 (Wasserstein distance is controlled by weighted total
variation). Let $\mu$ and $\nu$ be two probability measures on a ${\color{red}{\textbf{Polish space}}}$
$(\mathcal{X}, d)$. Let $p \in [1, \infty)$ and $x_0 \in \mathcal{X}$. Then …
Page 115:
Particular Case 6.16.
In the case $p=1$, if the diameter of $\mathcal{X}$ is
bounded by $D$, this bound implies $W_1(\mu,\nu) \leq D \lVert \mu - \nu \rVert_{TV}$ (Limone's note: $\lVert \mu - \nu \rVert_{TV}$ is/should be the total variation distance)
Introduction (part 2). In the following excerpts from Tsybakov (2009), Introduction to nonparametric estimation, Tsybakov defines the total variation distance between two probability measures $P$ and $Q$ on a ${\color{blue}{\textbf{measurable space}}}$ $\left(\mathcal{X},\mathcal{A}\right)$ (with "The sample space $\mathcal{X}$, the $\sigma$-algebra $\mathcal{A}$", on page 121):
Page 83:
2.4 Distances between probability measures
Let $\left(\mathcal{X},\mathcal{A}\right)$ be a ${\color{blue}{\textbf{measurable space}}}$ and let $P$ and $Q$ be two probability measures on $\left(\mathcal{X},\mathcal{A}\right)$.
Suppose that $\nu$ is a $\sigma$-finite measure on $\left(\mathcal{X},\mathcal{A}\right)$ satisfying $P\ll\nu$ and $Q\ll\nu$.
Define $p = \frac{dP}{d\nu}$, $q = \frac{dQ}{d\nu}$. Observe that such a measure $\nu$ always exists since we can take, for example, $\nu = P + Q$.
Page 83:
Definition 2.4
The total variation distance between $P$ and $Q$ is defined as follows: $V(P,Q)=$…
Question. Since both Villani and Tsybakov introduce the total variation distance in their books, is the ${\color{red}{\textbf{Polish space}}}$ $(\mathcal{X}, d)$ introduced by Villani the same as, or similar to, the ${\color{blue}{\textbf{measurable space}}}$ $\left(\mathcal{X},\mathcal{A}\right)$ introduced by Tsybakov?
Note. My question does not depend on the different notations used by Villani and Tsybakov for the probability measures ($\mu$ and $\nu$ in Villani correspond, respectively, to $P$ and $Q$ in Tsybakov)!
Best Answer
In stating his definitions, Villani always assumes that all measures are defined on the Borel $\sigma$-algebra of a Polish space. The topology and/or distance do not play any role in the definition of the total variation, since the latter only depends on the measurable structure.
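To make this concrete, here is one standard way of writing the total variation distance. It is stated as a well-known fact, not as a quotation from either book, and the two books may normalise it differently (conventions differ by a factor of $2$). With $p = \frac{dP}{d\nu}$ and $q = \frac{dQ}{d\nu}$ as in Tsybakov's setup:

$$\sup_{A \in \mathcal{A}} \lvert P(A) - Q(A) \rvert \;=\; \frac{1}{2} \int_{\mathcal{X}} \lvert p - q \rvert \, d\nu .$$

Both sides involve only the sets in $\mathcal{A}$ and integrals of $\mathcal{A}$-measurable functions. No metric $d$ and no topology appear anywhere in the formula.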
Thus, Villani's definition is simply a particular case of other standard definitions in the literature. It is particular only in that he assumes the $\sigma$-algebra $\mathcal A$ to be the Borel $\sigma$-algebra of a given Polish topological space $X$, and the measurable space which Villani implicitly uses is the space $(X,\mathcal A)$.
This does not affect the definition of total variation in any other way.
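As a small numerical illustration (a sketch under stated assumptions, not something taken from either book), consider a three-point space. Any metric on a finite set induces the discrete topology, so the Borel $\sigma$-algebra is the power set whatever the metric is. The total variation distance is then the same for every metric, while $W_1$ changes. The function names, the mass vectors and the two embeddings of the points in the real line below are all hypothetical choices made for this example; the total variation is taken in the $\sup_A \lvert P(A) - Q(A) \rvert$ normalisation.

```python
def total_variation(p, q):
    """sup over all subsets A of |P(A) - Q(A)| on a finite space.

    On a finite space this equals half the L1 distance between the
    mass vectors. No metric on the underlying points is needed.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def wasserstein_1_on_line(points, p, q):
    """W_1 between two measures on finitely many real points, d(x, y) = |x - y|.

    Uses the one-dimensional identity W_1 = integral of |F_P - F_Q|,
    where F_P and F_Q are the cumulative distribution functions.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    cdf_gap = 0.0
    total = 0.0
    for a, b in zip(order, order[1:]):
        cdf_gap += p[a] - q[a]  # F_P - F_Q on the interval [points[a], points[b])
        total += abs(cdf_gap) * (points[b] - points[a])
    return total


# Hypothetical example data: the same two mass vectors, two different metrics.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
points_a = [0.0, 1.0, 2.0]   # diameter 2
points_b = [0.0, 1.0, 10.0]  # diameter 10

tv = total_variation(p, q)  # does not look at the points at all
w1_a = wasserstein_1_on_line(points_a, p, q)
w1_b = wasserstein_1_on_line(points_b, p, q)
diam_a = max(points_a) - min(points_a)
diam_b = max(points_b) - min(points_b)

print("TV (same for both metrics):", tv)
print("W1 with metric a:", w1_a, "bound D * TV:", diam_a * tv)
print("W1 with metric b:", w1_b, "bound D * TV:", diam_b * tv)
```

Moving the third point changes $W_1$ and the diameter $D$, but `total_variation` never receives the points, which is the sense in which the metric plays no role in the definition. In both cases $W_1 \leq D \cdot \mathrm{TV}$ holds, consistent with the bound quoted from Particular Case 6.16 (which holds a fortiori if $\lVert \mu - \nu \rVert_{TV}$ is normalised to be twice as large).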