Space of probability measures with total variation norm is not separable

general-topology, measure-theory, metric-spaces, probability-theory, real-analysis

Let $S$ be the space of probability measures on $[0,1]$ and equip it with the distance induced by the total variation norm: for $\mu,\nu\in S$:

$$\|\mu-\nu\|= \sup_{P\in \mathcal P} \sum_{A\in P}|\mu(A)-\nu(A)|$$

where $\mathcal P$ is the set of all finite partitions of $[0,1]$, e.g. $P=\{[0,\frac{1}{3}), [\frac{1}{3}, \frac{5}{8}),[\frac{5}{8}, 1]\}$. We can show that $S$ is not separable: if the probability measures may have atoms, then the set $D$ of all Dirac masses $\delta_a$, $a\in[0,1]$, is uncountable, and any two distinct Dirac masses are at distance $2$ from each other. A countable dense subset would need a point within distance $1$ of each $\delta_a$, and no single point can be that close to two different Dirac masses. Hence, $S$ is not separable.
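To make the Dirac computation concrete, here is a minimal Python sketch. It assumes a partition of $[0,1]$ into intervals described by its cut points, with cells $[t_i,t_{i+1})$ and the last cell closed; the helper names `cell_index` and `partition_sum` are illustrative and not part of the question.

```python
from bisect import bisect_right

def cell_index(cuts, a):
    """Index of the cell containing a.

    Cells are [cuts[i], cuts[i+1]), except the last one, which is closed.
    """
    i = bisect_right(cuts, a) - 1
    return min(i, len(cuts) - 2)  # put the right endpoint 1 in the last cell

def partition_sum(cuts, a, b):
    """Sum over cells A of |delta_a(A) - delta_b(A)| for Dirac masses at a and b."""
    n_cells = len(cuts) - 1
    mu = [0] * n_cells
    nu = [0] * n_cells
    mu[cell_index(cuts, a)] = 1  # delta_a gives mass 1 to the cell containing a
    nu[cell_index(cuts, b)] = 1  # delta_b gives mass 1 to the cell containing b
    return sum(abs(m - n) for m, n in zip(mu, nu))
```

With the example partition, `partition_sum([0, 1/3, 5/8, 1], 0.2, 0.5)` is 2 because the two points fall in different cells, while `partition_sum([0, 1/3, 5/8, 1], 0.2, 0.3)` is 0. Refining the partition with a cut between the two points, as in `partition_sum([0, 0.25, 1], 0.2, 0.3)`, recovers the supremum 2.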

What about the space of all atomless probability measures $S_{\text{AL}}\subset S$? Is it also non-separable?

Intuition: if we "make" the Dirac masses continuous by e.g. putting a very thin Gaussian bell (with a small standard deviation) around each $a\in[0,1]$, these Gaussian bells will no longer be at distance $2$ from each other: bells with nearby centers overlap, so their distance can be arbitrarily small.
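This intuition can be checked numerically. The sketch below assumes untruncated Gaussians on $\mathbb R$ with a common standard deviation `sigma` (so it ignores the restriction to $[0,1]$), and uses the closed form for the $L^1$ distance between two such densities, which agrees with the norm above for measures with densities. The function name `tv_gaussians` is made up for illustration.

```python
import math

def tv_gaussians(a, b, sigma):
    """Distance ||N(a, sigma^2) - N(b, sigma^2)|| in the normalisation with maximum 2.

    Assumes both Gaussians share the same sigma and live on the whole real line.
    The densities cross at the midpoint, which gives the erf closed form.
    """
    return 2.0 * math.erf(abs(a - b) / (2.0 * math.sqrt(2.0) * sigma))
```

For a fixed `sigma`, centers that are far apart relative to `sigma` give a distance numerically equal to 2, while centers much closer than `sigma` give a distance near 0. So a family of bells of fixed width is not uniformly separated, which is why this construction alone does not settle the question.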

Someone else’s suggestion: consider the Z-order curve $\zeta(\cdot)$; apparently this is a continuous measurable map which is one-to-one from a line to the square. As a result, you can map the Dirac masses continuously to the square, inducing a probability distribution that is measurable and continuous. This would give us an uncountable set. I do not understand this line of reasoning, however.

Best Answer

Yes, the atomless measures on $[0,1]$ are still non-separable.

To explain this, we need the concepts of Hausdorff measure and Hausdorff dimension. For any $d\ge 0$, and any subset $E$ of a metric space, we define the $d$-dimensional Hausdorff measure of $E$ by $$ H^d(E)=\lim_{\delta\to 0}\;\;\inf\left\{\sum_{i=1}^\infty( \text{diam }U_i)^d\;:\text{diam }U_i<\delta,\;E\subseteq \bigcup_{i=1}^\infty U_i\right\} $$ The infimum ranges over all countable sequences $(U_1,U_2,\dots)$ of subsets of the metric space which cover $E$ and each of which has diameter less than $\delta$.

For the purposes of this question, $E$ is a subset of $[0,1]$, and each $U_i$ is an interval. So, to find the $d$-dimensional measure of $E$, you cover $E$ with tiny intervals, and add up the lengths of those intervals raised to the $d$th power. You can check that $H^1(E)$ is the $1$-dimensional Lebesgue measure of $E$, while $H^0(E)$ is the cardinality of $E$.

It is possible to have $H^d(E)=+\infty$. This happens when $d$ is smaller than the "correct" dimension for $E$. This is analogous to how the "length" of the interior of a square is infinite, because a solid square is inherently two-dimensional, so a linear measure is too small. Every set $E$ has at most one value of $d$ such that $H^d(E)$ is nonzero and finite; when such a $d$ exists, it is the Hausdorff dimension of $E$. The uniqueness of this $d$ is implied by the following three-part proposition; all parts are easy to prove directly from the definition of $H^d$.

Proposition: For any set $E$ and any real numbers $0\le s<d<b$ ($s$ for small, $b$ for big):

  • $H^{d}(E)=0$ implies $H^{b}(E)=0$.
  • $H^{d}(E)=+\infty$ implies $H^{s}(E)=+\infty$.
  • If $H^d(E)$ is nonzero and finite, then $H^{b}(E)=0$ and $H^{s}(E)=+\infty$.

Now, back to the problem at hand. For any $0<\alpha<1$, let $C_\alpha$ be the middle-$\alpha$ Cantor set. You can show that $C_\alpha$ has Hausdorff dimension $d(\alpha)$, and that $0<H^{d(\alpha)}(C_\alpha)<\infty$, where we define $$d(\alpha):=\log_{2/(1-\alpha)} 2.$$ Then, define a measure $\def\P{\mathbb P}\P_\alpha$ on $[0,1]$ by $$ \P_\alpha(E)=\frac{H^{d(\alpha)}(E\cap C_\alpha)}{H^{d(\alpha)}(C_\alpha)} $$ For example, $\P_{1/3}$ is the Cantor distribution. You can show, for any $\alpha$, that $\P_\alpha$ is indeed a valid probability distribution. The hard part to prove is countable additivity, which requires something like Carathéodory's extension theorem. Each $\P_\alpha$ is atomless, because $d(\alpha)>0$ and $H^{d}(\{x\})=0$ for every point $x$ whenever $d>0$.
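As a sanity check on the formula for $d(\alpha)$, here is a small Python sketch. It relies only on the definition above together with the standard fact that each step of the middle-$\alpha$ construction leaves two copies scaled by $r=(1-\alpha)/2$; the function names are illustrative.

```python
import math

def contraction_ratio(alpha):
    """Scaling factor r = (1 - alpha) / 2 of each of the two pieces kept at every step."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (1.0 - alpha) / 2.0

def hausdorff_dim(alpha):
    """d(alpha) = log base 2/(1-alpha) of 2, written with natural logarithms."""
    return math.log(2.0) / math.log(1.0 / contraction_ratio(alpha))
```

For $\alpha=1/3$ this gives $\log 2/\log 3\approx 0.6309$, the dimension of the classical Cantor set. In general $d(\alpha)$ is the exponent solving $2r^{d}=1$, and it is strictly decreasing in $\alpha$, so different values of $\alpha$ give different dimensions.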

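The measures in the collection $\{\P_\alpha\mid 0<\alpha<1\}$ have pairwise total variation distance two. Indeed, for any $\alpha<\beta$, the partition $\{C_\beta,[0,1]-C_\beta\}$ is a witness for $d_{TV}(\P_\alpha,\P_\beta)$ being two. This follows from the Proposition: since $d(\alpha)$ is larger than $d(\beta)$ and $H^{d(\beta)}(C_\beta)$ is finite, we have $H^{d(\alpha)}(C_\beta)=0$, which implies $\P_\alpha(C_\beta)=0$. On the other hand $\P_\beta(C_\beta)=1$, so the sum over this partition is $|0-1|+|1-0|=2$. An uncountable family of atomless measures that are pairwise at distance two rules out a countable dense subset, so $S_{\text{AL}}$ is not separable.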