Why does the set of possible probability distributions have the cardinality of the continuum?

cardinals, measure-theory, probability-theory, statistical-inference, statistics

Wikipedia's article on parametric statistical models (https://en.wikipedia.org/wiki/Parametric_model) mentions that you could parameterize all probability distributions with a one-dimensional real parameter, since the set of all probability measures and $\mathbb{R}$ share the same cardinality.

This fact is mentioned in the cited text (Bickel et al., Efficient and Adaptive Estimation for Semiparametric Models), but it is not proved or elaborated on there.

This is pretty neat to me. (If I'd been forced to guess, I would have guessed the set of possible probability distributions to be bigger, since pdfs are functions $\mathbb{R}\rightarrow\mathbb{R}$, and we're counting probability distributions that don't have a density, too. It's got to be countable additivity constraining the number of possible distributions, but how?)

Where could I go to find a proof of this, or is it straightforward enough to outline in an answer here? Does its proof depend on AC or the continuum hypothesis? We need some kind of condition on the cardinality of the sample space that neither Wikipedia nor Bickel mentions, right? (If the sample space is too big, then the number of degenerate probability distributions is too big.)

Best Answer

A probability measure on $\mathbb{R}$, be it continuous or not, is determined by its CDF $x \mapsto \mathbb{P}(X \leq x)$. A CDF is right-continuous, and the set of right-continuous functions $\mathbb{R}\rightarrow\mathbb{R}$ has the cardinality of $\mathbb{R}$. To see this, you can for instance argue that such a function is determined by its values at the rational points, so the set has at most the cardinality of a countable product of copies of $\mathbb{R}$, which has the cardinality of $\mathbb{R}$ as well.
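
The counting argument above can be written out as a short chain of inequalities. This is only a sketch: the symbols $\mathcal{F}$ (the set of CDFs on $\mathbb{R}$) and $\delta_x$ (the point mass at $x$) are notation introduced here for illustration, not taken from the question or from Bickel et al. Right-continuity gives $F(x) = \lim_{q \downarrow x,\ q \in \mathbb{Q}} F(q)$, so the map $F \mapsto (F(q))_{q \in \mathbb{Q}}$ from $\mathcal{F}$ into $\mathbb{R}^{\mathbb{Q}}$ is injective. In the other direction, $x \mapsto \delta_x$ sends distinct reals to distinct probability measures. Together:

$$
2^{\aleph_0} = |\mathbb{R}| \;\le\; |\mathcal{F}| \;\le\; |\mathbb{R}^{\mathbb{Q}}| = \left(2^{\aleph_0}\right)^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0}.
$$

The Cantor-Schröder-Bernstein theorem then gives $|\mathcal{F}| = |\mathbb{R}|$. As far as I can tell, every step here uses an explicit injection or bijection, so the argument should not need the axiom of choice or the continuum hypothesis. It does rely on the sample space being $\mathbb{R}$; for a sample space of larger cardinality the point masses alone already exceed the continuum.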
