Motivation for Kolmogorov distance between distributions

distributions, hypothesis-testing, mathematical-statistics, probability

There are many ways to measure how similar two probability distributions are. Among the methods that are popular (in different circles) are the following, written out in symbols after the list:

  1. the Kolmogorov distance: the sup-distance between the distribution functions;

  2. the Kantorovich-Rubinstein distance: the maximum difference between the expectations w.r.t. the two distributions of functions with Lipschitz constant $1$, which also turns out to be the $L^1$ distance between the distribution functions;

  3. the bounded-Lipschitz distance: like the K-R distance but the functions are also required to have absolute value at most $1$.
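
In symbols (my notation), with $F$ and $G$ the distribution functions of the two measures $\mu$ and $\nu$:

$$d_K(\mu,\nu) = \sup_{x \in \mathbb{R}} |F(x) - G(x)|,$$

$$d_{KR}(\mu,\nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \left| \int f \, d\mu - \int f \, d\nu \right| = \int_{\mathbb{R}} |F(x) - G(x)| \, dx,$$

$$d_{BL}(\mu,\nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1,\ \|f\|_\infty \le 1} \left| \int f \, d\mu - \int f \, d\nu \right|.$$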

These have different advantages and disadvantages. Only convergence in the sense of 3. actually corresponds precisely to convergence in distribution; convergence in the sense of 1. or 2. is slightly stronger in general. (In particular, if $X_n=\frac{1}{n}$ with probability $1$, then $X_n$ converges to $0$ in distribution, but not in the Kolmogorov distance: $F_{X_n}(0)=0$ while $F_0(0)=1$, so the sup-distance remains equal to $1$ for every $n$. However, if the limit distribution is continuous then this pathology doesn't occur.)

From the perspective of elementary probability or measure theory, 1. is very natural because it compares the probabilities of being in some set. A more sophisticated probabilistic perspective, on the other hand, tends to focus more on expectations than probabilities. Also, from the perspective of functional analysis, distances like 2. or 3. based on duality with some function space are very appealing, because there is a large set of mathematical tools for working with such things.

However, my impression (correct me if I'm wrong!) is that in statistics, the Kolmogorov distance is usually the preferred way of measuring similarity of distributions. I can guess one reason: if one of the distributions is discrete with finite support (in particular, if it is the distribution of some real-world data), then the Kolmogorov distance to a model distribution is easy to compute. (The K-R distance would be slightly harder to compute, and the B-L distance would probably be impossible in practical terms.)
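
To illustrate that computational point, here is a minimal sketch (my own code; the function name `kolmogorov_distance` and the choice of a standard-normal model are just for illustration) of how the Kolmogorov distance between a sample and a continuous model CDF can be computed, with the K-R distance approximated via `scipy.stats.wasserstein_distance` against a large reference sample from the model:

```python
import numpy as np
from scipy import stats

def kolmogorov_distance(sample, model_cdf):
    """Sup-distance between the empirical CDF of `sample` and a continuous
    model CDF.  The supremum is attained at a data point, either just
    before or just after a jump of the empirical CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f = model_cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)   # sup of F_n - F
    d_minus = np.max(f - np.arange(0, n) / n)      # sup of F - F_n
    return max(d_plus, d_minus)

# Illustrative data: a sample that is close to, but not exactly, standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.1, scale=1.0, size=500)

d_K = kolmogorov_distance(sample, stats.norm.cdf)
d_K_check = stats.kstest(sample, "norm").statistic   # same statistic via scipy
# K-R (Wasserstein-1) distance, approximated against a big sample from the model:
d_KR = stats.wasserstein_distance(sample, rng.normal(size=100_000))

print(f"Kolmogorov: {d_K:.4f} (scipy: {d_K_check:.4f}), K-R ~= {d_KR:.4f}")
```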

So my question (finally) is, are there other reasons, either practical or theoretical, to favor the Kolmogorov distance (or some other distance) for statistical purposes?

Best Answer

Mark,

The main reason I am aware of for the use of the K-S distance is that it arises naturally from Glivenko-Cantelli theorems for univariate empirical processes. The one reference I'd recommend is A. W. van der Vaart, "Asymptotic Statistics", ch. 19. A more advanced monograph is "Weak Convergence and Empirical Processes" by van der Vaart and Wellner.
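
In a nutshell: the Glivenko-Cantelli theorem says that for an i.i.d. sample from a distribution with CDF $F$, the empirical CDF $F_n$ satisfies

$$\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0 \quad \text{almost surely},$$

and the Dvoretzky-Kiefer-Wolfowitz inequality gives the finite-sample bound $P\left(\sup_x |F_n(x) - F(x)| > \varepsilon\right) \le 2 e^{-2n\varepsilon^2}$, so the Kolmogorov distance between the empirical and the hypothesized distribution is exactly the quantity these classical results control.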

I'd add two quick notes:

  1. another measure of distance commonly used for univariate distributions is the Cramér-von Mises distance, which is an $L^2$ distance between the distribution functions (one standard way of writing it is given after this list);
  2. in general vector spaces different distances are employed; the space of interest in many papers is Polish. A very good introduction is Billingsley's "Convergence of Probability Measures".
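
For reference, the one-sample Cramér-von Mises statistic is usually written as

$$\omega^2 = n \int_{-\infty}^{\infty} \left( F_n(x) - F(x) \right)^2 \, dF(x),$$

where $F_n$ is the empirical distribution function and $F$ the hypothesized distribution function; it integrates, rather than maximizes, the discrepancy between the two CDFs.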

I apologize if I can't be more specific. I hope this helps.
