[Math] How far is a distribution from the uniform distribution?

distribution-theory, divergence-operator, probability-distributions, uniform-distribution

I have two discrete probability distributions $P$ and $Q$, where $P=(p_1,\dots,p_n)$ and $Q=(q_1,\dots,q_n)$; in addition, I have the uniform distribution $U=(\frac{1}{n},\dots,\frac{1}{n})$.

The question is how to measure which of the two distributions, $P$ or $Q$, is closer to the uniform distribution.

I am not sure whether I can use the Kullback–Leibler divergence, because it is not a "proper" distance (it is neither symmetric nor does it satisfy the triangle inequality). I also don't know whether I can use entropy.

Best Answer

  • Total variation distance, also known as statistical distance, is a good (very stringent) metric. Note that up to a factor of $2$ it is the $\ell_1$ distance between the vectors of probabilities: $d_{\mathrm{TV}}(P,U)=\frac{1}{2}\sum_{i=1}^n \left|p_i-\frac{1}{n}\right|$. It also has a nice interpretation in terms of events: it is the largest difference in probability the two distributions can assign to any single event.

  • The $\ell_2$ distance will be much more forgiving of many small differences, and will instead put the emphasis on the largest deviations (outliers).

  • The Hellinger distance, $H(P,U)=\frac{1}{\sqrt{2}}\bigl(\sum_{i=1}^n(\sqrt{p_i}-\sqrt{1/n})^2\bigr)^{1/2}$, also has some nice properties and a natural interpretation (although it is perhaps less commonly used).

  • The Kolmogorov distance (equivalently, the $\ell_\infty$ distance between the CDFs) makes sense if your domain $\{1,\dots,n\}$ has a meaningful order on it.

All of these (and more, e.g. the Wasserstein/earth mover's distance) are valid choices; which one is right ultimately depends on your application. As for entropy: since $D_{\mathrm{KL}}(P\,\|\,U)=\log n - H(P)$, ranking distributions by entropy is exactly equivalent to ranking them by KL divergence from the uniform distribution, with higher entropy meaning closer to uniform. Each of the distances above is computed in the sketch below.
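To make this concrete, here is a minimal sketch in Python (assuming NumPy; the two distributions $P$ and $Q$ are made-up examples on $n=4$ points, not from the question) that computes each of the distances above against the uniform distribution:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance: half the l1 distance between probability vectors."""
    return 0.5 * np.abs(p - q).sum()

def l2_distance(p, q):
    """Euclidean (l2) distance between probability vectors."""
    return np.linalg.norm(p - q)

def hellinger_distance(p, q):
    """Hellinger distance: (1/sqrt(2)) * l2 distance between sqrt-probability vectors."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def kolmogorov_distance(p, q):
    """l_inf distance between the CDFs; assumes the domain {1,...,n} is ordered as given."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).max()

n = 4
u = np.full(n, 1.0 / n)                 # uniform distribution on n points
p = np.array([0.30, 0.30, 0.20, 0.20])  # hypothetical example distribution P
q = np.array([0.40, 0.20, 0.20, 0.20])  # hypothetical example distribution Q

for name, dist in [("TV", tv_distance), ("l2", l2_distance),
                   ("Hellinger", hellinger_distance), ("Kolmogorov", kolmogorov_distance)]:
    print(f"{name:>10}: d(P,U) = {dist(p, u):.4f}, d(Q,U) = {dist(q, u):.4f}")

# Sanity check for the entropy remark: KL(P || U) = log n - H(P) (natural log)
kl_p_u = np.sum(p * np.log(p / u))
entropy_p = -np.sum(p * np.log(p))
assert np.isclose(kl_p_u, np.log(n) - entropy_p)
```

Note that the different metrics need not agree on which of $P$ and $Q$ is closer to uniform, which is precisely why the choice of metric should be driven by the application.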

A good resource: "Distances and affinities between measures", Chapter 3 of Pollard's Asymptopia. "On choosing and bounding probability metrics" by Gibbs and Su is also a recommended read.