[Math] $\frac 1 2$ in the definition of total variation distance between two probability measures

measure-theory, probability-theory

From Wikipedia

In probability theory, the total variation distance between two
probability measures $P$ and $Q$ on a sigma-algebra $F$ is $$
\sup\left\{\,\left|P(A)-Q(A)\right| : A\in F\,\right\}. $$ Informally, this is the largest possible difference between the
probabilities that the two probability distributions can assign to the
same event.

For a finite alphabet we can write $$
\delta(P,Q) = \frac 1 2 \sum_x \left| P(x) - Q(x) \right|\;. $$ Sometimes the statistical distance between two probability
distributions is also defined without the division by two.

I was wondering whether there is some particular reason for the $\frac 1 2$ in the finite case, given that it does not appear in the general case.
My understanding of this total variation distance/metric is that it is induced from the upper variation over the whole set (which is a norm, if I am correct). From there, I can't see the need to divide by 2.

Also in the finite case, why not define similarly in terms of $\sup$ over $A \in F$?

Thanks and regards!

Best Answer

It is not a matter of adding a factor of $\frac{1}{2}$ in the finite case. The second expression is a sum over all elements of the underlying set, while the first is not a sum at all, but a supremum over all events in the space. The reason for the $\frac{1}{2}$ in the second expression is that in the finite case the two quantities can be proved equal: the supremum is attained at the event $A^* = \{x : P(x) \geq Q(x)\}$, and since both measures have total mass $1$, the positive and negative parts of $P - Q$ carry equal mass, so $P(A^*) - Q(A^*) = \frac{1}{2}\sum_x \left|P(x) - Q(x)\right|$. See for example Proposition 4.2 on page 48 of *Markov Chains and Mixing Times* by Levin, Peres, and Wilmer. I do not know the full extent of analogues of the second expression when the underlying set is infinite, but the sum would have to become an integral. See cardinal's comments for more information.
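To see the equality concretely, here is a small sketch (with made-up distributions $P$ and $Q$ on a three-letter alphabet) that computes the supremum definition by brute force over all events and compares it with the half-sum formula:

```python
from itertools import chain, combinations

# Two example probability mass functions on a small finite alphabet
# (arbitrary illustrative values; each sums to 1).
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.2, "b": 0.4, "c": 0.4}

alphabet = list(P)

def events(xs):
    """All subsets of the alphabet, i.e. all events in the (finite) sigma-algebra."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Definition 1: sup over all events A of |P(A) - Q(A)|.
tv_sup = max(
    abs(sum(P[x] for x in A) - sum(Q[x] for x in A))
    for A in events(alphabet)
)

# Definition 2: half the L1 distance between the pmfs.
tv_half_sum = 0.5 * sum(abs(P[x] - Q[x]) for x in alphabet)

print(tv_sup, tv_half_sum)  # the two quantities agree (both 0.3 here)
```

The maximizing event here is $\{a\}$ (equivalently its complement $\{b, c\}$), exactly the set where $P$ exceeds $Q$, matching the $A^* = \{x : P(x) \geq Q(x)\}$ argument.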