Empirical Estimator for Total Variation Distance on Finite Spaces

measure-theorypr.probabilityst.statisticsstochastic-processes

I have two probability measures $p$ and $p'$ on a finite set $X$ which I do not know precisely, but which I can sample from. I would like to estimate their total variation (omitting multiplier $2$):
$$
\gamma := \|p – p'\| = \sum_{x\in X}|p(x) – p'(x)|.
$$
Similarly to this paper I though it's naturally to draw independently $\xi_k$ from $p$ and $\xi'_k$ from $p'$ to define:
$$
p_n(\cdot):= \frac1n \sum_{k=1}^n1\{x_k\in \cdot\},\qquad p'_n(\cdot):= \frac1n \sum_{k=1}^n1\{x'_k\in \cdot\}
$$
and declare that $\gamma_n:=\|p_n – p'_n\|$ is an estimator of $\gamma$.

Since $S$ is a finite set, the total variation distance coincides with the Wasserstein 1-distance for the discrete metric, and hence with the corresponding Kantorovich distance. Thus, if I'm not mistaken, from Proposition 3.2 here it follows that $\gamma_n\stackrel{a.s.}{\longrightarrow}\gamma.$ I wonder, however, whether it is possible to come up with bounds on the rate of convergence of the form
$$
\mathbb P(|\gamma_n-\gamma|\geq\delta)\leq\varepsilon \tag{1}.
$$
If $\gamma_n$ would be an unbiased estimator of $\gamma$, that is $\mathbb E\gamma_n = \gamma$, it would be possible to apply Hoeffding's inequality to obtain $(1)$, however $\gamma_n$ does not seem to be an unbiased estimator. I hope to show that
$$
\lim_n\mathbb E\gamma_n = \gamma
$$
which would allow to find $n$ big enough so that $|\mathbb E\gamma_n – \gamma|\leq\frac12\delta$ and then apply Hoeffding's inequality to $\gamma_n$.
I would be happy to hear other ideas. Perhaps, this topic has been already explored.

Best Answer

Since your state space is finite, you will have that $\|p_n-p\|\to 0$ and $\|p_n'-p'\|\to 0$ at exponential rate of decay of probability (simply from finite alphabet large deviations - for example, use section 2.1 in Dembo-Zeitouni's large deviations book). That is, $P(\|p_n-p\|>\delta)\leq n^{|S|} e^{-n I(\delta)}$ where $I$ also depends on the size of the sample space. This immediately gives you (1) with $\epsilon$ decaying exponentially in $n$ (though not uniformly in $|S|$).

Related Question