[Math] Empirical estimator for total variation distance between two product distributions

approximation-theorymeasure-theorypr.probabilityprobability distributionsst.statistics

Let $X = (X_1, X_2, \ldots , X_n)$ be an $n$-dimensional random variable, where each $X_i$ is a random variable on finite discrete set $S$. In addition, $X_i$ are independent of each other (but not identically distributed). That is, if we denote $X \sim \mathcal{D}$ and $X_i \sim \mathcal{D}_i$, then $\mathcal{D}$ is just the product of $\{ \mathcal{D}_i \}$.

Similarly, let $Y = (Y_1, Y_2, \ldots , Y_n) \sim \mathcal{D}'$ and $Y_i \sim \mathcal{D}'_i$ independently on $S$ (the same $S$ as $X_i$'s). Then $\mathcal{D}'$ is the product of $\{ \mathcal{D}'_i \}$. $X$ and $Y$ are independent.

Now I don't know $\mathcal{D}$ and $\mathcal{D}'$ precisely but I can sample from them (so that I can sample from all the marginal distributions $\mathcal{D}_i$ and $\mathcal{D}'_i$ as well). I would like to estimate the total variation between $\mathcal{D}$ and $\mathcal{D}'$.

Let $d = \|\mathcal{D} – \mathcal{D}'\|_{TV}$ and $\hat{d}$ be our estimator for $d$. Let $N$ be the number of samples we need. I hope to get an error-confidence bound of the form
$$\Pr[ |\hat{d} – d| \ge \epsilon] \le \delta$$
where $\delta$ is at least polynomially small w.r.t. $n, |S|, N$ and $\epsilon$.


I just found a similar question. However, the sample space of $X$ and $Y$ here is $S^n$, which is exponentially large, making the bound in that post not applicable to this question. Besides, here we have $\mathcal{D}$ and $\mathcal{D}'$ being product distributions, which could probably make things easier.

Thank you.

Best Answer

I have something that may or may not be useful...

Diaconis notes an interpretation of variation distance of Paul Switzer. Consider $\mu$, $\nu\in M_p(S)$. Given a single observation of $S$, sampled from $\mu$ or $\nu$ with probability $1/2$, guess whether the observation, $o$, was sampled from $\mu$ or $\nu$. The classical strategy presented here gives the probability of being correct as $1/2(1+\|\mu-\nu\|)$:

  1. Evaluate $\mu(o)$ and $\nu(o)$.
  2. If $\mu(o)\geq\nu(o)$, choose $\mu$.
  3. If $\nu(o)>\mu(o)$, choose $\nu$.

To see this is true, let $\{\mu>\nu\}$ be the set $\{t\in S:\mu(t)>\nu(t)\}$. Suppose $o$ is sampled from $\mu$. Then the strategy is correct if $o\in\{\mu=\nu\}$ or $o\in\{\mu>\nu\}$: $$\mathbb{P}[\text{guessing correctly}\,|\,\mu]=\mathbb{P}[o\in\{\mu=\nu\}\,|\,\mu]+\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]$$ with a similar expression for $\mathbb{P}[\text{guessing correctly}\,|\,\nu]$.

Note that $\mathbb{P}[o\in\{\mu=\nu\}]=\mu(\{\mu=\nu\})=\nu(\{\mu=\nu\})$ and also $\mathbb{P}[o\in\{\mu>\nu\}\,|\,\mu]=\mu(\{\mu>\nu\})$ (and similar for $o\in\{\mu<\nu\}$). Thus \begin{align*} \mathbb{P}[\text{guessing correctly}] &=\frac12\mathbb{P}[\text{guessing correctly}\,|\,\mu]+\frac12\mathbb{P}[\text{guessing correctly}\,|\,\nu] \\&=\frac12\left(\nu(\{\mu=\nu\})+\mu(\{\mu>\nu\})\right)+\frac12\left(\nu(\{\mu<\nu\})\right) \end{align*} It is easily shown that $$\|\mu-\nu\|=\mu\left(\{\mu>\nu\}\right)-\nu\left(\{\mu>\nu\}\right).$$ Hence $$ \mathbb{P}[\text{guessing correctly}]=\frac12\left(\underbrace{\nu(\{\mu=\nu\})+\nu(\{\mu>\nu\})+\nu(\{\mu<\nu\})}_{=1}+\|\mu-\nu\|)\right).$$

Related Question