[Math] Convergence of an empirical distribution w.r.t. the Hellinger distance

Tags: it.information-theory, pr.probability, st.statistics

Let $P$ be a probability distribution on a finite set $\mathcal{X}$ and let $X_1, X_2, \ldots, X_n$ be drawn i.i.d. according to $P$. Define the empirical distribution:

$\hat{P_n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{X_i = x}$

Let $d_H(P,Q)$ be the Hellinger distance:

$d_H(P,Q) = \left( \frac{1}{2} \sum_{x \in \mathcal{X}} \left( \sqrt{P(x)} - \sqrt{Q(x)} \right)^2 \right)^{1/2}$
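
For concreteness, here is a minimal numerical sketch of the two definitions above (assuming NumPy; the particular $P$, $n$, and seed are arbitrary illustrative choices):

```python
# Sketch: draw n i.i.d. samples from a distribution P on a finite alphabet,
# form the empirical distribution, and evaluate the Hellinger distance.
import numpy as np

def hellinger(p, q):
    # d_H(p, q) = sqrt( (1/2) * sum_x (sqrt(p(x)) - sqrt(q(x)))^2 )
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])   # illustrative distribution on |X| = 3 points
n = 1000                        # illustrative sample size
samples = rng.choice(len(P), size=n, p=P)
P_hat = np.bincount(samples, minlength=len(P)) / n   # empirical distribution
print(hellinger(P, P_hat))
```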

Is there a nice expression for the expected distance between $\hat{P_n}$ and $P$? That is, is there some formula like

$\mathbb{E}[ d_H(P,\hat{P_n}) ] = \frac{C}{n} - O\!\left(\frac{1}{n^2}\right)$

where $C$ can be written out explicitly? Or if the rate of convergence is slower than $1/n$, can we get the exact rate of convergence?

For context, if we consider the KL-divergence or $L_1$ distance then we can get explicit expressions for the first term in the rate of convergence of $\hat{P_n}$ to $P$. Can we do the same for the Hellinger distance?

It would be interesting to know this for densities as well, but maybe the discrete problem is easier.

Best Answer

It is possible to show that $\mathrm{E}\,d_H(P,\hat{P_n})\sim \frac{C}{\sqrt{n}}$ and to specify the value of $C$.

Let

$$D_n^2 =\sum_{x \in \mathcal{X}} \left( \sqrt{P(x)} - \sqrt{\hat{P_n}(x)} \right)^2 = 2d_H^2(P,\hat{P_n}). $$

$4nD_n^2$ is known in statistics [for reasons unclear to me] as the Freeman–Tukey goodness-of-fit (GOF) statistic for testing the null hypothesis that $X\sim P$. Like the better-known Pearson chi-squared GOF statistic, it also has [under the null hypothesis] an asymptotic chi-squared distribution with $k-1$ degrees of freedom, where $k=|\mathcal{X}|$.

The statistic $D_n^2$ seems to have been first considered by Matusita in "On the Estimation by the Minimum Distance Method." In "Decision Rules, Based on the Distance, for Problems of Fit, Two Samples, and Estimation," Matusita develops some asymptotic [and other] properties of $D_n^2$, including the fact that, under the null hypothesis, as $n\to\infty$,

$$4nD_n^2\ \buildrel{\mathcal L}\over{\to}\ \chi^2_{k-1}. \tag{1}$$

It is also easily verified, term by term [using $(\sqrt{a}-\sqrt{b})^2 = (a-b)^2/(\sqrt{a}+\sqrt{b})^2 \le (a-b)^2/a$ for $a>0$ and $b\ge 0$], that

$$nD_n^2\ \le\ \mathbb{X}^2_n\ :=\ n\sum_{x \in \mathcal{X}} \frac{\left(\hat{P_n}(x)-P(x)\right)^2}{P(x)}. \tag{2}$$
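
As a quick numerical illustration of the two statistics and of inequality (2), here is a sketch under the same illustrative setup as above (not part of Matusita's argument):

```python
# Sketch: compute the Freeman-Tukey statistic 4*n*D_n^2 and the Pearson
# statistic X_n^2 from one simulated sample, and check inequality (2).
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.2])   # illustrative choices, as before
n = 1000
samples = rng.choice(len(P), size=n, p=P)
P_hat = np.bincount(samples, minlength=len(P)) / n

D2 = np.sum((np.sqrt(P) - np.sqrt(P_hat)) ** 2)   # D_n^2
freeman_tukey = 4 * n * D2                        # asymptotically chi^2_{k-1}
pearson = n * np.sum((P_hat - P) ** 2 / P)        # X_n^2
print(freeman_tukey, pearson, n * D2 <= pearson)  # last entry: inequality (2)
```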

$\mathbb{X}^2_n$ is, of course, the Pearson chi-squared GOF statistic, and it is well known that under the null hypothesis $X\sim P$, as $n\to\infty$,

$$\mathbb{X}^2_n\ \buildrel{\mathcal L}\over{\to}\ \chi^2_{k-1}. \tag{3}$$

It is also easily seen that $\mathrm{E}\,\mathbb{X}^2_n = k-1$ for all $n\ge 1$ [since $\mathrm{E}\,(\hat{P_n}(x)-P(x))^2 = P(x)(1-P(x))/n$, the sum is $\sum_x (1-P(x)) = k-1$]. Together with (3) [and non-negativity], this entails that $\mathbb{X}^2_n$ is uniformly integrable. In view of (2), so is $nD_n^2$, and hence so is $4nD_n^2$; it then follows from (1) that

$$\mathrm{E}\,4nD_n^2\to \mathrm{E}\,\chi^2_{k-1} = k-1 \quad\text{as } n\to\infty$$

and

$$\mathrm{E}\,2\sqrt{n}\,D_n\to \mathrm{E}\,\chi_{k-1} \quad\text{as } n\to\infty,$$

where $\chi_{k-1}$ denotes a chi-distributed random variable with $k-1$ degrees of freedom. Since $d_H(P,\hat{P_n}) = D_n/\sqrt{2}$ and $\mathrm{E}\,\chi_{k-1} = \sqrt{2}\,\Gamma(k/2)/\Gamma\!\left((k-1)/2\right)$, this gives the explicit constant asked for:

$$\mathrm{E}\,d_H(P,\hat{P_n})\ \sim\ \frac{\Gamma(k/2)}{2\,\Gamma\!\left((k-1)/2\right)}\cdot\frac{1}{\sqrt{n}}.$$

[For more details on the connections between convergence in law, uniform integrability, and convergence of expectations, see Billingsley, 1st ed., p. 32, Theorem 5.4, or 2nd ed., pp. 31–32, Theorems 3.4 and 3.5.]
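
As a sanity check on this limit, here is a Monte Carlo sketch (with illustrative choices of $P$, $n$, and the number of trials; SciPy is assumed for the gamma function): the rescaled mean $\sqrt{n}\,\mathrm{E}\,d_H(P,\hat{P_n})$ should be close to the constant $C=\Gamma(k/2)/(2\Gamma((k-1)/2))$ derived above.

```python
# Monte Carlo sketch: estimate sqrt(n) * E d_H(P, P_hat) and compare it
# with the limiting constant C = Gamma(k/2) / (2 * Gamma((k-1)/2)).
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(2)
P = np.array([0.5, 0.3, 0.2])       # illustrative distribution, k = 3
k, n, trials = len(P), 10_000, 2_000

dists = np.empty(trials)
for t in range(trials):
    P_hat = rng.multinomial(n, P) / n           # empirical distribution
    dists[t] = np.sqrt(0.5 * np.sum((np.sqrt(P) - np.sqrt(P_hat)) ** 2))

C = gamma(k / 2) / (2 * gamma((k - 1) / 2))     # limiting constant
print(np.sqrt(n) * dists.mean(), C)             # both close to 0.443 for k = 3
```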
