[Math] Concentration of sum of pairwise squared Euclidean distances of random vectors

pr.probability, st.statistics

Let $X_1, \ldots, X_n$ be independent random vectors in $B(0, D) \subset \mathbb{R}^d$ (the $\ell_2$ ball of radius $D$ centered at the origin). I am trying to find the concentration of the following quantity around its expectation:
$$f(X_1, \ldots, X_n) = \frac{1}{n^2} \sum_{1 \leq i,j \leq n} \|X_i - X_j \|_2^2$$
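Expanding the squares gives an equivalent closed form, which is also convenient for computing $f$ in $O(nd)$ time:
$$f(X_1, \ldots, X_n) = \frac{1}{n^2} \sum_{1 \leq i,j \leq n} \left( \|X_i\|_2^2 + \|X_j\|_2^2 - 2\langle X_i, X_j \rangle \right) = \frac{2}{n} \sum_{i=1}^n \|X_i\|_2^2 - 2 \left\| \frac{1}{n} \sum_{i=1}^n X_i \right\|_2^2.$$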
Using McDiarmid's inequality, I can show that
$$ \Pr(|f(X_1,\ldots,X_n) - \mathbb{E} f(X_1,\ldots,X_n)| \geq \epsilon) \leq 2 \exp\left(- \frac{2n\epsilon^2}{16D^4}\right). $$
This tells me that, with high probability, the deviation is $\epsilon \sim O(1/\sqrt{n})$.
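(For completeness, here is the bounded-differences accounting behind a bound of this shape; the exact constant depends on the bookkeeping. Each term satisfies $0 \leq \|X_i - X_j\|_2^2 \leq 4D^2$, and replacing $X_k$ by $X_k'$ changes only the $2(n-1)$ ordered pairs involving index $k$, so
$$ \sup_{x_k,\, x_k'} \left| f(\ldots, x_k, \ldots) - f(\ldots, x_k', \ldots) \right| \leq \frac{2(n-1) \cdot 4D^2}{n^2} \leq \frac{8D^2}{n} =: c, $$
and McDiarmid's inequality gives $2\exp(-2\epsilon^2/(n c^2)) = 2\exp(-n\epsilon^2/(32 D^4))$, which is of the same form.)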

Usually, a sum of $n$ independent random variables concentrates with a similar dependence on $n$ (that is, $O(1/\sqrt{n})$), unless I use Bernstein's inequality. And it is believed that, in some sense, among functions of independent random variables, sums are the least concentrated.

In my situation, the function is a sum of $n^2$ dependent random variables (thinking of each $Z_{ij} = \| X_i - X_j \|_2^2$ as a random variable). So I believe that the concentration should be better than $O(1/\sqrt{n})$.

Does anybody have any idea how such a guarantee can be achieved? Or can anybody show that $O(1/\sqrt{n})$ is the best I can hope for? (A small simulation to probe the empirical scaling is sketched below.)
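Here is a quick Monte Carlo sketch to probe how the deviations of $f$ scale with $n$; the dimension, radius, sampling distribution, and trial count below are arbitrary choices of mine, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ball(n, d, D=1.0):
    """Draw n points uniformly from the l2 ball of radius D in R^d."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform direction on the sphere
    r = D * rng.random(n) ** (1.0 / d)             # radius with density proportional to r^(d-1)
    return g * r[:, None]

def f(X):
    """Compute f via the expanded form above: 2*mean ||X_i||^2 - 2*||mean X_i||^2 (O(nd) time)."""
    return 2.0 * np.mean(np.sum(X**2, axis=1)) - 2.0 * np.sum(X.mean(axis=0) ** 2)

# If the deviations scale like n^(-a), the standard deviation of f
# should shrink by a factor of 4^a each time n is multiplied by 4.
for n in [100, 400, 1600, 6400]:
    vals = np.array([f(sample_ball(n, d=5)) for _ in range(1000)])
    print(f"n = {n:5d}   std of f = {vals.std():.2e}")
```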

Best Answer

You're right: you can get a better concentration inequality, with $\epsilon \sim O(1/n)$, given that this is a sum of $n^2$ dependent random variables. You first need the decoupling inequality for $U$-statistics, which relates the dependent setting to the independent one, as shown in [1]. In your case, it says that there exists a constant $C$ such that

$$ P\left(\left| \frac{1}{n^2} \sum_{1 \leq i \ne j \leq n} \|X_i - X_j \|_2^2 - \mathbb{E}f\right| \ge t\right) \le C\, P\left(C\left|\frac{1}{n^2} \sum_{1 \leq i \ne j \leq n} \|X_i^{(1)} - X_j^{(2)} \|_2^2 - \mathbb{E}f\right| \ge t\right), $$

where $X_1^{(1)}, \ldots, X_n^{(1)}$ and $X_1^{(2)}, \ldots, X_n^{(2)}$ are independent copies of $X_1, \ldots, X_n$. You can then apply McDiarmid's inequality to the decoupled sum of $n(n-1)$ terms, which is a function of the $2n$ independent vectors $X_i^{(1)}, X_i^{(2)}$, and get an inequality achieving the rate $O(1/n)$. Moreover, since the kernel of your $U$-statistic is symmetric, you can reverse this inequality to get a lower bound and convince yourself that you can't do better.
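Spelled out, the reversed inequality (which [1] also provides when the kernel $h$ is symmetric, as $h(x, y) = \|x - y\|_2^2$ is; I am suppressing its exact constants) reads: for some constant $C' > 0$,
$$ P\left(\left|\frac{1}{n^2} \sum_{1 \leq i \ne j \leq n} \|X_i^{(1)} - X_j^{(2)} \|_2^2 - \mathbb{E}f\right| \ge t\right) \le C'\, P\left(C'\left| \frac{1}{n^2} \sum_{1 \leq i \ne j \leq n} \|X_i - X_j \|_2^2 - \mathbb{E}f\right| \ge t\right). $$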

[1] de la Peña, V. H., & Montgomery-Smith, S. J. (1995). Decoupling inequalities for the tail probabilities of multivariate U-statistics. The Annals of Probability, 23(2), 806–816. Available at: http://arxiv.org/pdf/math/9309211v2.pdf
