Expectation of Pairwise Differences Between Uniform Random Points in Hypercube

integrationprobability

Say you have 2 iid random variables $x,y\sim U[0,1]^k$, i.e. the uniform distribution over the k-dimensional unit cube.
What's the expected value of the Euclidean distance between them when they have been normalized by the maximum distance possible, i.e. $\sqrt k$?

For $k=1$, I worked out that this is 1/3. For $k=100$, monte carlo simulations tell me it's a little over 0.4.

I tried to work out the math but
$$\frac{1}{\sqrt{k}}\int_{S_x} \int_{S_y} \sqrt{(x-y)^T (x-y)}\;dx\;dy$$
where $S_x,S_y=[0,1]^k$ for general $k$ is beyond me.

Best Answer

The asymptotics is $1/\sqrt6=0.40824829$.

To see this, consider i.i.d. random variables $X_i$ and $Y_i$ uniform on $[0,1]$ and write the quantity to be computed as $I_k=\mathrm E\left(\sqrt{Z_k}\right)$ with $$ Z_k=\frac1k\sum_{i=1}^k(X_i-Y_i)^2. $$ By the strong law of large numbers for i.i.d. bounded random variables, when $k\to\infty$, $Z_k\to z$ almost surely and in every $L^p$, with $z=\mathrm E(Z_1)$. In particular, $I_k\to \sqrt{z}$. Numerically, $$ z=\iint_{[0,1]^2}(x-y)^2\mathrm{d}x\mathrm{d}y=2\int_0^1x^2\mathrm{d}x-2\left(\int_0^1x\mathrm{d}x\right)^2=2\frac13-2\left(\frac12\right)^2=\frac16. $$


Edit (This answers a different question asked by the OP in the comments.)

Consider the maximum of $n\gg1$ independent copies of $kZ_k$ with $k\gg1$ and call $M_{n,k}$ its square root. A heuristics to estimate the typical behaviour of $M_{n,k}$ is as follows.

By the central limit theorem (and in a loose sense), $Z_k\approx z+N\sqrt{v/k}$ where $v$ is the variance of $Z_1$ and $N$ is a standard gaussian random variable. In particular, for every given nonnegative $s$, $$ \mathrm P\left(Z_k\ge z+s\right)\approx\mathrm P\left(N^2\ge ks^2/v\right). $$ Furthermore, the typical size of $M_{n,k}^2$ is $z+s$ where $s$ solves $\mathrm P(Z_k\ge z+s)\approx1/n$. Choose $q(n)$ such that $\mathrm P(N\ge q(n))=1/n$, that is, $q(n)$ is a so-called quantile of the standard gaussian distribution. Then, the typical size of $M_{n,k}^2$ is $k(z+s)$ where $s$ solves $ks^2/v=q(n)^2$. Finally, $$ M_{n,k}\approx \sqrt{kz+q(n)\sqrt{kv}}. $$ Numerically, $z=1/6$, $v=7/180$, and you are interested in $k=1'000$. For $n=10'000$, $q(n)=3.719$ yields a typical size $M_{n,k}\approx13.78$ and for $n=100'000$, $q(n)=4.265$ hence $M_{n,k}\approx13.90$ (these should be compared to the values you observed).

To make rigorous such estimates and to understand why, in a way, $M_{n,k}$ concentrates around the typical value we computed above, see here.

Related Question