Dispersion of Random Subset in [-1,1]^2

co.combinatorics, computational-geometry, geometric-probability, pr.probability

Let $C$ be the square $[-1,1]^2$. Let $a_1,\dots,a_m$ be points chosen independently and uniformly at random from $C$. Let $d_m$ (dispersion) be the random variable $\max_{x \in C}{\min_{j \in [m]}{\|x-a_j\|_2}}$. I would like to find a function $\epsilon(m)$ such that $\Pr[d_m \leq \epsilon(m)] \geq $ constant (no dependence on m).
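For concreteness, $d_m$ can be estimated numerically by brute force: search for the worst $x$ over a fine grid on $C$ and take the distance to its nearest sample point. A minimal sketch (the grid search and the resolution `grid_n` are illustrative choices, not part of the question):

```python
import random

def dispersion_estimate(points, grid_n=100):
    """Approximate d_m = max_{x in C} min_j ||x - a_j||_2 by searching x over a
    (grid_n+1) x (grid_n+1) grid on C = [-1,1]^2 (an underestimate, up to grid resolution)."""
    worst_sq = 0.0
    for i in range(grid_n + 1):
        x = -1.0 + 2.0 * i / grid_n
        for j in range(grid_n + 1):
            y = -1.0 + 2.0 * j / grid_n
            nearest_sq = min((x - px) ** 2 + (y - py) ** 2 for px, py in points)
            worst_sq = max(worst_sq, nearest_sq)
    return worst_sq ** 0.5

random.seed(0)
m = 100
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(m)]
d_m = dispersion_estimate(pts)
print(d_m)
```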

In computational geometry terminology, this means, find an $\epsilon(m)$ such that a "random" $m$-subset of $C$ is an $\epsilon(m)$-net for $(C,\mathcal{B})$ (where $\mathcal{B}$ is the class of all $L_2$-balls in $\mathbb{R}^2$) with constant probability.

In the one-dimensional case (here $C = [-1,1]$), we can take $\epsilon(m) = K\frac{\log m}{m}$ for a large enough constant $K$. This follows from Theorem 3.1 of "On the lengths of the pieces of a stick broken at random" (Holst, 1980).
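The one-dimensional bound is easy to check empirically: the dispersion on $[-1,1]$ is the larger of the end gaps and half the largest interior spacing. A sketch, where the constant $K = 10$ is an illustrative choice rather than the optimal one from Holst's theorem:

```python
import math, random

def dispersion_1d(points):
    """Dispersion on [-1, 1]: the farthest any x in [-1, 1] lies from the point set."""
    pts = sorted(points)
    # End gaps count in full; an interior gap contributes half (its midpoint is the worst x).
    worst = max(pts[0] + 1.0, 1.0 - pts[-1])
    for a, b in zip(pts, pts[1:]):
        worst = max(worst, (b - a) / 2.0)
    return worst

random.seed(1)
m = 1000
d_m = dispersion_1d([random.uniform(-1, 1) for _ in range(m)])
bound = 10 * math.log(m) / m   # K = 10 is an illustrative constant, not Holst's optimal one
print(d_m, bound)
```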

I found a similar question on MathOverflow, about the expected value of the maximum distance between any two points of $\{a_1,\dots,a_m\}$. I don't see how the bound there can be used to get an $\epsilon(m)$, though.

On page 2 of the paper "Quasi-Monte-Carlo methods and the dispersion of point sequences" (Rote-Tichy, 1996) it says that "If the range space $R$ has finite Vapnik-Chervonenkis dimension $d$ then a random subset of $X$ [to be thought of as $C$ in our question] of size $(d/\epsilon)\log (1/\epsilon)$ is an $\epsilon$-net with high probability." This would be great for us, as $\mathcal{B}$ (the class of all $L_2$-balls in $\mathbb{R}^2$) has finite VC dimension ($= d+1$ for balls in $\mathbb{R}^d$). Rote and Tichy say that this is proved in the paper "Epsilon-nets and simplex range queries" (Haussler-Welzl, 1987). But in that paper, I could only find such theorems for random subsets of a finite $A \subset X$ (Theorem 3.3, Corollary 3.7, etc.).

Best Answer

For every constant dimension $d$ (i.e. when $C = [0,1]^d$; passing from $[-1,1]^2$ to $[0,1]^d$ only changes the constants), the answer is asymptotically $\varepsilon_d(m) = \Theta\!\left(\left(\frac{\log m}{m}\right)^{1/d}\right)$, where the $\Theta$ notation hides constant factors that depend on $d$. This follows from the coupon collector problem: partition the cube $[0,1]^d$ into $\varepsilon^{-d}$ sub-cubes of side length $\varepsilon$, i.e. $[0,1]^d = \left( \bigcup_{i=0}^{1/\varepsilon - 1} [i \varepsilon, (i + 1) \varepsilon]\right)^d$.

Now, let us draw $m$ points uniformly at random from $[0, 1]^d$, and let’s keep track of which sub-cubes are being hit.

By the coupon collector bound, as soon as $m \gg \varepsilon^{-d} \log(\varepsilon^{-d})$ we will, with high probability, hit every sub-cube; and when $m \ll \varepsilon^{-d} \log(\varepsilon^{-d})$, with high probability we will miss at least one sub-cube.
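A quick simulation of the hitting argument, with $d = 2$ and $\varepsilon = 0.1$; the safety factor 5 below is an arbitrary margin chosen to sit comfortably above the coupon-collector threshold:

```python
import math, random

random.seed(2)
d = 2
eps = 0.1                       # side length of the sub-cubes
k = round(1 / eps)              # sub-cubes per axis
n_cells = k ** d                # eps^{-d} sub-cubes in total
# Sample size comfortably above the coupon-collector threshold eps^{-d} log(eps^{-d});
# the factor 5 is an arbitrary safety margin.
m = int(5 * n_cells * math.log(n_cells))

hit = set()
for _ in range(m):
    p = [random.random() for _ in range(d)]
    # Index of the sub-cube containing p (min() clamps the boundary point 1.0).
    hit.add(tuple(min(int(c * k), k - 1) for c in p))

print(len(hit), n_cells)
```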

Now it is enough to show the following two elementary geometric facts:

  1. Any subset $S\subset [0, 1]^d$ which misses at least one sub-cube has dispersion at least $\varepsilon/2$ (take the center of a missed sub-cube as a witness $x$: every point of $S$ lies outside that sub-cube, hence at distance at least $\varepsilon/2$ from its center).

  2. Any subset $S \subset [0,1]^d$ which hits every sub-cube has dispersion at most $\sqrt{d}\varepsilon$ (the diameter of a cube of side length $\varepsilon$).
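Both facts can be sanity-checked on a concrete configuration, e.g. the set of sub-cube centers in $[0,1]^2$ (the choice $\varepsilon = 1/4$ and the center construction are mine, for illustration):

```python
import itertools, math

eps = 0.25
k = round(1 / eps)  # 4 cells per axis, 16 sub-squares of [0,1]^2

def min_dist(x, pts):
    """Distance from x to the nearest point of pts."""
    return min(math.hypot(x[0] - p[0], x[1] - p[1]) for p in pts)

# A set hitting every sub-square: one point (the center) per cell.
centers = [((i + 0.5) * eps, (j + 0.5) * eps)
           for i, j in itertools.product(range(k), repeat=2)]

# Fact 2: every x, even a corner of [0,1]^2, is within sqrt(2)*eps of the set.
corner_dist = min_dist((0.0, 0.0), centers)

# Fact 1: delete the point of one cell; that cell's center is a witness at distance >= eps/2.
witness = (0.5 * eps, 0.5 * eps)
rest = [p for p in centers if p != witness]
witness_dist = min_dist(witness, rest)

print(corner_dist, witness_dist)
```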

Related: I recommend the great answer by Iosif Pinelis, with a much more precise calculation of the exact constant in the case $d=1$.