[Math] Average distance between $n$ randomly distributed points on a square with their nearest neighbors

calculuseuclidean-geometrygeometryuniform distribution

Given a square with side length of $x$ and $n$ randomly distributed points on it with uniform distribution, what is the average of the length between the nodes and their nearest neighbors?

I'd be more than happy if you can also help with the generalized version of the question above:

Given the distance between two points $i$ and $j$ as $r_{ij}$ and the cost between them as $c_{ij} = r_{ij}^m$ with $m \geq 2$, what is the average cost between the nodes and their nearest neighbors?

Best Answer

It is not that hard to write down an integral representation for the expected value of average nearest neighbor distance. However, computing that integral is a nightmare. Fortunately, extracting the leading large $n$ behavior is easy.

For simplicity, we will only consider the case where side length $x$ is $1$.

Let

  • $\Delta = [0,1]^2$ be the unit square.
  • $\bar{B}(v,r)$ be the closed disc centered at $v$ with radius $r$ and $A(v,r) = \verb/Area/(\bar{B}(v,r) \cap \Delta)$.
  • $v_1, \ldots, v_n$ be $n$ iid points sampled uniformly from $\Delta$.
  • $r_i = \min_{j\ne i}|v_i - v_j|$ be the distance between $v_i$ and its nearest neighbor.
  • $\lambda$ be the expected value of average nearest neighbor distances among $v_1, \ldots v_n$.

Since $v_1, \ldots, v_n$ are iid, we have $\displaystyle\;\lambda = \Lambda_1 \stackrel{def}{=} \mathbf{E}\left[ \frac1n \sum_i r_i\right] = \mathbf{E}\left[ r_1 \right] $.

For any $r \in \mathbb{R}$, the condition $r_1 > r$ is equivalent to $v_j \notin \bar{B}(v_1,r)$ for all $j \ne 1$.
When we fix the location of $v_1$, the CDF of its distance to its nearest neighbour equals to

$$\begin{align} {\rm CDF}(v_1, r) & \stackrel{def}{=} \mathbf{P}\left[ r_1 \le r : v_1\right] = 1- \mathbf{P}\left[ r_1 > r : v_1\right]\\ &= 1 - \mathbf{P}\left[ v_2,\ldots, v_n \notin \bar{B}(v_1, r) : v_1 \right] = 1 - \mathbf{P}\left[ v_2 \notin \bar{B}(v_1, r) : v_1 \right]^{n-1}\\ &= 1 - (1 - A(v_1,r))^{n-1} \end{align} $$ We obtain following integral representation of $\lambda$:

$$\lambda = \mathbf{E}_{v_1}\left[ \mathbf{E}\left[ r_1 : v_1 \right] \right] = \mathbf{E}_{v_1}\left[\int_0^\infty r d{\rm CDF}(v_1,r) \right] = \int_{\Delta}\int_0^\infty (1-A(v,r))^{n-1} dr dv $$ The real problem is $A(v,r)$ is very complicated. Evaluate the integral exactly even for small $n$ is a nightmare. For the leading large $n$ behavior of $\lambda$, we have a much better luck.

When $n$ is large, $(1 - A(v,r))^{n-1}$ falls of very quickly as $r$ increases from $0$. For those $r$ where $(1 - A(v,r))^{n-1}$ contributes to the integral, we can replace $(1 - A(v,r))^{n-1}$ by $e^{-(n-1)A(v,r)}$ ${}^{\color{blue}{[1]}}$.

Unless $v_1$ is within a distance of $O(\lambda)$ from the boundary of $\Delta$, $v_1$ will be closer to its nearest neighbor than the boundary for those $r$ where $(1-A(r))^{n-1}$ matters. For such $r$, we can replace $A(v,r)$ by $\pi r^2$. Up to relative correction of $O(\lambda)$ ${}^{\color{blue}{[2],[3]}}$, we have

$$\lambda = \left(\int_0^\infty e ^{-(n-1)\pi r^2} dr \right) \times ( 1 + O(\lambda)) = \frac{1}{2\sqrt{n-1}}(1 + O(\lambda))$$ As a result, for large $n$, we obtain following leading behavior of $\lambda$ $$\lambda = \frac{1}{2\sqrt{n-1}} + O\left(n^{-1}\right)$$

For the general case where the cost associated to $v_i$ is $r_i^m$, the integral representation becomes

$$\Lambda_m \stackrel{def}{=} \mathbf{E}\left[\frac1n\sum_i r_i^m\right] = \mathbf{E}[r_i^m] = m \int_\Delta\int_0^\infty r^{m-1}(1-A(v,r))^{n-1} dr dv$$

with leading large $n$-behavior of the form

$$\Lambda_m = \Gamma\left(\frac{m}{2}+1\right)\left(\frac{2\lambda}{\sqrt{\pi}}\right)^m+ O(\lambda^{m+1})$$

Notes

  • $\color{blue}{[1]}$ - Using Waston's lemma, the relative error due to the replacement $(1-A(v,r))^{n-1}$ by $e^{-(n-a)A(v,r)}$ is $O(n^{-1}) = O(\lambda^2)$. We can safely ignore this error for the next to leading large $n$ dependence of $\lambda$.

  • $\color{blue}{[2]}$ - When $v$ is near one of the side or near a corner, $A(v,r) \ge \frac12 \pi r^2$ and $\ge \frac14 \pi r^2$ for small $r$. This will increase the integral $\int_0^\infty (1-A(v,r))^{n-1} dr$ by a factor at most $\sqrt{2}$ and $2$ respectively. Let $\lambda_0 = \frac{1}{2\sqrt{n-1}}$ be the leading $n$ dependence of $\lambda = \Lambda_1$. The next to leading dependence $\lambda_1$ will be dominated by those $v$ within an area of order $O(4\lambda_0)$ near the boundary.

  • $\color{blue}{[3]}$ - If I didn't make any mistake, the next to leading dependence $\lambda_1$ should equal to $$4\int_0^\infty \int_0^r \left(e^{-\frac{(\pi - \theta + \sin\theta\cos\theta)r^2}{4\lambda_0^2}} - e^{-\frac{\pi r^2}{4\lambda_0^2}} \right) d\ell dr$$ where $\ell$ is the distance between the center and the side and $\theta = \cos^{-1}\left(\frac{\ell}{r}\right)$. Changing variable from $(r,\ell)$ to $(r,\theta)$ and integrate over $r$ first, this can be simplified to $$\lambda_1 = 4\lambda_0^2 \left[\int_0^{\pi/2}\frac{2\sin\theta d\theta}{\pi - \theta + \sin\theta\cos\theta} - \frac{2}{\pi}\right] \approx 0.821893491\lambda_0^2 $$