[Math] Expected length of the shortest polygonal path connecting random points

asymptoticscombinatorial-geometrygeometryprobabilityprobability theory

$N$ points are selected in a uniformly distributed random way in a disk of a unit radius. Let $L(N)$ denote the expected length of the shortest polygonal path that visits each of the points at least once (the path need not to be closed any may be self-intersecting).

  • For what $N$ do we know the exact value of $L(N)$?
  • Is there a general formula for $L(N)$?
  • What is the asymptotic behavior of $L(N)$ as $N\to\infty$?
  • What are the answers to previous questions, if the disk is replaced with a ball?

Best Answer

Given: $(X,Y)$ is Uniformly distributed on a disc of unit radius. Then, the joint distribution of $n$ such points, $((X_1,Y_1), ..., (X_n,Y_n))$, each drawn independently, will have joint pdf:

$$f( (x_1,y_1), ..., (x_n,y_n) ) = \begin{cases}\pi^{-n}& (x_1^2 + y_1^2 < 1) & \text{ & } \cdots \text{ & }&(x_n^2 + y_n^2 < 1) \\ 0 & otherwise \end{cases}$$

Next, let $Z_n$ denote the Euclidean length of the shortest polygonal path that visits each point (at least) once. For any given $n$, it is possible to express $Z_n$ as an exact symbolic construct. This rapidly gets tiresome to do manually, so I have set up a function PolygonPathMinDistance[points] to automate same (I have provided the code at the bottom of this post). For example:

  • Case $n = 2$:
    With 2 points, the shortest polygonal path $Z_2$ is, of course, ... :

enter image description here

and so the desired exact solution is:

$$E[Z_2] = E\left[\sqrt{\left(X_1-X_2\right){}^2+\left(Y_1-Y_2\right){}^2}\right] \approx 0.9054$$

where the random variables $(X_1, Y_1, X_2, Y_2)$ have joint pdf $f(\cdots)$ given above.

Although getting a closed form solution for the expectation integral has thus far proved elusive (perhaps transforming to polar coordinates might be a path worth pursuing - see below), the result is just a number, and we can find that number to arbitrary precision using numerical integration rather than symbolic integration. Doing so yields the result that: $E[Z_2] = 0.9054 ...$.

  • Case $n = 3$:
    With 3 points, the shortest polygonal path $Z_3$ is:

enter image description here

and similarly obtain:

$$E[Z_3] \approx 1.49 ...$$

  • Case $n = 4$:
    With 4 points, the shortest polygonal path $Z_4$ is:

enter image description here

and similarly obtain:

$$E[Z_4] \approx 1.96 ...$$

... and so on. For cases above $n \ge 5$, the method still works fine, but numerical integration gets less reliable, and other methods, such as Monte Carlo simulation [i.e. actually generating $n$ pseudorandom points in the disc (say 100,000 times), evaluating the actual shortest path for each and every $n$-dimensional drawing, and calculating the sample mean] appear to work more reliably. The latter is, of course, no longer an exact conceptual methodology, but as a practical evaluation method, for larger $n$, it appears to become the more practical option.

Summary Plots and Fitting $$\color{red}{\text{Expected shortest polygon path connecting $n$ random points in a disc of unit radius }}$$

enter image description here

... and with $n=50, 100$ and $150$ points added:

enter image description here

The fitted model here is simply:

$$a + b \frac{n-1}{\sqrt{n}}$$

with $a = -0.11$, $b = 1.388$.

Fitting curves

A number of 'natural' contenders present themselves, including models of form:

  • $a + b n + c \sqrt{n}$ ... this performs neatly within sample, but as soon as one extends to larger values of $n$, the model performs poorly out of sample, and needs to be re-fitted given the extra data points. This suggests that the true model is not just square root $n$.

  • $a + b n^c$ ... this works very neatly within sample, and outside sample, but the optimal parameter values show some instability as more data values are added.

  • $a + b \frac{n-1}{\sqrt{n}}$ ... just 2 parameters, and works absolutely beautifully. The idea for this model was inspired by Ju'x posting. The parameter values are remarkably stable as more data points are added (they remain constant to 2 decimal places, irrespective of whether we use just $n = 2 ... 20$, or $n = 50$ is added, or $n=100$ is added, all the way up to $n = 300$), as shown here:

enter image description here

The parameter values for $a$ and $b$ are the same as for the diagram above.


Code for the PolygonPathMinDistance function

Here is some Mathematica code for the exact PolygonPathMinDistance function that calculates all of the possible relevant permutations involved in calculating the shortest polygon path, as an exact symbolic/algebraic construct:

PolygonPathMinDistance[points_] := Module[{orderings, pointorderings, pathsToCheck, ww},
       orderings =  Union[Permutations[Range[Length[points]]], 
                        SameTest -> (#1 == Reverse[#2] &)];
  pointorderings = Map[points[[#]] &, orderings];
   pathsToCheck  = Map[Partition[#, 2, 1] &, pointorderings];
 Min[Map[Total[Map[EuclideanDistance @@ # &, #1] /. Abs[ww_]->ww]&, pathsToCheck]]
]

.

Alternative specification of problem in Polar Coordinates

If points are Uniformly distributed on a disc of unit radius and independent, the joint distribution of $n$ points, $((X_1,Y_1), ..., (X_n,Y_n))$, with joint pdf $f(\cdots)$ is given above. Alternatively, consider the transformation to polar coordinates, $(X_i \to R_i cos(\Theta_i)$, $Y_i \to R_i sin(\Theta_i))$, for $i = 1$ to $n$. Then, the joint pdf is:

$$f( (r_1,\theta_1), ..., (r_n,\theta_n) ) = \frac{r_1 \times r_2 \times \cdots \times r_n}{\pi^{n}}$$

with domain of support, $R_i=r_i \in \{r_i:0<r_i<1\}$ and $\Theta_i=\theta_i \in \{\theta_i:0<\theta_i<2 \pi\}$. Then, the distance between any two random points, say $(X_i,Y_i)$ and $(X_j,Y_j)$ is given by the distance formula for polar coordinates:

$$\sqrt{\left(X_j-X_i\right){}^2+\left(Y_j-Y_i\right){}^2} = \sqrt{R_i^2 + R_j^2 -2 R_i R_j \cos \left(\Theta_i-\Theta_j\right)}$$

Taking expectations then yields the same results as above. For the $n = 2$ case, symbolic integration is a little easier, and one can obtain the result:

$$\begin{align*}\displaystyle E\left[ Z_2\right] &= E\left[ \sqrt{R_1^2 + R_2^2 -2 R_1 R_2 \cos \left(\Theta _1-\Theta _2\right)}\right] \\ &= \frac{8}{\pi} \int_0^1\int_0^1 |r_1 - r_2| EllipticE[\frac{-4 r_1 r_2}{(r_1-r_2)^2}]r_1 r_2 dr_1\,dr_2 \\ &= \frac{8}{\pi} \frac{16}{45} \approx 0.9054 \text{(as above)}\end{align*}$$

Extending from a disc to a ball

There is no inherent reason the same methods cannot be extended to a ball ... just more dimensions (and more computational grunt required).