Probability – Maximum Gap Between Samples in Discrete Uniform Distribution

combinatoricsmathematical-statisticsorder-statisticsprobabilityuniform distribution

This problem is related to my lab's research in robotic coverage:

Randomly draw $n$ numbers from the set $\{1,2,\ldots,m\}$ without replacement, and sort the numbers in ascending order. $1\le n\le m$.

From this sorted list of numbers $\{a_{(1)},a_{(2)},…,a_{(n)}\}$, generate the difference between consecutive numbers and the boundaries: $g = \{a_{(1)},a_{(2)}−a_{(1)},\ldots,a_{(n)}−a_{(n-1)},m+1-a_{(n)}\}$. This gives $n+1$ gaps.

What is the distribution of the maximum gap?

$P(\max(g) = k) = P(k;m,n) = ? $

This can be framed using order statistics:
$P(g_{(n+1)} = k) = P(k;m,n) = ? $

See link for the distribution of gaps, but this question asks the distribution of the maximum gap.

I'd be satisfied with the average value, $\mathbb{E}[g_{(n+1)}]$.

If $n=m$ all the gaps are size 1.
If $n+1 = m$ there is one gap of size $2$, and $n+1$ possible locations.
The maximum gap size is $m-n+1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions.
The smallest maximum gap size is $\lceil\frac{m-n}{n+1}\rceil$.
Define the probability of any given combination $T= {m \choose n}^{-1}$.

I've partially solved the probability mass function as
$
P(g_{(n+1)} = k) = P(k;m,n) =
\begin{cases}
0 & k < \lceil\frac{m-n}{n+1}\rceil\\
1 & k = \frac{m-n}{n+1} \\
1 & k = 1 \text{ (occurs when $m=n$)} \\
T(n+1)& k = 2 \text{ (occurs when $m=n+1$)} \\
T(n+1)& k = \frac{m-(n-1)}{n} \\
? & \frac{m-(n-1)}{n} \le k \le m-n+1 \\
T(n+1)& k = m-n+1\\
0 & k > m-n+1
\end{cases} \tag{1}
$

Current work (1):
The equation for the first gap, $a_{(1)}$ is straightforward:
$$
P(a_{(1)} = k) = P(k;m,n) = \frac{1}{{m \choose n}}
\sum_{k=1}^{m-n+1} {m-k-1 \choose n-1}
$$

The expected value has a simple value:
$\mathbb{E}[P(a_{(1)})] = \frac{1}{
{m \choose n}}
\sum_{k=1}^{m-n+1} {m-k-1 \choose n-1} k = \frac{m-n}{1+n}$
.
By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.

Current work (2): it is easy to run Monte Carlo simulations.

simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum map for `` trials", m, n, trials]]

enter image description here

Best Answer

Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$; that is, the sample consists of $g$ and an $n-1$-subset of $\{g+1,g+2,\ldots,m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

$$\Pr(a_{(1)}=g = f(g;n,m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.$$

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

$$\Pr(a_{(1)} \gt g) = Q(g;n,m)= \frac{(m-g)\binom{m-g-1}{n-1}}{n \binom{m}{n}}.$$

Let $G_{n,m}$ be the random variable given by the largest gap:

$$G_{n,m} = \max\left(a_{(1)}, a_{(2)}-a_{(1)}, \ldots, a_{(n)}-a_{(n-1)}\right).$$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function $$P(g;n,m)=\Pr(G_{n,m}\gt g),$$ from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which it is obvious that

$$P(g;1,m) = \Pr(G_{1,m} \gt 1) = \frac{m-g}{m},\ g=0, 1, \ldots, m.\tag{1}$$

For larger $n\gt 1$, note that the event $G_{n,m}\gt g$ is the disjoint union of the event

$$a_{1} \gt g,$$

for which the very first gap exceeds $g$, and the $g$ separate events

$$a_{1}=k\text{ and } G_{n-1,m-k} \gt g, \ k=1, 2, \ldots, g$$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

$$P(g;n,m) = Q(g;n,m) + \sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).\tag{2}$$

Fixing $g$ and laying out a two-way array indexed by $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$, we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3n)$ operations.

Figure

These graphs show the survival function $g\to P(g;n,64)$ for $n=1,2,4,8,16,32,64$. As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$, but I have not been able to obtain a closed formula that applies to all $g,n,m$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$:

$$\mathbb{E}(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g;n,m).$$

Figure 2: contour plot of expectation

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$, graduating from dark to light.