Extreme Value – Distribution of the Largest Fragment of a Broken Stick (Spacings)

dirichlet distributiondistributionsextreme valueorder-statisticsuniform distribution

Let a stick of length 1 be broken in $k+1$ fragments uniformly at random. What is the distribution of the length of the longest fragment?

More formally, let $(U_1, \ldots U_k)$ be IID $U(0,1)$, and let $(U_{(1)}, \ldots, U_{(k)})$ be the associated order statistics, i.e. we simply order the sample in such a way that $U_{(1)} \leq U_{(2)} \leq, \ldots , \leq U_{(k)}$. Let $Z_k = \max \left(U_{(1)}, U_{(2)}-U_{(1)}, \ldots, U_{(k)} – U_{(k-1)}, 1-U_{(k)}\right)$.

I am interested in the distribution of $Z_k$. Moments, asymptotic results, or approximations for $k \uparrow \infty$ are also interesting.

Best Answer

With the information given by @Glen_b I could find the answer. Using the same notations as the question

$$ P(Z_k \leq x) = \sum_{j=0}^{k+1} { k+1 \choose j } (-1)^j (1-jx)_+^k, $$

where $a_+ = a$ if $a > 0$ and $0$ otherwise. I also give the expectation and the asymptotic convergence to the Gumbel (NB: not Beta) distribution

$$ E(Z_k)= \frac{1}{k+1}\sum_{i=1}^{k+1}\frac{1}{i} \sim \frac{\log(k+1)}{k+1}, \\ P(Z_k \leq x) \sim \exp\left(- e^{-(k+1)x + \log(k+1)} \right). $$

The material of the proofs is taken from several publications linked in the references. They are somewhat lengthy, but straightforward.

1. Proof of the exact distribution

Let $(U_1, \ldots, U_k)$ be IID uniform random variables in the interval $(0,1)$. By ordering them, we obtain the $k$ order statistics denoted $(U_{(1)}, \ldots, U_{(k)})$. The uniform spacings are defined as $\Delta_i = U_{(i)} - U_{(i-1)}$, with $U_{(0)} = 0$ and $U_{(k+1)} = 1$. The ordered spacings are the corresponding ordered statistics $\Delta_{(1)} \leq \ldots \leq \Delta_{(k+1)}$. The variable of interest is $\Delta_{(k+1)}$.

For fixed $x \in (0,1)$, we define the indicator variable $\mathbb{1}_i = \mathbb{1}_{\{\Delta_i > x\}}$. By symmetry, the random vector $(\mathbb{1}_1, \ldots, \mathbb{1}_{k+1})$ is exchangeable, so the joint distribution of a subset of size $j$ is the same as the joint distribution of the first $j$. By expanding the product, we thus obtain

$$ P(\Delta_{(k+1)} \leq x) = E \left( \prod_{i=1}^{k+1} (1 - \mathbb{1}_i) \right) = 1 + \sum_{j=1}^{k+1} { k+1 \choose j } (-1)^j E \left( \prod_{i=1}^j \mathbb{1}_i \right). $$

We will now prove that $E \left( \prod_{i=1}^j \mathbb{1}_i \right) = (1-jx)_+^k$, which will establish the distribution given above. We prove this for $j=2$, as the general case is proved similarly.

$$ E \left( \prod_{i=1}^2 \mathbb{1}_i \right) = P(\Delta_1 > x \cap \Delta_2 > x) = P(\Delta_1 > x) P(\Delta_2 > x | \Delta_1 > x). $$

If $\Delta_1 > x$, the $k$ breakpoints are in the interval $(x,1)$. Conditionally on this event, the breakpoints are still exchangeable, so the probability that the distance between the second and the first breakpoint is greater than $x$ is the same as the probability that the distance between the first breakpoint and the left barrier (at position $x$) is greater than $x$. So

$$ P(\Delta_2 > x | \Delta_1 > x) = P\big(\text{all points are in } (2x,1) \big| \text{all points are in } (x,1)\big), \; \text{so} \\ P(\Delta_2 > x \cap \Delta_1 > x) = P\big(\text{all points are in } (2x,1)\big) = (1-2x)_+^k. $$

2. Expectation

For distributions with finite support, we have

$$ E(X) = \int P(X > x)dx = 1 - \int P(X \leq x)dx. $$

Integrating the distribution of $\Delta_{(k+1)}$, we obtain

$$ E\left(\Delta_{(k+1)}\right) = \frac{1}{k+1}\sum_{j=1}^{k+1}{k+1 \choose j}\frac{(-1)^{j+1}}{j} = \frac{1}{k+1}\sum_{j=1}^{k+1}\frac{1}{j}. $$

The last equality is a classic representation of harmonic numbers $H_i = 1+ \frac{1}{2}+ \ldots + \frac{1}{i}$, which we demonstrate below.

$$ H_{k+1} = \int_0^1 1 + x + \ldots + x^k dx = \int_0^1 \frac{1-x^{k+1}}{1-x}dx. $$

With the change of variable $u = 1-x$ and expanding the product, we obtain

$$ H_{k+1} = \int_0^1\sum_{j=1}^{k+1}{ k+1 \choose j }(-1)^{j+1}u^{j-1}du = \sum_{j=1}^{k+1}{k+1 \choose j}\frac{(-1)^{j+1}}{j}. $$

3. Alternative construction of uniform spacings

In order to obtain the asymptotic distribution of the largest fragment, we will need to exhibit a classical construction of uniform spacings as exponential variables divided by their sum. The probability density of the associated order statistics $(U_{(1)}, \ldots, U_{(k)})$ is

$$ f_{U_{(1)}, \ldots U_{(k)}}(u_{(1)}, \ldots, u_{(k)}) = k!, \; 0 \leq u_{(1)} \leq \ldots \leq u_{(k+1)}. $$

If we denote the uniform spacings $\Delta_i = U_{(i)} - U_{(i-1)}$, with $U_{(0)} = 0$, we obtain

$$ f_{\Delta_1, \ldots \Delta_k}(\delta_1, \ldots, \delta_k) = k!, \; 0 \leq \delta_i + \ldots + \delta_k \leq 1. $$

By defining $U_{(k+1)} = 1$, we thus obtain

$$ f_{\Delta_1, \ldots \Delta_{k+1}}(\delta_1, \ldots, \delta_{k+1}) = k!, \; \delta_1 + \ldots + \delta_k = 1. $$

Now, let $(X_1, \ldots, X_{k+1})$ be IID exponential random variables with mean 1, and let $S = X_1 + \ldots + X_{k+1}$. With a simple change of variable, we can see that

$$f_{X_1, \ldots X_k, S}(x_1, \ldots, x_k, s) = e^{-s}.$$

Define $Y_i = X_i/S$, such that by a change of variable we obtain

$$f_{Y_1, \ldots Y_k, S}(y_1, \ldots, y_k, s) = s^k e^{-s}.$$

Integrating this density with respect to $s$, we thus obtain

$$ f_{Y_1, \ldots Y_k,}(y_1, \ldots, y_k) = \int_0^{\infty}s^k e^{-s}ds = k!, \; 0 \leq y_i + \ldots + y_k \leq 1, \; \text{and thus} \\ f_{Y_1, \ldots Y_{k+1},}(y_1, \ldots, y_{k+1}) = k!, \; y_1 + \ldots + y_{k+1} = 1. $$

So the joint distribution of $k+1$ uniform spacings on the interval $(0,1)$ is the same as the joint distribution of $k+1$ exponential random variables divided by their sum. We come to the following equivalence of distribution

$$ \Delta_{(k+1)} \equiv \frac{X_{(k+1)}}{X_1 + \ldots + X_{k+1}}. $$

4. Asymptotic distribution

Using the equivalence above, we obtain

$$ \begin{align} P\big((k+1)\Delta_{(k+1)} - \log(k+1) \leq x\big) &= P\left(X_{(k+1)} \leq (x + \log(k+1))\frac{X_1 + \ldots + X_{k+1}}{k+1}\right) \\ &= P\left(X_{(k+1)} - \log(k+1) \leq x + (x + \log(k+1))T_{k+1}\right), \end{align} $$

where $T_{k+1} = \frac{X_1+\ldots+X_{k+1}}{k+1} -1$. This variable vanishes in probability because $E\left(T_{k+1}\right) = 0$ and $Var\big(\log(k+1)T_{k+1}\big) = \frac{(\log(k+1))^2}{k+1} \downarrow 0$. Asymptotically, the distribution is the same as that of $X_{(k+1)} - \log(k+1)$. Because the $X_i$ are IID, we have

$$ \begin{align} P\left(X_{(k+1)} - \log(k+1) \leq x \right) &= P\left(X_1 \leq x + \log(k+1)\right)^{k+1} \\ &= \left(1-e^{-x - \log(k+1)}\right)^{k+1} = \left(1-\frac{e^{-x}}{k+1}\right)^{k+1} \sim \exp\left\{-e^{-x}\right\}. \end{align} $$

5. Graphical overview

The plot below shows the distribution of the largest fragment for different values of $k$. For $k=10, 20, 50$, I have also overlaid the asymptotic Gumbel distribution (thin line). The Gumbel is a very bad approximation for small values of $k$ so I omit them to not overload the picture. The Gumbel approximation is good from $k \approx 50$.

Distribution of the largest fragment of a broken stick

6. References

The proofs above are taken from references 2 and 3. The cited literature contains many more results, such as the distribution of the ordered spacings of any rank, their limit distribution and some alternative constructions of the ordered uniform spacings. The key references are not easily accessible, so I also provide links to the full text.

  1. Bairamov et al. (2010) Limit results for ordered uniform spacings, Stat papers, 51:1, pp 227-240
  2. Holst (1980) On the lengths of the pieces of a stick broken at random, J. Appl. Prob., 17, pp 623-634
  3. Pyke (1965) Spacings, JRSS(B) 27:3, pp. 395-449
  4. Renyi (1953) On the theory of order statistics, Acta math Hung, 4, pp 191-231
Related Question