[Math] What’s the probability distribution of the difference between two consecutive sorted random numbers generated by a known distribution

probabilityprobability distributionsprobability theorystatistics

For example, first we generate $n$ random numbers based on an exponential distribution ($f(x)=\lambda e^{-\lambda x}$), then we sort these random numbers in ascending order which are represented by $[x_1,x_2,\dots,x_n]$, where $\forall i \in [1,n-1]$, we have $x_i \leq x_{i+1}$. Now if we define $y_i = x_{i+1}-x_i$, what's the probability distribution of $\{ y_1,y_2,\dots,y_{n-1}\}$? How to derive the PDF of it?

On the other hand, if the distribution of $[x_1,x_2,\dots,x_n]$ is unknown, but we know the distribution of $\{ y_1,y_2,\dots,y_{n-1}\}$, is it possible to derive the distribution of $[x_1,x_2,\dots,x_n]$?

Best Answer

This is not a complete answer to all your questions, but it should provide some insight at the least. I'll give a shot at deriving p.d.f. for the $Y_i$ (distances between consecutive $X_i$). Once we know that, we might be able to use Bayes to find the distribution of $X_i$ given $Y_i$, but I will not address that here.

Given i.i.d. real-valued $X_i$, $i=1,2,\ldots,n$, with probability distribution $f$, one can see that they are equally likely to be in any order, and there are $n!$ possible orders. The probability of a particular ordering is $$ P(X_{k_1}<X_{k_2}<\cdots < X_{k_n}) =\int_{-\infty}^\infty f(x_{k_1}) \int_{x_{k_1}}^\infty f(x_{k_2})\cdots\int_{x_{k_{n-1}}}^\infty f(x_{k_n}) \ dx_{k_n}\cdots \ d x_{k_2}\ d x_{k_1}. $$ Switching the order of integration gives $$ \begin{aligned} P(X_{k_1}<X_{k_2}<\cdots < X_{k_n}) &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{-\infty}^{x_{k_3}} f(x_{k_2}) \int_{-\infty}^{x_{k_2}} f(x_{k_1}) \ dx_{k_1}\ d x_{k_2}\cdots \ d x_{k_n}\\ &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{-\infty}^{x_{k_3}} f(x_{k_2}) F(x_{k_2})\ d x_{k_2}\cdots \ d x_{k_n} \\ &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{0}^{F(x_{k_3})} u\ d u\cdots \ d x_{k_n} . \end{aligned} $$ Note that the substitution $u=F(x)$, $du=F'(x)dx$ has been made, and can be made at each stage. Continuing in the fashion will end up at $\displaystyle \frac{1}{n!}$ for the final value of the integal.

Thus $$ \begin{aligned} P(Y_i<x)&=n! P(X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x|X_1<X_2<\cdots<X_n)\\ &\quad\quad\quad\quad\quad\cdot P(X_1<X_2<\cdots<X_n)\\ &=n!P(\{X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x\} \cap \{X_1<X_2<\cdots<X_n\}) \end{aligned} $$

So we will calculate the probability of the intersection of events $X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x$ and $X_1<X_2<\cdots<X_n$.

We wish to know $F_{Y_i}(x)=P(Y_i<x)=n!P(X_{i+1}-X_i<x)$, thus: $$ \begin{aligned} F_{Y_i}(x)= n!\int_{-\infty}^\infty &f(x_1)\int_{x_1}^\infty f(x_2)\\ &\cdots\int_{x_i}^{x_i+x}f(x_{i+1})\int_{x_{i+1}}^\infty f(x_{i+2})\\ &\quad\quad\quad\quad \cdots\int_{x_{n-1}}^\infty f(x_n) \ dx_n\cdots \ dx_1. \end{aligned} $$ This is the c.d.f. of $Y_i$. You can get the p.d.f. by taking the derivative w.r.t. $x$. This is a complicated integral, and in general, the distribution of $Y_i$ depends on $i$ and $n$. Intuitively, this makes sense as simulating a large number of i.i.d. random variables will result in consecutive values being closer.

For the $X_i$ exponentially distributed with mean $1/\lambda$, and $n=2$, $F_Y(x)=1-e^{-\lambda x}$. The calculation is $$ F_Y(x)= 2\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{x_1+x} \lambda e^{-\lambda x_2} \ dx_2 \ dx_1. $$

For $n=3$, $F_{Y_2}(x)=1-e^{-\lambda x}$ and $F_{Y_1}(x)=1-e^{-2\lambda x}$. The calculations are $$ F_{Y_2}(x)= 3!\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{\infty} \lambda e^{-\lambda x_2} \int_{x_2}^{x_2+x} \lambda e^{-\lambda x_3} \ dx_3 \ dx_2 \ dx_1. $$ $$ F_{Y_1}(x)= 3!\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{x_1+x} \lambda e^{-\lambda x_2} \int_{x_2}^{\infty} \lambda e^{-\lambda x_3} \ dx_3 \ dx_2 \ dx_1. $$

There seems to be a pattern there which could probably be generalized for any $n$ and $i$. I haven't done the analytical calculation, but for any given $n$, $F_{Y_i}(x)=1-e^{-(n-i)\lambda x}$ appears to work. I've confirmed it numerically.

If the $X_i$ are uniformly distributed on the interval $[0,1]$, then $F_{Y_i}(x)=1-(1-x)^n$ for any $i$ and $n$. I performed the analytical calculation for $n=2,3$ and verified the formula numerically. Maybe an induction argument would work to prove it, or there may be some cleaner argument which does requires neither integration nor induction.