[Math] What’s the probability distribution of the difference between two consecutive sorted random numbers generated by a known distribution

probabilityprobability distributionsprobability theorystatistics

For example, first we generate $n$ random numbers based on an exponential distribution ($f(x)=\lambda e^{-\lambda x}$), then we sort these random numbers in ascending order which are represented by $[x_1,x_2,\dots,x_n]$, where $\forall i \in [1,n-1]$, we have $x_i \leq x_{i+1}$. Now if we define $y_i = x_{i+1}-x_i$, what's the probability distribution of $\{ y_1,y_2,\dots,y_{n-1}\}$? How to derive the PDF of it?

On the other hand, if the distribution of $[x_1,x_2,\dots,x_n]$ is unknown, but we know the distribution of $\{ y_1,y_2,\dots,y_{n-1}\}$, is it possible to derive the distribution of $[x_1,x_2,\dots,x_n]$?

Best Answer

This is not a complete answer to all your questions, but it should provide some insight at the least. I'll give a shot at deriving p.d.f. for the $Y_i$ (distances between consecutive $X_i$). Once we know that, we might be able to use Bayes to find the distribution of $X_i$ given $Y_i$, but I will not address that here.

Given i.i.d. real-valued $X_i$, $i=1,2,\ldots,n$, with probability distribution $f$, one can see that they are equally likely to be in any order, and there are $n!$ possible orders. The probability of a particular ordering is $$ P(X_{k_1}<X_{k_2}<\cdots < X_{k_n}) =\int_{-\infty}^\infty f(x_{k_1}) \int_{x_{k_1}}^\infty f(x_{k_2})\cdots\int_{x_{k_{n-1}}}^\infty f(x_{k_n}) \ dx_{k_n}\cdots \ d x_{k_2}\ d x_{k_1}. $$ Switching the order of integration gives $$ \begin{aligned} P(X_{k_1}<X_{k_2}<\cdots < X_{k_n}) &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{-\infty}^{x_{k_3}} f(x_{k_2}) \int_{-\infty}^{x_{k_2}} f(x_{k_1}) \ dx_{k_1}\ d x_{k_2}\cdots \ d x_{k_n}\\ &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{-\infty}^{x_{k_3}} f(x_{k_2}) F(x_{k_2})\ d x_{k_2}\cdots \ d x_{k_n} \\ &=\int_{-\infty}^\infty f(x_{k_n}) \cdots \int_{0}^{F(x_{k_3})} u\ d u\cdots \ d x_{k_n} . \end{aligned} $$ Note that the substitution $u=F(x)$, $du=F'(x)dx$ has been made, and can be made at each stage. Continuing in the fashion will end up at $\displaystyle \frac{1}{n!}$ for the final value of the integal.

Thus $$ \begin{aligned} P(Y_i<x)&=n! P(X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x|X_1<X_2<\cdots<X_n)\\ &\quad\quad\quad\quad\quad\cdot P(X_1<X_2<\cdots<X_n)\\ &=n!P(\{X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x\} \cap \{X_1<X_2<\cdots<X_n\}) \end{aligned} $$

So we will calculate the probability of the intersection of events $X_1<X_2<\cdots<X_i<X_{i+1}<X_i+x$ and $X_1<X_2<\cdots<X_n$.

We wish to know $F_{Y_i}(x)=P(Y_i<x)=n!P(X_{i+1}-X_i<x)$, thus: $$ \begin{aligned} F_{Y_i}(x)= n!\int_{-\infty}^\infty &f(x_1)\int_{x_1}^\infty f(x_2)\\ &\cdots\int_{x_i}^{x_i+x}f(x_{i+1})\int_{x_{i+1}}^\infty f(x_{i+2})\\ &\quad\quad\quad\quad \cdots\int_{x_{n-1}}^\infty f(x_n) \ dx_n\cdots \ dx_1. \end{aligned} $$ This is the c.d.f. of $Y_i$. You can get the p.d.f. by taking the derivative w.r.t. $x$. This is a complicated integral, and in general, the distribution of $Y_i$ depends on $i$ and $n$. Intuitively, this makes sense as simulating a large number of i.i.d. random variables will result in consecutive values being closer.

For the $X_i$ exponentially distributed with mean $1/\lambda$, and $n=2$, $F_Y(x)=1-e^{-\lambda x}$. The calculation is $$ F_Y(x)= 2\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{x_1+x} \lambda e^{-\lambda x_2} \ dx_2 \ dx_1. $$

For $n=3$, $F_{Y_2}(x)=1-e^{-\lambda x}$ and $F_{Y_1}(x)=1-e^{-2\lambda x}$. The calculations are $$ F_{Y_2}(x)= 3!\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{\infty} \lambda e^{-\lambda x_2} \int_{x_2}^{x_2+x} \lambda e^{-\lambda x_3} \ dx_3 \ dx_2 \ dx_1. $$ $$ F_{Y_1}(x)= 3!\int_0^\infty \lambda e^{-\lambda x_1} \int_{x_1}^{x_1+x} \lambda e^{-\lambda x_2} \int_{x_2}^{\infty} \lambda e^{-\lambda x_3} \ dx_3 \ dx_2 \ dx_1. $$

There seems to be a pattern there which could probably be generalized for any $n$ and $i$. I haven't done the analytical calculation, but for any given $n$, $F_{Y_i}(x)=1-e^{-(n-i)\lambda x}$ appears to work. I've confirmed it numerically.

If the $X_i$ are uniformly distributed on the interval $[0,1]$, then $F_{Y_i}(x)=1-(1-x)^n$ for any $i$ and $n$. I performed the analytical calculation for $n=2,3$ and verified the formula numerically. Maybe an induction argument would work to prove it, or there may be some cleaner argument which does requires neither integration nor induction.

Best Answer

Related Solutions

[Math] Distribution of differences between adjacent sorted uniform random variables on $[0,1]$

[Math] For a random sample from the distribution $f(x)=e^{-(x-\theta)} , x>\theta$ , show that $2n[X_{(1)}-\theta]\sim\chi^2_{2}$

Related Question