Why are the order statistics of the uniform distribution Beta distributed?

Tags: order-statistics, probability, uniform-distribution

For a beta distribution with parameters $a$ and $b$, we can interpret it as the distribution of the probability of heads for a coin that we tossed $a+b$ times, seeing $a$ heads and $b$ tails. At the same time, if we draw $n$ uniform random numbers and sort them, the $k$-th order statistic is also beta distributed, with parameters $a=k$ and $b=n+1-k$. So it's as if we tossed $n+1$ coins and got $k$ heads. Is there an intuitive explanation for this? I can follow the derivations mechanically, but is there a logical reason the two distributions should be the same?
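As a quick sanity check, the claim in the question is easy to verify by simulation. The sketch below (with an arbitrary choice of $n=10$, $k=3$) compares the empirical mean of the $k$-th smallest of $n$ uniforms against the mean $k/(n+1)$ of $\text{Beta}(k,\,n+1-k)$:

```python
import random

random.seed(0)

n, k = 10, 3          # hypothetical choice: n uniforms, k-th order statistic
trials = 200_000

# Empirical distribution of the k-th smallest of n i.i.d. uniforms
samples = []
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    samples.append(u[k - 1])

emp_mean = sum(samples) / trials

# Beta(a, b) with a = k, b = n + 1 - k has mean a / (a + b) = k / (n + 1)
a, b = k, n + 1 - k
beta_mean = a / (a + b)

print(round(emp_mean, 3), round(beta_mean, 3))
```

The two printed means should agree to a couple of decimal places; matching higher moments or the full histogram works the same way.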
Related Solutions
In the first case, the "prior" distribution being updated is $\text{Uniform}(0,1)=\text{Beta}(1,1)$ for the location of a single point chosen uniformly at random in $(0,1)$. But in the second case, the "missing $+1$" arises from the "prior" now for the distance between two such points chosen independently, which is $\text{Beta}(1,2)$ (see footnote$^\dagger$).
First case. Given a single uniform r.v. $U$, $n-1$ further i.i.d. uniforms $X_1,...,X_{n-1}$ are placed, forming $n-1$ Bernoulli trials, success meaning that $X_i<U,$ and $Y$ is the number of successes: $$\begin{align}&U_{(k)}\sim(U\mid Y=k-1)\\[1ex] &\quad\quad\quad\quad \text{ where }\begin{cases}U\sim\text{Uniform}(0,1)=\text{Beta}(1,1)\\ (Y\mid U=u)\sim\text{Binomial}(n-1,u)\end{cases}\\[3ex] &\implies U_{(k)}\sim\text{Beta}(1+(k-1),1+(n-1)-(k-1))\\ &\quad\quad\quad\quad\quad\quad=\text{Beta}(k,n-k+1)\end{align}$$
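The construction above can be checked numerically by rejection sampling: draw $U$, draw the $n-1$ further uniforms, and keep $U$ only when exactly $k-1$ of them fall below it. A minimal sketch, with hypothetical values $n=8$, $k=3$:

```python
import random

random.seed(1)

n, k = 8, 3           # hypothetical choice of n and k
trials = 300_000

# Rejection-style sampling of (U | Y = k-1): keep U only when exactly
# k-1 of the other n-1 uniforms fall below it.
kept = []
for _ in range(trials):
    u = random.random()
    y = sum(random.random() < u for _ in range(n - 1))
    if y == k - 1:
        kept.append(u)

cond_mean = sum(kept) / len(kept)
beta_mean = k / (n + 1)   # mean of Beta(k, n-k+1)
print(round(cond_mean, 3), round(beta_mean, 3))
```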
Second case. Given two i.i.d. uniforms $U,U'$, $n-2$ further i.i.d. uniforms $X_1,...,X_{n-2}$ are placed, forming $n-2$ Bernoulli trials, success meaning that $\min(U,U')\le X_i\le \max(U,U'),$ and $Y$ is the number of successes: $$\begin{align}&U_{(k)}-U_{(j)}\sim(|U-U'|\,\mid Y=k-j-1)\\[1ex] &\quad\quad\quad\quad\quad\quad\quad \text{ where }\begin{cases}\text{i.i.d. }U,U'\sim\text{Uniform}(0,1),\\ \quad \implies |U-U'|\sim\text{Beta}(1,2)\\ (Y\mid |U-U'|=d)\sim\text{Binomial}(n-2,d)\end{cases}\\[3ex] &\implies U_{(k)}-U_{(j)}\sim\text{Beta}(1+(k-j-1),2+(n-2)-(k-j-1))\\ &\quad\quad\quad\quad\quad\quad\quad\quad=\text{Beta}(k-j,n-k+j+1)\end{align}$$
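The same kind of numerical check works for the gap $U_{(k)}-U_{(j)}$: simulate the order statistics directly and compare with the mean $(k-j)/(n+1)$ of $\text{Beta}(k-j,\,n-k+j+1)$. A sketch with hypothetical values $n=10$, $j=2$, $k=6$:

```python
import random

random.seed(2)

n, j, k = 10, 2, 6    # hypothetical choice of n, j, k
trials = 200_000

# Empirical distribution of the gap between the j-th and k-th order statistics
gaps = []
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    gaps.append(u[k - 1] - u[j - 1])

emp_mean = sum(gaps) / trials
beta_mean = (k - j) / (n + 1)   # mean of Beta(k-j, n-k+j+1)
print(round(emp_mean, 3), round(beta_mean, 3))
```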
In both cases, the conclusion is a consequence of the following fact relating the (conditional) beta and binomial distributions:
$$\left. \begin{array}{l} X\sim\text{Beta}(a,b)\\ (Y\mid X=x)\sim\text{Binomial}(n,x) \end{array} \right\} \implies (X\mid Y=y)\sim\text{Beta}(a+y,b+n-y)$$
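This beta-binomial updating fact can itself be illustrated by simulation: sample $X$ from the prior, sample $Y\mid X$, condition on $Y=y$, and compare the conditional mean with the mean $(a+y)/(a+b+n)$ of the claimed posterior $\text{Beta}(a+y,\,b+n-y)$. A sketch with arbitrary values $a=2$, $b=3$, $n=6$, $y=4$:

```python
import random

random.seed(3)

a, b, n, y = 2, 3, 6, 4   # hypothetical prior parameters and observation
trials = 400_000

# Keep the prior draws x for which the binomial outcome equals y
kept = []
for _ in range(trials):
    x = random.betavariate(a, b)
    # Y | X = x  ~  Binomial(n, x)
    Y = sum(random.random() < x for _ in range(n))
    if Y == y:
        kept.append(x)

post_mean = sum(kept) / len(kept)
conj_mean = (a + y) / (a + b + n)   # mean of Beta(a+y, b+n-y)
print(round(post_mean, 3), round(conj_mean, 3))
```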
$^\dagger$ Note that $\text{Beta}(1,2)$ is a right-triangular distribution with p.d.f. $2(1-t)$ and c.d.f. $1-(1-t)^2$ for $0<t<1.$ That this is the distribution of $|U-U'|$ for $\text{i.i.d. } U,U'\sim\text{Uniform}(0,1)$ has an easy geometrical proof.
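For the footnote, here is a quick numerical check that $|U-U'|$ has the $\text{Beta}(1,2)$ mean $1/3$ and c.d.f. $1-(1-t)^2$ (evaluated at an arbitrary point $t=0.4$):

```python
import random

random.seed(4)
trials = 200_000

# Distances between two independent uniforms on (0, 1)
d = [abs(random.random() - random.random()) for _ in range(trials)]

emp_mean = sum(d) / trials          # Beta(1,2) mean is 1/3
t = 0.4                             # arbitrary evaluation point
emp_cdf = sum(x <= t for x in d) / trials
print(round(emp_mean, 3), round(emp_cdf, 3), round(1 - (1 - t) ** 2, 3))
```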
Best Answer
You answered your own question completely, using the observation by lee-david-chung-lin. A $\text{Beta}(a,b)$ distribution with $a,b \in \{1,2,\dots\}$ is the same as the Bayesian posterior for $U$ when $U$ is uniform a priori and we observe that the outcome of a $\text{Binomial}(n,U)$ random variable is $a-1$, where $n=(a-1)+(b-1)=a+b-2$. That is because the density of a $\text{Beta}(a,b)$ random variable is $x^{a-1}(1-x)^{b-1} \mathbf{1}_{[0,1]}(x)/B(a,b)$, where $B(a,b)$ is the beta integral, while the probability that a $\text{Binomial}(n,p)$ random variable equals $k$ is $\binom{n}{k} p^k (1-p)^{n-k}$. So changing $p$ to $x$, applying Bayes's rule, and matching $a-1$ to $k$ and $b-1$ to $n-k$ gives the result. That is what lee-david-chung-lin pointed out. Then, following your explanation, Rohit Pandey, the $k$th order statistic has $a=(k-1)+1=k$ and $b=n-k+1$, which matches the desired connection.
There is a simple discrete analog. Imagine $L$ boxes in a linearly ordered row, with $n$ balls in the boxes, at most one ball per box. The probability that the $k$th ball (reading left to right) is in box $x$ is $\binom{x-1}{k-1} \binom{L-x}{n-k}/\binom{L}{n}$ for each possible choice of $x$. (Note that summing over $x$ generalizes the "hockey-stick identity" for binomial coefficients in a simple way.) This is not quite the same setup: here the primary interpretation is in terms of the $k$th order statistic.
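The discrete pmf above is easy to check numerically. The sketch below (with a hypothetical small instance $L=12$, $n=4$, $k=2$) verifies that it sums to $1$ (the generalized hockey-stick identity) and agrees with a Monte Carlo placement of the balls:

```python
import math
import random

random.seed(5)

L, n, k = 12, 4, 2   # hypothetical small instance

# Exact pmf of the position of the k-th ball (reading left to right)
def pmf(x):
    return math.comb(x - 1, k - 1) * math.comb(L - x, n - k) / math.comb(L, n)

total = sum(pmf(x) for x in range(1, L + 1))  # should be exactly 1

# Monte Carlo check: place n balls in distinct boxes uniformly at random
trials = 100_000
hits = 0
target = 5           # arbitrary box to check
for _ in range(trials):
    boxes = sorted(random.sample(range(1, L + 1), n))
    if boxes[k - 1] == target:
        hits += 1

print(round(total, 6), round(hits / trials, 3), round(pmf(target), 3))
```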
The $\text{Binomial}(n,p)$ distribution is replaced by a capture-recapture hypergeometric distribution, which is not as straightforward as the binomial. Imagine a population of $L-1$ elephants; you capture and tag a uniformly distributed random number $\mathsf{Y}$ of them, and set $x=y+1$ if $\mathsf{Y}=y$. (This hypothesis is admittedly a bit odd in this context.) Then you randomly collect a new sample of $n-1$ elephants from the population and discover that $k-1$ of them were tagged and $n-k$ were not. What is your updated guess for $y$?
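This capture-recapture update can be simulated directly: draw the uniform prior for the number tagged, draw the recapture sample, condition on seeing $k-1$ tagged, and compare the empirical posterior of $x=y+1$ with the discrete pmf $\binom{x-1}{k-1}\binom{L-x}{n-k}/\binom{L}{n}$. A sketch with hypothetical values $L=12$, $n=4$, $k=2$:

```python
import math
import random

random.seed(6)

L, n, k = 12, 4, 2   # hypothetical small instance
trials = 300_000

counts = {}
kept = 0
for _ in range(trials):
    y = random.randrange(L)                      # uniform prior: y in 0..L-1
    elephants = [i < y for i in range(L - 1)]    # y tagged out of L-1
    sample = random.sample(elephants, n - 1)     # recapture n-1 of them
    if sum(sample) == k - 1:                     # observed k-1 tagged
        kept += 1
        counts[y + 1] = counts.get(y + 1, 0) + 1

x = 5                # arbitrary value of x = y + 1 to check
emp_post = counts.get(x, 0) / kept
exact = math.comb(x - 1, k - 1) * math.comb(L - x, n - k) / math.comb(L, n)
print(round(emp_post, 3), round(exact, 3))
```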
This example is sometimes useful in classes because it gives an explicitly calculable, purely discrete example of Bayes's rule, set up just as you did, Rohit Pandey, for the continuous case. It can be a good example when teaching basic probability to students who are concurrently learning multivariable calculus, since you then do not have to delay the Bayes's rule examples to the second half of the semester. (For example, one could follow Durrett's Elementary Probability for Applications [which is now available free on his website] but reach this combinatorial Bayes's rule example before the multivariable calculus material in the later chapter on continuous distributions.)