Order Statistics Distribution – Understanding Distribution of Sum of Order Statistics

Tags: mathematical-statistics, maximum-likelihood, order-statistics, self-study, sufficient-statistics

The question is from Problem 7.2.9, page 380, of Robert Hogg's *Introduction to Mathematical Statistics*, 6th edition.

The problem is:

We consider a random sample $X_1, X_2,\ldots ,X_n$ from a distribution
with pdf $f(x;\theta)=(1/\theta)\exp(-x/\theta)$, $0<x<\infty$.
Suppose, however, that in a life-testing situation we only observe the
first $r$ order statistics $Y_1<Y_2<\cdots <Y_r$.

(a) Record the joint pdf of these order statistics and denote it by
$L(\theta)$

(b) Under these conditions, find the mle, $\hat{\theta}$, by
maximizing $L(\theta)$.

(c) Find the mgf and pdf of $\hat{\theta}$.

(d) With a slight extension of the definition of sufficiency, is
$\hat{\theta}$ a sufficient statistic?

I can solve (a) and (b), but I am completely stuck on (c) and therefore cannot move on to (d).

Solution to (a):

We know the joint pdf of $Y_1,Y_2,\ldots,Y_n$ is $g(y_1,y_2,\ldots,y_n)=n!f(y_1)f(y_2)\cdots f(y_n)$; if we integrate out $y_{r+1},\ldots,y_n$, we get the joint pdf of $Y_1,Y_2,\ldots,Y_r$.

$h(y_1,y_2,\ldots,y_r)=n!f(y_1)f(y_2)\cdots f(y_r)\int_{y_r}^{\infty} \int_{y_{r+1}}^{\infty}\cdots \int_{y_{n-2}}^{\infty} \int_{y_{n-1}}^{\infty}f(y_{r+1})f(y_{r+2})\cdots f(y_{n-1})f(y_n)\,dy_{n}\,dy_{n-1}\cdots dy_{r+2}\,dy_{r+1}$

Integrating over $y_n$ first, using $\int_{y_{n-1}}^{\infty}f(y_n)\,dy_n=1-F(y_{n-1})$:

$=n!f(y_1)f(y_2)\cdots f(y_r) \int_{y_r}^{\infty}\cdots \int_{y_{n-3}}^{\infty} \int_{y_{n-2}}^{\infty}f(y_{r+1})f(y_{r+2})\cdots f(y_{n-1})[1-F(y_{n-1})]\,dy_{n-1}\,dy_{n-2}\cdots dy_{r+1}$

Since $f(y_{n-1})[1-F(y_{n-1})]\,dy_{n-1}=-[1-F(y_{n-1})]\,d[1-F(y_{n-1})]$, integrating over $y_{n-1}$ gives

$=n!f(y_1)f(y_2)\cdots f(y_r)\int_{y_r}^{\infty}\cdots \int_{y_{n-4}}^{\infty} \int_{y_{n-3}}^{\infty}f(y_{r+1})f(y_{r+2})\cdots f(y_{n-2})\frac{[1-F(y_{n-2})]^2}{2}\,dy_{n-2}\,dy_{n-3}\cdots dy_{r+1}$

Continuing in this way until all of $y_{r+1},\ldots,y_n$ are integrated out,

$=n!f(y_1)f(y_2)\cdots f(y_r)\frac{[1-F(y_r)]^{n-r}}{(n-r)!}$

$=n!\frac{1}{\theta}e^{\frac{-y_1}{\theta}}\frac{1}{\theta}e^{\frac{-y_2}{\theta}}\cdots \frac{1}{\theta}e^{\frac{-y_r}{\theta}}[e^{-y_r/\theta}]^{n-r}/(n-r)!$

$=\frac{n!\theta^{-r}}{(n-r)!}e^{-\frac{1}{\theta}[\sum_{i=1}^{r}y_i+(n-r)y_r]}$

(b)

This part is not difficult; it is just the standard MLE calculation.

$\log L(\theta;y)=\log \frac{n!}{(n-r)!}-r\log\theta-\frac{1}{\theta}\Big[\sum_{i=1}^{r}y_i+(n-r)y_r\Big]$
Taking the derivative of the log-likelihood with respect to $\theta$, we get:
$\frac{\partial \log L(\theta;y)}{\partial \theta}=\frac{1}{\theta^2}\Big[\sum_{i=1}^{r}y_i+(n-r)y_r\Big]-\frac{r}{\theta}$

Setting the derivative to zero, we get:

$\hat{\theta}=\frac{\sum_{i=1}^{r}y_i+(n-r)y_r}{r}$
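As a quick sanity check on this formula (my own sketch, not part of the original post; the values of $n$, $r$, and $\theta$ are arbitrary choices for illustration), one can simulate Type II censored exponential samples and verify that the average of $\hat{\theta}$ over many replications is close to the true $\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, theta = 10, 5, 2.0   # hypothetical values chosen for illustration

# Simulate 100000 samples of size n; sort each row to get order statistics
sims = np.sort(rng.exponential(scale=theta, size=(100_000, n)), axis=1)
y = sims[:, :r]            # the first r order statistics (Type II censoring)

# MLE from the formula above: observed failure times plus censored exposure, over r
theta_hat = (y.sum(axis=1) + (n - r) * y[:, r - 1]) / r

print(theta_hat.mean())    # close to theta = 2.0
```

The simulated mean of $\hat{\theta}$ matches $\theta$ closely, which is reassuring before tackling its exact distribution in (c).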

(c)

To solve (c), I think we at least need to know the distribution of $\sum_{i=1}^{r}Y_i$.

Searching the internet, I found a paper about this distribution: https://www.ocf.berkeley.edu/~wwu/articles/orderStatSum.pdf

But I think the method might not be correct, since for order statistics the $F(y_i)$ are all different, so we cannot use the binomial distribution there.

There is another paper here: http://www.jstor.org/stable/4615746?seq=1#page_scan_tab_contents

But I am totally lost at formula (2.2); if someone could explain the paper with more detailed calculations, it would be highly appreciated.

I can only attempt (d) after solving (c).

Best Answer

Since
$$(y_1,\ldots,y_r)\sim\frac{n!\theta^{-r}}{(n-r)!}e^{-\frac{1}{\theta}[\sum_{i=1}^{r}y_i+(n-r)y_r]}\mathbb{I}_{y_1\le y_2\le \cdots \le y_r}$$
you have the joint pdf of $(y_1,\ldots,y_r)$. From there, you can deduce the pdf of
$$s_r=\sum_{i=1}^{r}y_i+(n-r)y_r\,.$$

Indeed, substituting $y_r=\left\{s_r-\sum_{i=1}^{r-1}y_i\right\}\big/(n-r+1)$, and because the Jacobian of the transform is constant,
\begin{align*}f_s(y_1,\ldots,y_{r-1},s_r) &\propto f_Y\left(y_1,\ldots,\left\{s_r-\sum_{i=1}^{r-1}y_i\right\}\Big/(n-r+1)\right) \\&\propto \theta^{-r} \exp\{-s_r/\theta\}\mathbb{I}_{y_1\le y_2\le \cdots \le\left\{s_r-\sum_{i=1}^{r-1}y_i\right\}/(n-r+1)}\end{align*}
implies by integration in $y_1,\ldots,y_{r-1}$ that
$$f_s(s_r)\propto\theta^{-r} \exp\{-s_r/\theta\}s_r^{r-1}\,.$$

Indeed,
\begin{align*} f_s(s_r)&=\int\cdots\int f_s(y_1,\ldots,y_{r-1},s_r)\,\text{d}y_1\cdots\text{d}y_{r-1}\\ &= \theta^{-r} \exp\{-s_r/\theta\}\int\cdots\int \mathbb{I}_{y_1\le y_2\le \cdots \le\left\{s_r-\sum_{i=1}^{r-1}y_i\right\}/(n-r+1)}\,\text{d}y_1\cdots\text{d}y_{r-1}\,, \end{align*}
where the indicator constrains $y_{r-1}$ below by $y_{r-2}\le y_{r-1}$ and above by
$$y_{r-1}\le \left\{s_r-\sum_{i=1}^{r-1}y_i\right\}\Big/(n-r+1)=\left\{s_r-\sum_{i=1}^{r-2}y_i\right\}\Big/(n-r+1)-\frac{y_{r-1}}{n-r+1}\,,$$
which simplifies into
$$y_{r-1}\le \left\{s_r-\sum_{i=1}^{r-2}y_i\right\}\Big/(n-r+2)\,.$$

If one starts integrating in $y_{r-1}$, the innermost integral is
\begin{align*}\int_{y_{r-2}}^{\{s_r-\sum_{i=1}^{r-2}y_i\}/(n-r+2)}\text{d}y_{r-1}&=\left\{s_r-\sum_{i=1}^{r-2}y_i\right\}\Big/(n-r+2)-y_{r-2}\\ &=\left\{s_r-\sum_{i=1}^{r-3}y_i\right\}\Big/(n-r+2)-\frac{(n-r+3)\,y_{r-2}}{n-r+2}\,, \end{align*}
and from there one can proceed by recursion.

Hence
$$s_r\sim\mathcal{G}a(r,1/\theta)\,,$$
a Gamma distribution with shape $r$ and rate $1/\theta$.
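To finish part (c) explicitly (this step is my addition, not part of the original answer): since $\hat{\theta}=s_r/r$ and $s_r\sim\mathcal{G}a(r,1/\theta)$, the mgf and pdf of $\hat{\theta}$ follow by a simple rescaling:

```latex
% mgf of s_r ~ Gamma(shape r, rate 1/theta): M_{s_r}(t) = (1 - \theta t)^{-r}
% hence, with \hat\theta = s_r / r,
M_{\hat\theta}(t) = M_{s_r}(t/r)
                  = \left(1 - \frac{\theta t}{r}\right)^{-r},
  \qquad t < \frac{r}{\theta},
% so \hat\theta ~ Gamma(shape r, rate r/theta), with pdf
f_{\hat\theta}(u) = \frac{r^{r}}{\Gamma(r)\,\theta^{r}}\,
                    u^{r-1} e^{-r u/\theta},
  \qquad u > 0.
```

This also answers (d): by the factorization theorem, $L(\theta)$ depends on the data only through $s_r=r\hat{\theta}$, so $\hat{\theta}$ is sufficient for $\theta$.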

Here is an R simulation to show the fit (a histogram of `res` with the fitted Gamma density overlaid), obtained as follows:

n=10
r=5
theta=1                                       # rexp() has rate 1, so theta=1
sim=matrix(rexp(n*1e4),1e4,n)                 # 1e4 exponential samples of size n
sim=t(apply(sim,1,sort))                      # sort each row: order statistics
res=apply(sim[,1:r],1,sum)+(n-r)*sim[,r]      # s_r (note sim[,r], not sim[,5])
hist(res,prob=TRUE)
curve(dgamma(x,shape=r,scale=theta),add=TRUE) # Gamma(shape r, rate 1/theta)
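A non-graphical version of the same check (a Python sketch of my own, using the same sample sizes as the R snippet and $\theta=1$ as implied by `rexp()`): the first two moments of the simulated $s_r$ should match those of $\mathcal{G}a(r,1/\theta)$, namely mean $r\theta$ and variance $r\theta^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 10, 5               # same sizes as the R snippet; theta = 1 as in rexp()

# Simulate 100000 samples of size n and sort each row into order statistics
sims = np.sort(rng.exponential(1.0, size=(100_000, n)), axis=1)
s_r = sims[:, :r].sum(axis=1) + (n - r) * sims[:, r - 1]

# Gamma(shape=r, rate=1) has mean r and variance r
print(s_r.mean(), s_r.var())   # both close to r = 5
```

Both sample moments land close to 5, consistent with $s_r\sim\mathcal{G}a(5,1)$ here.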