Find an unbiased estimator of $\theta$ denoted by $\hat{\theta}$ such that $Var(\hat{\theta})\le Var(\theta)$

parameter estimation, statistical-inference, statistics

Let $X_1,\dots, X_n$ be iid random samples from $Unif(\theta, 6\theta)$ with $\theta>0$. Use the Rao-Blackwell theorem to find an unbiased estimator of $\theta$ denoted by $\hat{\theta}$ such that $Var(\hat{\theta})\le Var(\theta)$.


My work: I am following this answer: Finding UMVUE for uniform distribution $U(\alpha, \beta)$.

I know that the complete and sufficient statistic for $\theta$ is $T(X)=(X_{(1)}, X_{(n)})$. I am a little bit confused about the answer. By the Lehmann–Scheffé theorem, we need to find a function $h$ such that
$$\mathbb{E}(h(T))=\mathbb{E}[h((X_{(1)},X_{(n)}))]=\theta$$

But how do I solve for such a function $h$? I think it amounts to solving
$$\iint h(x,y)f_{(X_{(1)},X_{(n)})}(x,y)\mathrm{d}x\mathrm{d}y=\theta$$

Also, note that
$$\mathbb{E}[X_{(1)}]=\frac{5\theta }{n+1}+\theta$$
and
$$\mathbb{E}[X_{(n)}]=\frac{5n\theta }{n+1}+\theta$$
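A quick Monte Carlo sketch (NumPy, with arbitrary illustrative values of $\theta$ and $n$) can be used to sanity-check these two expectations:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000          # arbitrary illustration values

samples = rng.uniform(theta, 6 * theta, size=(reps, n))
x_min = samples.min(axis=1)
x_max = samples.max(axis=1)

print(x_min.mean(), 5 * theta / (n + 1) + theta)       # E[X_(1)] = 5*theta/(n+1) + theta
print(x_max.mean(), 5 * n * theta / (n + 1) + theta)   # E[X_(n)] = 5*n*theta/(n+1) + theta
```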


However, the correct answer is
$$\frac{1}{7}(X_{(1)}+X_{(n)}).$$
How do I arrive at it?

Best Answer

UPDATED, FIXED ANSWER:

My original answer was wrong and I misread your question which, as pointed out in the comments, has a typo. Finding some estimator $\hat{\theta}(\mathbf{X})$ such that $\text{Var}(\hat{\theta}) \leq \text{Var}(\theta)$ does not make sense as $\theta \in (0, +\infty)$ is the parameter in question here and is obviously not an estimator of itself.

What your question probably meant was: find an unbiased estimator $\hat{\theta}$ which improves on the crude unbiased estimator $\hat{\theta}_0(\mathbf{X}) =2X_1/7$ of $\theta$ (since $E_\theta[2X_1/7] = \theta$) in the sense that $\text{Var}(\hat{\theta}) \leq \text{Var}(\hat{\theta}_0)$. To do this, you can use Rao-Blackwell and condition $2X_1/7$ on a sufficient statistic to get a better estimator:
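As a quick illustration, here is a Monte Carlo sketch (NumPy, arbitrary $\theta$ and $n$) of the fact that $2X_1/7$ is unbiased, recording its variance as the baseline to improve on:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 8, 200_000           # arbitrary illustration values

samples = rng.uniform(theta, 6 * theta, size=(reps, n))
crude = 2 * samples[:, 0] / 7              # crude estimator 2*X_1/7, uses only the first observation

print(crude.mean(), theta)                 # sample mean is close to theta, so the estimator is unbiased
print(crude.var())                         # its variance, to be beaten after Rao-Blackwellization
```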

The ratio $$ \frac{f_\mathbf{X}(\mathbf{u} \ | \ \theta)}{f_\mathbf{X}(\mathbf{v} \ | \ \theta)} = \frac{I(\theta < u_{(1)} < u_{(n)} < 6\theta)}{I(\theta < v_{(1)} < v_{(n)} < 6\theta)} $$ doesn't vary with $\theta$ iff $u_{(1)} = v_{(1)}$ and $u_{(n)} = v_{(n)}$. Hence $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is minimal sufficient here.

However, as pointed out in the comments, $T$ is not complete. This is because the distribution family $\{U(\theta, 6\theta) : \theta > 0\}$ is a scale family, so it is well known that $X_{(n)}/X_{(1)}$ is ancillary, i.e. its distribution does not depend on $\theta$. Hence $E_\theta[X_{(n)}/X_{(1)}] = c$ for some real constant $c$ that does not depend on $\theta$, and therefore $E_\theta[(X_{(n)}/X_{(1)}) - c] = 0$ for all $\theta > 0$. Since $X_{(n)}/X_{(1)} - c$ is a function of $T$ that is not almost surely zero yet has zero expectation for every $\theta$, completeness fails.
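A small simulation sketch (NumPy, arbitrary $n$) illustrates this ancillarity: the summary statistics of $X_{(n)}/X_{(1)}$ come out essentially the same for very different values of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 8, 200_000                       # arbitrary illustration values

for theta in (0.5, 3.0, 20.0):
    samples = rng.uniform(theta, 6 * theta, size=(reps, n))
    ratio = samples.max(axis=1) / samples.min(axis=1)   # X_(n)/X_(1)
    # mean and quartiles of the ratio should not move with theta (ancillarity)
    print(theta, ratio.mean(), np.quantile(ratio, [0.25, 0.5, 0.75]))
```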

As a result you cannot use the Lehmann-Scheffe theorem on $T$ here to get a UMVUE. Fortunately, we don't need a UMVUE; we just need a $\hat{\theta}$ that merely improves on $\hat{\theta}_0 = 2X_1/7$. By Rao-Blackwell, one such $\hat{\theta}$ is $$ \hat{\theta}(\mathbf{X}) = E_\theta[\hat{\theta}_0(\mathbf{X}) \ | \ T(\mathbf{X})] = \frac{2}{7}E_\theta[X_1 \ | \ (X_{(1)}, X_{(n)})] $$ (which does not depend on $\theta$ as $T$ is sufficient)

Finding $E[X_1 \ | \ T]$ has to be done manually. One common strategy for this is:

  1. Find $E[X_{(i)} \ | \ T]$ for $i = 2, \ldots, n - 1$
  2. Thus calculate $E[\sum_{i = 1}^n X_{(i)} \ | \ T] = \sum_{i = 1}^n E[X_{(i)} \ | \ T]$
  3. And finally notice that since sums can be done in any order, the sum of order statistics is the same as the sum of the sample: $\sum_{i = 1}^n X_{(i)} = \sum_{i = 1}^n X_{i}$; So $$ E[\sum_{i = 1}^n X_{(i)} \ | \ T] = E[\sum_{i = 1}^n X_{i} \ | \ T] = \sum_{i = 1}^n E[ X_{i} \ | \ T] = n E[X_1 \ | \ T] $$

Part 1. For any $i = 2, \ldots, n - 1$ the joint pdf of the three order statistics $(X_{(1)}, X_{(i)}, X_{(n)})$ is $$ f_{(X_{(1)}, X_{(i)}, X_{(n)})}(x_{(1)}, x_{(i)}, x_{(n)}) = \frac{n!}{(i - 2)!(n - i - 1)!}\frac{1}{(5\theta)^3}\Big(\frac{x_{(i)} - x_{(1)}}{5\theta}\Big)^{i - 2}\Big(\frac{x_{(n)} - x_{(i)}}{5\theta}\Big)^{n - i - 1} $$ on $\theta < x_{(1)} < x_{(i)} < x_{(n)} < 6\theta$ and $0$ elsewhere.

You can argue for this as follows. Suppose the $1$st, $i$th and $n$th order statistics are observed as $X_{(1)} = x_{(1)}, X_{(i)} = x_{(i)}, X_{(n)} = x_{(n)}$. This means our i.i.d. sample values $X_1, \ldots, X_n \sim X = U(\theta, 6\theta)$ are partitioned into $5$ classes:

  • $1$ sample value must be $x_{(1)}$
  • $1$ sample value must be $x_{(i)}$
  • $1$ sample value must be $x_{(n)}$
  • $i - 2$ sample values must each fall in the interval $(x_{(1)}, x_{(i)})$
  • the remaining $n - i - 1$ sample values must each fall in the interval $(x_{(i)}, x_{(n)})$

The likelihood of this happening, using i.i.d. property and the multinomial formula for $5$ classes, is $\frac{n!}{1!1!1!(i - 2)!(n - i - 1)!}f_{X}(x_{(1)})f_{X}(x_{(i)})f_{X}(x_{(n)})P[x_{(1)} < X < x_{(i)}]^{i - 2}P[x_{(i)} < X < x_{(n)}]^{n - i - 1}$, which expands out to the above.
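As a numerical sanity check of this joint density (a sketch with arbitrary $\theta$, $n$ and $i$; it is not needed for the derivation), one can verify that it integrates to $1$ over the region $\theta < x_{(1)} < x_{(i)} < x_{(n)} < 6\theta$:

```python
from math import factorial
from scipy.integrate import tplquad

theta, n, i = 1.0, 5, 3                    # arbitrary illustration values
const = factorial(n) / (factorial(i - 2) * factorial(n - i - 1) * (5 * theta) ** 3)

# joint pdf of (X_(1), X_(i), X_(n)); tplquad passes the innermost variable first
def pdf(x1, xi, xn):
    return const * ((xi - x1) / (5 * theta)) ** (i - 2) * ((xn - xi) / (5 * theta)) ** (n - i - 1)

total, _ = tplquad(pdf,
                   theta, 6 * theta,                          # x_(n) in (theta, 6*theta)
                   lambda xn: theta, lambda xn: xn,           # x_(i) in (theta, x_(n))
                   lambda xn, xi: theta, lambda xn, xi: xi)   # x_(1) in (theta, x_(i))
print(total)                               # should be close to 1
```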

Also $f_{(X_{(1)}, X_{(n)})}(x_{(1)}, x_{(n)}) = \frac{n!}{(n - 2)!} \frac{1}{(5\theta)^2}(\frac{x_{(n)} - x_{(1)}}{5\theta})^{n - 2}$ on $\theta < x_{(1)} < x_{(n)} < 6\theta$ and $0$ elsewhere. Thus the pdf of the conditional variable $X_{(i)} \ | \ x_{(1)}, x_{(n)}$ is their ratio: $$ f_{X_{(i)}}(x_{(i)} \ | \ x_{(1)}, x_{(n)}) = \frac{(n - 2)!}{(i - 2)!(n - i - 1)!}\frac{(x_{(i)} - x_{(1)})^{i - 2} (x_{(n)} - x_{(i)})^{n - i - 1} }{(x_{(n)} - x_{(1)})^{n - 2}} $$ for $x_{(1)} < x_{(i)} < x_{(n)}$ and $0$ elsewhere. This may seem daunting but it is actually an easy transformation of a recognizable pdf. Indeed scaling and translating, it can be seen that $$ \frac{\big[X_{(i)} \ | \ x_{(1)}, x_{(n)}\big] - x_{(1)}}{x_{(n)} - x_{(1)}} \sim \text{Beta}(i - 1, n - i) $$ whose expectation is well-known to be $(i - 1)/(n - 1)$ and thus $$ E[X_{(i)} \ | \ (X_{(1)}, X_{(n)})] = (X_{(n)} - X_{(1)})\frac{i - 1}{n - 1} + X_{(1)} $$ for $i = 2, \ldots, n - 1$
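A short numerical sketch (SciPy, with arbitrary $n$, $i$ and observed values $a = x_{(1)}$, $b = x_{(n)}$) confirming that the displayed conditional pdf integrates to $1$ and has the rescaled $\text{Beta}(i-1, n-i)$ mean:

```python
from math import factorial
from scipy.integrate import quad
from scipy.stats import beta

n, i = 7, 4                                # arbitrary illustration values
a, b = 1.3, 5.2                            # arbitrary observed values of X_(1) < X_(n)

const = factorial(n - 2) / (factorial(i - 2) * factorial(n - i - 1))

def cond_pdf(x):
    # conditional pdf of X_(i) given X_(1) = a, X_(n) = b, as displayed above
    return const * (x - a) ** (i - 2) * (b - x) ** (n - i - 1) / (b - a) ** (n - 2)

total, _ = quad(cond_pdf, a, b)
mean, _ = quad(lambda x: x * cond_pdf(x), a, b)

print(total)                                        # should be close to 1
print(mean, (b - a) * (i - 1) / (n - 1) + a)        # matches the rescaled Beta mean
print(beta.mean(i - 1, n - i))                      # (i-1)/(n-1) on the unit interval
```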

Part 2. As a result \begin{align*} \sum_{i = 1}^n E[X_{(i)} \ | \ (X_{(1)}, X_{(n)})] &= X_{(1)} + \sum_{i = 2}^{n - 1}\Big[(X_{(n)} - X_{(1)})\frac{i - 1}{n - 1} + X_{(1)}\Big] + X_{(n)} \\ &= X_{(1)} + (X_{(n)} - X_{(1)})\frac{(n - 2)}{2} + (n - 2)X_{(1)} + X_{(n)} \\ &= \frac{n}{2}(X_{(1)} + X_{(n)}) \end{align*}
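The algebra in Part 2 can be checked symbolically, e.g. with SymPy (a sketch; `x1` and `xn` below stand in for $X_{(1)}$ and $X_{(n)}$):

```python
import sympy as sp

n = sp.symbols('n', integer=True, positive=True)
i = sp.symbols('i', integer=True)
x1, xn = sp.symbols('x1 xn')               # stand-ins for X_(1) and X_(n)

# X_(1) + sum_{i=2}^{n-1} [ (xn - x1)(i-1)/(n-1) + x1 ] + X_(n)
total = x1 + sp.summation((xn - x1) * (i - 1) / (n - 1) + x1, (i, 2, n - 1)) + xn

# should print 0, confirming the sum equals (n/2)(X_(1) + X_(n))
print(sp.simplify(total - sp.Rational(1, 2) * n * (x1 + xn)))
```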

Part 3. Thus $$ E[X_1 \ | \ (X_{(1)}, X_{(n)})] = \frac{1}{n}\sum_{i = 1}^n E[X_{(i)} \ | \ (X_{(1)}, X_{(n)})] = \frac{1}{2}(X_{(1)} + X_{(n)}) $$ which means that our required estimator is $$ \hat{\theta}(\mathbf{X}) = \frac{2}{7} E[X_1 \ | \ (X_{(1)}, X_{(n)})] = \frac{1}{7}(X_{(1)} + X_{(n)}) $$
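Finally, a Monte Carlo sketch (NumPy, arbitrary $\theta$ and $n$) comparing the crude estimator with the Rao–Blackwellized one: both come out unbiased, and the variance of $\frac{1}{7}(X_{(1)}+X_{(n)})$ is much smaller:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.5, 10, 500_000          # arbitrary illustration values

samples = rng.uniform(theta, 6 * theta, size=(reps, n))
crude = 2 * samples[:, 0] / 7                                    # crude estimator 2*X_1/7
rao_blackwell = (samples.min(axis=1) + samples.max(axis=1)) / 7  # (X_(1) + X_(n))/7

print(crude.mean(), rao_blackwell.mean(), theta)   # both close to theta (unbiased)
print(crude.var(), rao_blackwell.var())            # the Rao-Blackwellized variance is far smaller
```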
