On the existence of UMVUE and choice of estimator of $\theta$ in $\mathcal N(\theta,\theta^2)$ population

Tags: estimation, inference, mathematical-statistics, normal-distribution, umvue

Let $(X_1,X_2,\cdots,X_n)$ be a random sample drawn from $\mathcal N(\theta,\theta^2)$ population where $\theta\in\mathbb R$.

I am looking for the UMVUE of $\theta$.

Joint density of $(X_1,X_2,\cdots,X_n)$ is

\begin{align}
f_{\theta}(x_1,x_2,\cdots,x_n)&=\prod_{i=1}^n\frac{1}{\theta\sqrt{2\pi}}\exp\left[-\frac{1}{2\theta^2}(x_i-\theta)^2\right]
\\&=\frac{1}{(\theta\sqrt{2\pi})^n}\exp\left[-\frac{1}{2\theta^2}\sum_{i=1}^n(x_i-\theta)^2\right]
\\&=\frac{1}{(\theta\sqrt{2\pi})^n}\exp\left[\frac{1}{\theta}\sum_{i=1}^n x_i-\frac{1}{2\theta^2}\sum_{i=1}^nx_i^2-\frac{n}{2}\right]
\\&=g(\theta,T(\mathbf x))h(\mathbf x)\qquad\forall\,(x_1,\cdots,x_n)\in\mathbb R^n\,,\forall\,\theta\in\mathbb R
\end{align}

where $g(\theta, T(\mathbf x))=\frac{1}{(\theta\sqrt{2\pi})^n}\exp\left[\frac{1}{\theta}\sum_{i=1}^n x_i-\frac{1}{2\theta^2}\sum_{i=1}^nx_i^2-\frac{n}{2}\right]$ and $h(\mathbf x)=1$.

Here, $g$ depends on $\theta$ and on $x_1,\cdots,x_n$ through $T(\mathbf x)=\left(\sum_{i=1}^nx_i,\sum_{i=1}^nx_i^2\right)$ and $h$ is independent of $\theta$. So by Fisher-Neyman factorisation theorem, the two-dimensional statistic $T(\mathbf X)=\left(\sum_{i=1}^nX_i,\sum_{i=1}^nX_i^2\right)$ is sufficient for $\theta$.

However, $T$ is not a complete statistic. This is because $$E_{\theta}\left[2\left(\sum_{i=1}^n X_i\right)^2-(n+1)\sum_{i=1}^nX_i^2\right]=2n(1+n)\theta^2-(n+1)2n\theta^2=0\qquad\forall\,\theta$$

and the function $g^*(T(\mathbf X))=2\left(\sum_{i=1}^n X_i\right)^2-(n+1)\sum_{i=1}^nX_i^2$ is not identically zero.
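A quick Monte Carlo sanity check of this identity (a sketch in Python with numpy; the sample size and the values of $\theta$ are arbitrary choices):

```python
import numpy as np

# Check that g*(T) = 2*(sum X_i)^2 - (n+1)*sum X_i^2 averages to about 0
# under N(theta, theta^2), for several values of theta.
rng = np.random.default_rng(0)
n, reps = 10, 200_000

for theta in (-3.0, 0.5, 2.0):
    x = rng.normal(loc=theta, scale=abs(theta), size=(reps, n))
    g_star = 2 * x.sum(axis=1) ** 2 - (n + 1) * (x ** 2).sum(axis=1)
    print(theta, g_star.mean())   # approximately 0 (up to Monte Carlo error) for each theta
```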

But I do know that $T$ is a minimal sufficient statistic.

I am not certain, but I think a complete statistic may not exist for this curved exponential family. How then should I find the UMVUE? If a complete statistic does not exist, can an unbiased estimator (like $\bar X$ here) that is a function of the minimal sufficient statistic be the UMVUE?
(Related thread: What is the necessary condition for an unbiased estimator to be UMVUE?)

What if I consider the best linear unbiased estimator (BLUE) of $\theta$? Can the BLUE be the UMVUE?

Suppose I consider the linear unbiased estimator $T^*(\mathbf X)=a\bar X+(1-a)cS$ of $\theta$, where $c(n)=\sqrt{\frac{n-1}{2}}\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$ and $S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$, since we know that $E_{\theta}(cS)=\theta$.
My idea is to minimise $\text{Var}(T^*)$ so that $T^*$ is the BLUE of $\theta$. Would $T^*$ then be the UMVUE of $\theta$?

I have taken a linear unbiased estimator based on $\bar X$ and $S$ as $(\bar X,S^2)$ is also sufficient for $\theta$.
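For concreteness, here is a small numerical sketch (Python with numpy/scipy; the particular $n$, $\theta>0$ and $a$ are arbitrary choices) of the ingredients of $T^*$: it evaluates $c(n)$ and checks $E_\theta(cS)\approx\theta$ and the unbiasedness of $T^*$ by simulation.

```python
import numpy as np
from scipy.special import gammaln

def c(n):
    # c(n) = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2)
    return np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

rng = np.random.default_rng(1)
n, theta, a, reps = 10, 2.0, 0.5, 200_000

x = rng.normal(theta, theta, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)               # S with the 1/(n-1) convention
t_star = a * xbar + (1 - a) * c(n) * s  # the linear unbiased estimator T*

print(c(n))                 # bias-correction factor for S
print((c(n) * s).mean())    # approximately theta, i.e. E(cS) = theta
print(t_star.mean())        # approximately theta, so T* is unbiased
```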

Edit:

A lot of work has indeed been done in the estimation of $\theta$ in the more general $\mathcal N(\theta,a\theta^2)$ family where $a>0$ is known. The following are some of the most relevant references:

I found the first of these references in this exercise from Statistical Inference by Casella/Berger:

[Images of the exercise from Casella/Berger]

My question is not about this exercise though.

The final note (the chapter extract) says that the UMVUE of $\theta$ does not exist because the minimal sufficient statistic is not complete. I would like to know what enables us to conclude that a UMVUE does not exist simply because a complete sufficient statistic cannot be found. Is there any related result regarding this? The linked thread shows that a UMVUE can exist even when no complete sufficient statistic exists.

Now assuming that a uniformly minimum variance unbiased estimator does not exist, what should be our next criterion for choosing the 'best' estimator? Do we look for the minimum MSE, the minimum variance, or the MLE? Or would the choice of criterion depend on our purpose of estimation?

For example, say I have an unbiased estimator $T_1$ of $\theta$ and another estimator $T_2$ that is biased. Suppose the MSE of $T_1$ (which is its variance) is larger than that of $T_2$. Since minimising the MSE means controlling the bias and the variance simultaneously, I think $T_2$ should be the 'better' choice of estimator than $T_1$, even though it is biased.
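For reference, this comparison uses the standard decomposition of mean squared error into variance and squared bias:
$$\operatorname{MSE}_{\theta}(T)=E_{\theta}\left[(T-\theta)^2\right]=\text{Var}_{\theta}(T)+\left[E_{\theta}(T)-\theta\right]^2\,,$$
so a biased $T_2$ can indeed have smaller MSE than an unbiased $T_1$ if its variance is sufficiently small.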

Candidate choices of estimators of $\theta$ are listed starting on page 4 of the last note.

The following extract is from Theory of Point Estimation by Lehmann/Casella (second edition, page 87-88):

[Images of the extract from Lehmann/Casella]

It is highly probable that I have misunderstood everything, but is the last sentence saying that, under certain conditions, the existence of a complete statistic is necessary for the existence of a UMVUE? If so, is this the result I should be looking at?

The last result mentioned right at the end, due to R. R. Bahadur, refers to this note.

Upon further searching, I have found a result stating that if the minimal sufficient statistic is not complete, then no complete sufficient statistic exists. So at least I am fairly convinced that a complete sufficient statistic does not exist here.

Another result I forgot to consider is the one that roughly says that a necessary and sufficient condition for an unbiased estimator to be the UMVUE is that it be uncorrelated with every unbiased estimator of zero. I tried using this theorem to show that a UMVUE does not exist here, and also that an unbiased estimator like $\bar X$ is not the UMVUE. But the argument does not work out as simply as it does, for example, in the final illustration here.

Best Answer

Update:

Consider the estimator $$\hat 0 = \bar{X} - cS$$ where $c$ is given in your post. This is an unbiased estimator of $0$ and will clearly be correlated with the estimator given below (for any value of $a$).

Theorem 6.2.25 from C&B shows how to find complete sufficient statistics for the exponential family so long as the set $$\{(w_1(\theta), \ldots, w_k(\theta)):\theta\in\Theta\}$$ contains an open set in $\mathbb R^k$. Unfortunately this distribution yields $w_1(\theta) = \theta^{-2}$ and $w_2(\theta) = \theta^{-1}$, whose range does NOT contain an open set in $\mathbb R^2$ (since $w_1(\theta) = w_2(\theta)^2$). It is because of this that the statistic $(\bar{X}, S^2)$ is not complete for $\theta$, and it is for the same reason that we can construct an unbiased estimator of $0$ that will be correlated with any unbiased estimator of $\theta$ that is based on the sufficient statistics.
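To make the curve explicit (writing $c^*(\theta)$ for the normalising constant, to avoid a clash with the constant $c$ above, and absorbing the factor $-\tfrac12$ into the statistic), the joint density from the question can be written as
$$f_\theta(\mathbf x)=h(\mathbf x)\,c^*(\theta)\exp\bigl[w_1(\theta)\,t_1(\mathbf x)+w_2(\theta)\,t_2(\mathbf x)\bigr],\qquad t_1(\mathbf x)=-\tfrac{1}{2}\sum_{i=1}^n x_i^2,\quad t_2(\mathbf x)=\sum_{i=1}^n x_i,$$
with $w_1(\theta)=\theta^{-2}$ and $w_2(\theta)=\theta^{-1}$. The image $\{(w_1(\theta),w_2(\theta)):\theta\neq 0\}$ is the one-dimensional parabola $w_1=w_2^2$, which has empty interior in $\mathbb R^2$.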


Another Update:

From here, the argument is constructive. It must be the case that there exists another unbiased estimator $\tilde\theta$ such that $Var(\tilde\theta) < Var(\hat\theta)$ for at least one $\theta \in \Theta$.

Proof: Suppose that $E(\hat\theta) = \theta$, $E(\hat 0) = 0$ and $Cov(\hat\theta, \hat 0) < 0$ (for some value of $\theta$). Consider the new estimator $$\tilde\theta = \hat\theta + b\hat0.$$ This estimator is clearly unbiased, with variance $$Var(\tilde\theta) = Var(\hat\theta) + b^2Var(\hat0) + 2bCov(\hat\theta,\hat0).$$ Let $M(\theta) = \frac{-2Cov(\hat\theta, \hat0)}{Var(\hat0)}$.

By assumption, there must exist a $\theta_0$ such that $M(\theta_0) > 0$. If we choose $b \in (0, M(\theta_0))$, then $Var(\tilde\theta) < Var(\hat\theta)$ at $\theta_0$. Therefore $\hat\theta$ cannot be the UMVUE. $\quad \square$
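In fact, minimising the variance over $b$ shows the largest improvement this construction can deliver at $\theta_0$:
$$b^{*}=-\frac{Cov(\hat\theta,\hat 0)}{Var(\hat 0)}=\frac{M(\theta_0)}{2}\quad\Longrightarrow\quad Var(\tilde\theta)=Var(\hat\theta)-\frac{Cov(\hat\theta,\hat 0)^2}{Var(\hat 0)}<Var(\hat\theta)\ \text{ at }\theta_0.$$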

In summary: the fact that $\hat\theta$ is correlated with $\hat0$ (for any choice of $a$) implies that we can construct a new estimator that is better than $\hat\theta$ for at least one point $\theta_0$, which violates the claim that $\hat\theta$ is uniformly best among unbiased estimators.
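The construction can also be checked numerically. Below is a Monte Carlo sketch in Python (using numpy/scipy; the particular $n$, $\theta$, $a$ and number of replications are arbitrary choices), taking $\hat 0=\bar X-cS$ and a deliberately suboptimal $a$ so that the covariance comes out negative:

```python
import numpy as np
from scipy.special import gammaln

def c(n):
    # c(n) = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), so that E(c*S) = theta (theta > 0)
    return np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

rng = np.random.default_rng(2)
n, theta, a, reps = 10, 2.0, 0.1, 500_000   # a = 0.1 is deliberately far from the optimal a
cn = c(n)

x = rng.normal(theta, theta, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                   # S with the 1/(n-1) convention

theta_hat = a * xbar + (1 - a) * cn * s     # unbiased estimator of theta
zero_hat = xbar - cn * s                    # unbiased estimator of 0

cov = np.cov(theta_hat, zero_hat)[0, 1]     # negative for this choice of a
M = -2 * cov / zero_hat.var()               # M(theta_0) > 0, so an improvement is possible
b = M / 2                                   # any b in (0, M) works; M/2 gives the largest gain

theta_tilde = theta_hat + b * zero_hat      # still unbiased
print(theta_hat.mean(), theta_tilde.mean()) # both approximately theta
print(theta_hat.var(), theta_tilde.var())   # Var(theta_tilde) < Var(theta_hat)
```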


Let's look at your idea of linear combinations more closely.

$$\hat\theta = a \bar X + (1-a)cS$$

As you point out, $\hat\theta$ is a reasonable estimator since it is based on sufficient (albeit not complete) statistics. Clearly, this estimator is unbiased, so to compute the MSE we need only compute the variance; and since the sample is normal, $\bar X$ and $S$ are independent, so the variance is just the sum of the two component variances.

\begin{align*} MSE(\hat\theta) &= a^2 Var(\bar{X}) + (1-a)^2 c^2 Var(S) \\ &= \frac{a^2\theta^2}{n} + (1-a)^2 c^2 \left[E(S^2) - E(S)^2\right] \\ &= \frac{a^2\theta^2}{n} + (1-a)^2 c^2 \left[\theta^2 - \theta^2/c^2\right] \\ &= \theta^2\left[\frac{a^2}{n} + (1-a)^2(c^2 - 1)\right] \end{align*}

By differentiating, we can find the "optimal $a$" for a given sample size $n$.

$$a_{opt}(n) = \frac{c^2 - 1}{1/n + c^2 - 1}$$ where $$c^2 = \frac{n-1}{2}\left(\frac{\Gamma((n-1)/2)}{\Gamma(n/2)}\right)^2$$

A plot of this optimal choice of $a$ against $n$ is given below.

[Plot of $a_{opt}(n)$]

It is somewhat interesting to note that as $n\rightarrow \infty$, we have $a_{opt}\rightarrow \frac{1}{3}$ (confirmed via Wolframalpha).
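Here is a short numerical sketch of this limit in Python (the helper names are mine; scipy's `gammaln` is used to evaluate $c(n)^2$ stably for large $n$):

```python
import numpy as np
from scipy.special import gammaln

def c_sq(n):
    # c(n)^2 = ((n-1)/2) * (Gamma((n-1)/2) / Gamma(n/2))^2
    return ((n - 1) / 2) * np.exp(2 * (gammaln((n - 1) / 2) - gammaln(n / 2)))

def a_opt(n):
    # optimal weight a_opt(n) = (c^2 - 1) / (1/n + c^2 - 1)
    return (c_sq(n) - 1) / (1 / n + c_sq(n) - 1)

for n in (5, 10, 50, 100, 1000, 10_000, 100_000):
    print(n, a_opt(n))      # decreases towards 1/3 as n grows
```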

While there is no guarantee that this is the UMVUE, this estimator has the minimum variance among all unbiased estimators of the form $a\bar X+(1-a)cS$, i.e., among the unbiased linear combinations of the sufficient statistics considered here.
