Solved – On the proof of admissibility of constant estimators under squared loss

admissibility, decision-theory, mathematical-statistics

The question concerns the discussion in Wasserman, All of Statistics, Section 13.6. He defines:

An estimator $\hat{\theta}$ is inadmissible if there exists another
rule $\hat{\theta}'$ such that $$R(\theta, \hat{\theta}') \leq R(\theta, \hat{\theta}) \ \text{for all} \ \theta$$ and $$R(\theta,\hat{\theta}') < \, R(\theta, \hat{\theta}) \ \ \text{for at least one} \ \ \theta.$$ Otherwise, $\hat{\theta}$ is
admissible.

He then provides the following example:

Let $Y \sim N(\theta,1)$ and estimate $\theta$ with squared error
loss. Let $\hat{\theta}(Y) = 3$. We will show that $\hat{\theta}$ is
admissible. Suppose not. Then there exists a different rule
$\hat{\theta}'$ with smaller risk. In particular, \begin{align*} R(3, \hat{\theta}') \leq& \, R(3, \hat{\theta}) = 0. \end{align*}
Hence, \begin{align*} 0 = R(3, \hat{\theta}') =& \, \int\left(\hat{\theta}'(y) - 3 \right)^2 f(y \mid 3) \, \mathrm{d}y. \end{align*} Thus $\hat{\theta}'(y) = 3$. So there is no rule that
beats $\hat{\theta}$. Even though $\hat{\theta}$ is admissible it is
clearly a bad decision rule.
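Wasserman's risk claims can be checked numerically. The following is a minimal Monte Carlo sketch (numpy assumed; the rule names are my own illustrative labels) estimating $R(\theta, \hat\theta) = E\big[(\hat\theta(Y) - \theta)^2\big]$ for the constant rule $\hat\theta(Y) = 3$ and, for contrast, the rule $\hat\theta(Y) = Y$:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_risk(rule, theta, n_sim=200_000):
    """Monte Carlo estimate of R(theta, rule) = E[(rule(Y) - theta)^2]
    for a single observation Y ~ N(theta, 1)."""
    y = rng.normal(theta, 1.0, size=n_sim)
    return np.mean((rule(y) - theta) ** 2)

constant_rule = lambda y: np.full_like(y, 3.0)  # Wasserman's theta_hat(Y) = 3
use_the_data = lambda y: y                      # theta_hat(Y) = Y, risk 1 everywhere

risk_at_3 = mc_risk(constant_rule, 3.0)   # exactly 0: the constant matches theta
risk_at_1 = mc_risk(constant_rule, 1.0)   # (3 - 1)^2 = 4: terrible away from 3
risk_y_at_3 = mc_risk(use_the_data, 3.0)  # about 1, like at every other theta
```

This makes the "admissible but bad" point concrete: the constant rule is unbeatable at $\theta = 3$ (risk exactly 0) but its risk grows quadratically away from 3, while $\hat\theta(Y) = Y$ has constant risk 1.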

What the example seems to tell me is that any potential alternative rule must have zero risk at $\theta = 3$ if it is to render $\hat\theta$ inadmissible: if that were not the case, the first requirement for inadmissibility of $\hat\theta$ would not be met.

I am however not sure why it is not possible in this example that $\hat{\theta}'$ would have strictly smaller risk at some other $\theta$ than 3. In particular, how (strongly) is this related to normality and squared loss as used in the example?

Is the argument simply that the normal density is strictly positive everywhere and the loss function is strictly positive except at the true value, so that if $\hat{\theta}'$ were any rule other than the constant 3, its risk at $\theta = 3$ could not be 0? That is, if $\hat{\theta}'$ were some other constant, it would be biased for 3; and if it actually used the data, it would have nonzero variance and could not estimate 3 without error. Either way, its MSE (= risk under squared loss) at $\theta = 3$ would be nonzero.
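The bias–variance reasoning in the previous paragraph can be checked on a concrete data-using rule. A sketch (the shrinkage rule $0.9y + 0.3$ is my own illustrative choice, constructed to be unbiased exactly at $\theta = 3$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data-using rule, chosen to be unbiased at theta = 3:
# E[0.9 Y + 0.3] = 0.9 * 3 + 0.3 = 3, but Var(0.9 Y + 0.3) = 0.81 > 0.
shrink = lambda y: 0.9 * y + 0.3

theta = 3.0
y = rng.normal(theta, 1.0, size=500_000)
est = shrink(y)

bias = np.mean(est) - theta          # about 0: unbiased at theta = 3
variance = np.var(est)               # about 0.81: nonzero because it uses Y
mse = np.mean((est - theta) ** 2)    # equals bias^2 + variance, strictly positive
```

Even though this rule is unbiased at $\theta = 3$, its variance keeps its risk there strictly positive, so it fails the zero-risk requirement, exactly as the question suggests.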

Would, therefore, any other distribution with strictly positive density, combined with any loss function that is zero at the true value and strictly positive elsewhere, also work to demonstrate admissibility of constant estimators?

Note: See here for a similar question that however does not entirely address my uncertainties.

Best Answer

Starting with the intuition, for some other estimator to dominate a constant estimator, for example $\hat{\theta}=3$, it has to be as good or better over the entire parameter space including when the true value of the parameter is actually 3. It's going to be a (hopeless) challenge to beat the constant estimator when it happens to match the true value.

Going to his proof, Wasserman shows that $\hat{\theta}'$ must have zero risk at $\theta=3$ for the constant estimator to be inadmissible (risk under squared error loss is nonnegative, so $R(3,\hat{\theta}') \leq 0$ forces $R(3,\hat{\theta}') = 0$). He then shows that an estimator with zero risk at 3 must coincide with the constant estimator $\hat{\theta}=3$!

So he shows that no non-constant estimator satisfies the condition $$R(3,\hat{\theta}') \leq R(3,\hat{\theta}) = 0.$$ We therefore never need to consider the parts of the parameter space where $\hat{\theta}'$ is strictly better than $\hat{\theta} = 3$, since the first criterion is already violated.

Note this doesn't depend on normality specifically. All the argument needs is that the density $f(y \mid 3)$ is strictly positive (almost everywhere), so that zero risk forces $\hat{\theta}'(y) = 3$ for almost every $y$; any family with that property supports the same conclusion.
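As a quick sanity check on this point, the same risk computation can be repeated under a non-normal family. A sketch using a Laplace location family (my choice of alternative density that is strictly positive on all of $\mathbb{R}$; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Repeat the check with Y ~ Laplace(theta, 1) instead of N(theta, 1).
theta = 3.0
y = rng.laplace(theta, 1.0, size=200_000)

# Constant rule theta_hat(Y) = 3: zero risk when theta = 3, as before.
risk_constant = np.mean((np.full_like(y, 3.0) - theta) ** 2)

# A rule that uses the data (theta_hat(Y) = Y) again pays its variance:
# Var of Laplace with scale 1 is 2.
risk_identity = np.mean((y - theta) ** 2)
```

The constant rule's zero risk at the matching $\theta$ has nothing to do with the shape of the density, only with the loss vanishing at the truth; the data-using rule again has strictly positive risk there.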

The point of this example is that admissibility alone doesn't make an estimator good. In fact, a really stupid estimator like a constant can still be admissible.
