Maximum Likelihood – Understanding Why Stein’s Paradox Applies in Dimensions $\ge 3$

Stein's example shows that the maximum likelihood estimate of the means $\mu_1,\ldots,\mu_n$ of $n$ independent normally distributed variables with unit variances is inadmissible (under squared-error loss) iff $n\ge 3$. For a neat proof, see the first chapter of Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction by Bradley Efron.

This was highly surprising to me at first, but there is some intuition behind why one might expect the standard estimate to be inadmissible (most notably, if $x \sim \mathcal N(\mu,I_n)$, then $\mathbb{E}\|x\|^2 = \|\mu\|^2+n$, so $\|x\|^2$ systematically overestimates $\|\mu\|^2$, as outlined in Stein's original paper, linked to below).
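
For completeness, the identity in the parenthesis can be checked coordinate by coordinate, assuming independent coordinates with unit variance (this is a short check of my own, not a quotation from Stein's paper):

$$\mathbb{E}\|x\|^2=\sum_{i=1}^n \mathbb{E}x_i^2=\sum_{i=1}^n\left(\operatorname{Var}(x_i)+(\mathbb{E}x_i)^2\right)=\sum_{i=1}^n\left(1+\mu_i^2\right)=\|\mu\|^2+n.$$

So the length of $x$ is biased upward as an estimate of the length of $\mu$, which at least hints that shrinking $x$ toward the origin might help.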

My question is rather: What property of $n$-dimensional space (for $n\ge 3$), which $\mathbb{R}^2$ lacks, facilitates Stein's example? Possible answers could be about the curvature of the $n$-sphere, or something completely different.

In other words, why is the MLE admissible in $\mathbb{R}^2$?

Edit 1: In response to @mpiktas's concern about 1.31 following from 1.30 (here $S=\|z\|^2$):

$$E_\mu\left(\|z-\hat{\mu}\|^2\right)=E_\mu\left(S\left(\frac{N-2}{S}\right)^2\right)=E_\mu\left(\frac{(N-2)^2}{S}\right).$$

$$\hat{\mu}_i = \left(1-\frac{N-2}{S}\right)z_i$$ so $$E_\mu\left(\frac{\partial\hat{\mu}_i}{\partial z_i} \right)=E_\mu\left( 1-\frac{N-2}{S}+2(N-2)\frac{z_i^2}{S^2}\right).$$ Therefore we have:

$$\begin{aligned}2\sum_{i=1}^N E_\mu\left(\frac{\partial\hat{\mu}_i}{\partial z_i} \right)&=2N-2E_\mu\left(\frac{N(N-2)}{S}\right)+4E_\mu\left(\frac{N-2}{S}\right)\\&=2N-E_\mu\left(\frac{2(N-2)^2}{S}\right).\end{aligned}$$
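
As a sanity check on the algebra, here is a small Monte Carlo sketch. It assumes that the end result being derived is $E_\mu\|\hat{\mu}-\mu\|^2 = N - E_\mu\left((N-2)^2/S\right)$ with $S=\|z\|^2$; the function name, the choice $\mu=(1,\ldots,1)$ with $N=10$, the number of trials, and the seed are all arbitrary choices for illustration.

```python
import random


def james_stein_risk_check(mu, trials=20000, seed=0):
    """Estimate three quantities by simulation for z ~ N(mu, I):
    the risk of the MLE z, the risk of the James-Stein estimate,
    and the average of N - (N-2)^2 / S, where S = ||z||^2."""
    rng = random.Random(seed)
    n = len(mu)
    mle_loss = js_loss = formula = 0.0
    for _ in range(trials):
        z = [m + rng.gauss(0.0, 1.0) for m in mu]
        s = sum(v * v for v in z)
        shrink = 1.0 - (n - 2) / s  # James-Stein shrinkage factor
        mle_loss += sum((v - m) ** 2 for v, m in zip(z, mu))
        js_loss += sum((shrink * v - m) ** 2 for v, m in zip(z, mu))
        formula += n - (n - 2) ** 2 / s
    return mle_loss / trials, js_loss / trials, formula / trials


# Hypothetical example: N = 10, every mean equal to 1.
mle_risk, js_risk, formula_risk = james_stein_risk_check([1.0] * 10)
print(f"MLE risk         ~ {mle_risk:.3f}")
print(f"James-Stein risk ~ {js_risk:.3f}")
print(f"N - E[(N-2)^2/S] ~ {formula_risk:.3f}")
```

If the derivation is right, the last two numbers should agree up to simulation noise, and both should sit below the MLE risk, which is $N$.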

Edit 2: In this paper, Stein proves that the MLE is admissible for $N=2$.

Best Answer

The dichotomy between the cases $d < 3$ and $d \geq 3$ for the admissibility of the MLE of the mean of a $d$-dimensional multivariate normal random variable is certainly shocking.

There is another very famous example in probability and statistics in which there is a dichotomy between the $d < 3$ and $d \geq 3$ cases. This is the recurrence of a simple random walk on the lattice $\mathbb{Z}^d$. That is, the $d$-dimensional simple random walk is recurrent in 1 or 2 dimensions, but is transient in $d \geq 3$ dimensions. The continuous-time analogue (in the form of Brownian motion) also holds.
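
The lattice dichotomy is easy to see numerically. The following is only a rough sketch: a finite simulation cannot prove recurrence or transience, and the function name, walk counts, step limit, and seed are arbitrary choices for illustration. It estimates how often a simple random walk on $\mathbb{Z}^d$ comes back to the origin within a fixed number of steps.

```python
import random


def return_fraction(dim, walks=400, steps=1000, seed=0):
    """Fraction of simple random walks on Z^dim that revisit the
    origin at least once within the given number of steps."""
    rng = random.Random(seed)
    returned = 0
    for _walk in range(walks):
        pos = [0] * dim
        for _step in range(steps):
            axis = rng.randrange(dim)         # pick a coordinate uniformly
            pos[axis] += rng.choice((-1, 1))  # move one unit along it
            if not any(pos):                  # all coordinates zero: back at origin
                returned += 1
                break
    return returned / walks


fractions = {d: return_fraction(d) for d in (1, 2, 3)}
for d, frac in fractions.items():
    print(f"d = {d}: returned to the origin in {frac:.0%} of walks")
```

In one and two dimensions the fraction keeps creeping toward 1 as the step limit grows (very slowly for $d=2$), whereas for $d=3$ it stays bounded well away from 1.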

It turns out that the two are closely related.

Larry Brown proved that the two questions are essentially equivalent. That is, the best invariant estimator $\hat{\mu} \equiv \hat{\mu}(X) = X$ of a $d$-dimensional multivariate normal mean vector is admissible if and only if the $d$-dimensional Brownian motion is recurrent.

In fact, his results go much further. For any sensible (i.e., generalized Bayes) estimator $\tilde{\mu} \equiv \tilde{\mu}(X)$ with bounded (generalized) $L_2$ risk, there is an explicit(!) corresponding $d$-dimensional diffusion such that the estimator $\tilde{\mu}$ is admissible if and only if its corresponding diffusion is recurrent.

The local mean of this diffusion is essentially the discrepancy between the two estimators, i.e., $\tilde{\mu} - \hat{\mu}$, and the covariance of the diffusion is $2 I$. From this, it is easy to see that for the case of the MLE, $\tilde{\mu} = \hat{\mu} = X$, we recover (rescaled) Brownian motion.

So, in some sense, we can view the question of admissibility through the lens of stochastic processes and use well-studied properties of diffusions to arrive at the desired conclusions.

References

L. Brown (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat., vol. 42, no. 3, pp. 855–903.
R. N. Bhattacharya (1978). Criteria for recurrence and existence of invariant measures for multidimensional diffusions. Ann. Prob., vol. 6, no. 4, pp. 541–553.
