[Math] Mean squared error for vectors

Tags: estimation, estimation-theory, mean-square-error, parameter-estimation, statistics

I know that when we compare estimators $\hat{b}_1$ and $\hat{b}_2$ of an unknown parameter $\beta$, in classical statistics $\hat{b}_1$ is said to be "better" than $\hat{b}_2$ if:

$$ MSE(\hat{b}_1) \leq MSE(\hat{b}_2), $$ where MSE is the mean squared error: $$ MSE(\hat{b}_1) = E\big((\hat{b}_1-\beta)^2\big).$$

Now, if I had a vector $\boldsymbol{b} = (b_1, b_2, \ldots, b_n)$ of parameters to estimate, how could I compare estimators in terms of the MSE? There is no natural total order on vectors.

I know some people compare the two estimators component by component, but I can't find any references for that approach. Could you point me to a bibliography?

Best Answer

Note that if $\hat \theta(X)$ is an estimator (depending on random data $X$) for the parameter $\theta\in \mathbb{R}^n,$ the MSE is a scalar quantity defined as

$$\begin{align}MSE(\hat\theta,\theta)&\equiv E[\|\hat\theta(X)-\theta\|^2]\\ &=E[(\hat\theta(X)-\theta)'(\hat\theta(X)-\theta)].\\\end{align}$$

With some matrix algebra, one can easily prove the identity

$$\begin{align}MSE(\hat\theta,\theta)&=\|Bias(\hat\theta,\theta)\|^2+tr(Var(\hat\theta(X))),\\ Bias(\hat\theta,\theta)&\equiv E[\hat\theta(X)]-\theta. \end{align}$$

So rather than look at a vector of individual MSEs, we typically look at the above metric as the generalization of MSE.
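As a quick sanity check of the identity, here is a minimal Monte Carlo sketch (the shrinkage estimator and all parameter values are illustrative choices, not from the question): it deliberately uses a biased estimator of a mean vector so that both the squared-bias and trace-of-variance terms are nonzero, and confirms they sum to the scalar MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])            # true parameter (illustrative)
n_samples, n_reps = 50, 20_000

# Data: rows of X_i ~ N(theta, I); estimator: sample mean shrunk toward 0.
# Shrinkage introduces bias, so both terms of the identity are nonzero.
X = rng.normal(theta, 1.0, size=(n_reps, n_samples, 3))
theta_hat = 0.9 * X.mean(axis=1)              # shape (n_reps, 3)

# Left-hand side: scalar MSE = E ||theta_hat - theta||^2
mse = np.mean(np.sum((theta_hat - theta) ** 2, axis=1))

# Right-hand side: ||bias||^2 + tr(Var(theta_hat))
bias = theta_hat.mean(axis=0) - theta
trace_var = np.trace(np.cov(theta_hat, rowvar=False))

print(mse, np.sum(bias ** 2) + trace_var)     # agree up to Monte Carlo error
```

With the shrinkage factor $0.9$, the bias is $-0.1\,\theta$, so $\|Bias\|^2 = 0.01\|\theta\|^2 = 0.0525$, while $tr(Var) = 3 \times 0.81/50 = 0.0486$; the printed values should both be near $0.101$.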


However, the MSE is only one metric to judge an estimator by. One may also be interested in looking at the variance-covariance matrix $Var(\hat\theta(X))$, in which case your question still stands, namely how do we decide which of $V_1\equiv Var(\hat\theta_1(X))$, $V_2\equiv Var(\hat\theta_2(X))$ is "greater" given two estimators $\hat\theta_1(X),\hat\theta_2(X)$?

A common choice in this setting is the Loewner order, a partial order defined on the set of symmetric positive semidefinite matrices: $$V_1\geq V_2\iff V_1-V_2 \text{ is positive semidefinite (p.s.d.)}.$$

Being a partial order, this relation cannot be used to compare any two variance-covariance matrices summoned from the ether, but it is still meaningful. For instance, because p.s.d matrices have nonnegative diagonal entries, one immediate implication of $V_1\geq V_2$ is that the variance of each component of $\hat\theta_1(X)$ is at least as great as the variance of the corresponding component of $\hat\theta_2(X).$
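A small numerical sketch of this (the helper name `loewner_geq` and the example matrices are my own, purely illustrative): testing $V_1 \geq V_2$ reduces to checking that all eigenvalues of $V_1 - V_2$ are nonnegative, and the third pair of matrices shows that two covariance matrices can be incomparable in either direction.

```python
import numpy as np

def loewner_geq(V1, V2, tol=1e-10):
    """True iff V1 - V2 is p.s.d., i.e. V1 >= V2 in the Loewner order."""
    # eigvalsh is appropriate since V1 - V2 is symmetric
    return bool(np.all(np.linalg.eigvalsh(V1 - V2) >= -tol))

V1 = np.array([[2.0, 0.5],
               [0.5, 1.5]])
V2 = np.array([[1.0, 0.2],
               [0.2, 1.0]])

print(loewner_geq(V1, V2))   # True: V1 - V2 is p.s.d.
print(loewner_geq(V2, V1))   # False

# Only a partial order: V1 and V3 are incomparable, since both
# differences V1 - V3 and V3 - V1 are indefinite.
V3 = np.array([[3.0, 0.0],
               [0.0, 0.5]])
print(loewner_geq(V1, V3), loewner_geq(V3, V1))  # False False
```

Note that the diagonal-entry implication mentioned above is visible here: $V_1 \geq V_2$ forces each diagonal entry of $V_1$ to be at least the corresponding diagonal entry of $V_2$, but the converse does not hold ($V_3$ has a larger first diagonal entry than $V_1$, yet $V_3 \not\geq V_1$).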