[Math] Proof of Gauss-Markov theorem


Theorem: Let $Y=X\beta+\varepsilon$ where $$Y\in\mathcal M_{n\times 1}(\mathbb R),$$ $$X\in \mathcal M_{n\times p}(\mathbb R),$$ $$\beta\in\mathcal M_{p\times 1}(\mathbb R ),$$ and $$\varepsilon\in\mathcal M_{n\times 1}(\mathbb R ).$$

We suppose that $X$ has full rank $p$ and that $$\mathbb E[\varepsilon]=0\quad\text{and}\quad \text{Var}(\varepsilon)=\sigma ^2I.$$
Then the least squares estimator $\hat\beta=(X^TX)^{-1}X^TY$ is the best linear unbiased estimator (BLUE) of $\beta$; that is, for any linear unbiased estimator $\tilde\beta$ of $\beta$, it holds that $$\text{Var}(\tilde\beta)-\text{Var}(\hat\beta)\geq 0,$$ where $\geq 0$ means that the difference is positive semidefinite.
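To make the matrix inequality concrete, here is a small pure-Python sketch (not from the original post) that computes exact covariance matrices instead of simulating. It assumes a hypothetical design with an intercept and one regressor ($p=2$), $\sigma^2=1$, and uses weighted least squares with arbitrary, made-up weights as the competing estimator $\tilde\beta=AY$. Because $AX=I$, that estimator is linear and unbiased, so the theorem predicts $\text{Var}(\tilde\beta)-\text{Var}(\hat\beta)$ is positive semidefinite.

```python
# Sketch: exact covariances, Var(AY) = sigma^2 * A A^T, for two linear unbiased estimators.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix (p = 2 in this sketch)
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols_matrix(X):
    # A_OLS = (X^T X)^{-1} X^T, so hat beta = A_OLS Y
    Xt = transpose(X)
    return matmul(inv2(matmul(Xt, X)), Xt)

def weighted_matrix(X, w):
    # A_W = (X^T W X)^{-1} X^T W with W = diag(w); A_W X = I, so A_W Y is linear and unbiased
    Xt = transpose(X)
    XtW = [[Xt[i][j] * w[j] for j in range(len(w))] for i in range(len(Xt))]
    return matmul(inv2(matmul(XtW, X)), XtW)

def covariance(A, sigma2):
    # Var(A Y) = A Var(eps) A^T = sigma^2 A A^T when Var(eps) = sigma^2 I
    return [[sigma2 * v for v in row] for row in matmul(A, transpose(A))]

def is_psd_2x2(M, tol=1e-12):
    # a symmetric 2x2 matrix is PSD iff both diagonal entries and the determinant are >= 0
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol

# Hypothetical example: intercept column plus x = 0..4, arbitrary positive weights.
X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w = [1.0, 3.0, 0.5, 2.0, 1.0]
C_ols = covariance(ols_matrix(X), 1.0)
C_w = covariance(weighted_matrix(X, w), 1.0)
diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(C_w, C_ols)]
print(is_psd_2x2(diff))  # True
```

Any other choice of positive weights should also give a positive semidefinite difference, since every such $A$ satisfies $AX=I$.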

Proof

Let $\tilde\beta$ be a linear unbiased estimator, i.e.
$$\tilde\beta=AY\ \ \text{for some }A\in\mathcal M_{p\times n}(\mathbb R)\quad\text{and}\quad\mathbb E[\tilde\beta]=\beta\text{ for all }\beta\in\mathbb R ^p.$$
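On "for all $\beta$": $\beta$ is fixed but unknown, so an estimator can only be called unbiased if $\mathbb E[\tilde\beta]=\beta$ holds whatever the true value happens to be. The following step, which is standard but not part of the excerpt above, shows what this requirement implies for $A$ under the model assumptions:

$$\mathbb E[\tilde\beta]=\mathbb E[A(X\beta+\varepsilon)]=AX\beta+A\,\mathbb E[\varepsilon]=AX\beta,$$

so $\mathbb E[\tilde\beta]=\beta$ for all $\beta\in\mathbb R^p$ means $(AX-I_p)\beta=0$ for every $\beta$, and taking $\beta=e_1,\dots,e_p$ gives $AX=I_p$. If the condition were imposed only at the single true value, the choice of $A$ could depend on that value, which is unusable since we do not know it.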

Questions:

1) Why do we require $\mathbb E[\tilde\beta]=\beta$ for all $\beta$? I don't really understand this point. To me $\beta$ is fixed, so requiring $\mathbb E[\tilde\beta]=\beta$ for all $\beta$ doesn't really make sense.

2) What is actually the difference between the least squares estimator and the maximum likelihood estimator? Both are $\hat\beta=(X^TX)^{-1}X^TY$, so if they are the same, I don't see why we give them two different names.

Best Answer

The Gauss-Markov theorem tells us that in a regression model where the error terms have expected value zero, $E(\epsilon_{i}) = 0$, have constant and finite variance, $\sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty$, and are uncorrelated ($\epsilon_{i}$ and $\epsilon_{j}$ uncorrelated for all $i \neq j$), the least squares estimators $b_{0}$ and $b_{1}$ are unbiased and have minimum variance among all linear unbiased estimators. Note that there may be biased estimators with an even lower variance.
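A quick Monte Carlo sketch of the simple-regression case (illustrative, not from the linked posts). The true values $\beta_0=1$, $\beta_1=2$, the fixed design $x=0,\dots,19$, the seed and the uniform error distribution are all assumptions. Uniform errors satisfy the zero-mean, constant-variance and uncorrelatedness conditions without being Gaussian, so the averages of $b_0$ and $b_1$ should land near the true values. The estimator $0.5\,b_1$ is included only to illustrate the last remark: it is biased, yet its variance is a quarter of that of $b_1$.

```python
import random
import statistics

def ls_fit(xs, ys):
    # closed-form least squares estimates (b0, b1) for y = b0 + b1 * x
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def simulate(beta0=1.0, beta1=2.0, n=20, reps=5000, seed=0):
    # hypothetical setup: fixed design x = 0..n-1, i.i.d. Uniform(-1, 1) errors
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]
    b0s, b1s = [], []
    for _ in range(reps):
        # errors have mean 0 and variance 1/3, are uncorrelated, and are NOT Gaussian
        ys = [beta0 + beta1 * x + rng.uniform(-1.0, 1.0) for x in xs]
        b0, b1 = ls_fit(xs, ys)
        b0s.append(b0)
        b1s.append(b1)
    return b0s, b1s

b0s, b1s = simulate()
shrunk = [0.5 * b for b in b1s]  # biased estimator 0.5 * b1, for illustration only
print(statistics.mean(b0s), statistics.mean(b1s))  # close to 1.0 and 2.0
print(statistics.variance(b1s), statistics.variance(shrunk))  # second is a quarter of the first
```

Lower variance alone does not make $0.5\,b_1$ better: here its bias of about $-1$ makes its mean squared error far larger than that of $b_1$.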

More information about the Gauss-Markov theorem, including its mathematical proof, can be found here: http://economictheoryblog.com/2015/02/26/markov_theorem/

However, if you want to know which assumptions are necessary for $b_1$ to be an unbiased estimator of $\beta_1$, I believe assumptions 1 to 4 of the following post (http://economictheoryblog.com/2015/04/01/ols_assumptions/) must be fulfilled.

Furthermore, the maximum likelihood estimator and the least squares estimator coincide under certain conditions, namely when the noise $\epsilon$ is Gaussian (normally distributed).
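A short derivation of this (a standard argument, not from the linked posts), assuming $\varepsilon\sim\mathcal N(0,\sigma^2I_n)$ in the matrix notation of the question. Then $Y\sim\mathcal N(X\beta,\sigma^2 I_n)$ and the log-likelihood is

$$\ell(\beta,\sigma^2)=-\frac n2\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\|Y-X\beta\|^2.$$

For any fixed $\sigma^2$, maximizing $\ell$ over $\beta$ is the same as minimizing $\|Y-X\beta\|^2$, whose minimizer is $(X^TX)^{-1}X^TY$ because $X$ has full rank. So the two names describe different criteria (maximizing a likelihood versus minimizing a sum of squares) that give the same formula in the Gaussian case; with other error distributions they generally differ, e.g. i.i.d. Laplace errors lead to least absolute deviations.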

Hope this helps.

