Least-Squares Adjustment for normally distributed data is an MLE

least squares, maximum likelihood, parameter estimation, statistics

If we apply a least-squares adjustment (LSA) with the BLUE (best linear unbiased estimator) approach, we minimize the sum of squared residuals. This yields the minimum variance unbiased estimator (MVUE) within the class of linear estimators. Wikipedia provides details on BLUE and the Gauss–Markov theorem.

Assume the observed data (usually called $y$) is normally distributed (which in general need not be the case for BLUE/LSA). Is it then correct to conclude that the best linear unbiased estimator / LSA is also a maximum likelihood estimator (MLE)?

Best Answer

If $Y_1,\ldots,Y_n \overset{\text{i.i.d.}}{\sim} \operatorname N(\mu,\sigma^2)$ then the sample mean $(Y_1+\cdots+Y_n)/n$ is both the least-squares estimator of $\mu$ and the maximum-likelihood estimator of $\mu.$
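
A minimal numerical sketch of this coincidence, assuming NumPy and SciPy are available (the sample values below are made up for illustration): minimizing the sum of squared residuals and maximizing the normal log-likelihood in $\mu$ land on the same value, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([2.1, 1.9, 2.4, 2.0, 2.6])  # made-up observations

# Least squares: minimize the sum of squared residuals over mu.
ls = minimize_scalar(lambda mu: np.sum((y - mu) ** 2))

# Maximum likelihood: minimize the negative N(mu, sigma^2) log-likelihood
# over mu (sigma^2 held fixed; its value does not move the argmax in mu).
def neg_log_lik(mu, sigma2=1.0):
    return (0.5 * np.sum((y - mu) ** 2) / sigma2
            + 0.5 * len(y) * np.log(2 * np.pi * sigma2))

ml = minimize_scalar(neg_log_lik)

print(ls.x, ml.x, y.mean())  # all three agree up to solver tolerance
```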

It is also the best linear unbiased estimator of $\mu,$ i.e.

  • it is a linear combination of $Y_1,\ldots,Y_n,$ and
  • it is unbiased in the sense that its expected value remains equal to $\mu$ if $\mu$ changes, and
  • it is best in the sense that it has a smaller variance than does any other estimator satisfying the two conditions above.
  • It is in fact better than all other unbiased estimators of $\mu,$ linear or not. For example, the sample median is an unbiased estimator of $\mu$ that is not a linear combination of $Y_1,\ldots,Y_n,$ and it has a larger variance than the sample mean (see the simulation sketch just after this list). That the sample mean beats every unbiased competitor follows from the Lehmann–Scheffé theorem via completeness of the normal family, a fact at the same depth as the one-to-one nature of the two-sided Laplace transform.
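
A quick Monte Carlo sketch of that fourth point (the sample size, number of replications, and parameter values are arbitrary illustration choices): for normal data the sample median is also unbiased for $\mu,$ but its variance is noticeably larger than that of the sample mean, approaching the factor $\pi/2 \approx 1.57$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 25, 100_000  # arbitrary illustration values

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print(f"mean:   bias {means.mean() - mu:+.4f}  variance {means.var():.5f}")
print(f"median: bias {medians.mean() - mu:+.4f}  variance {medians.var():.5f}")
print(f"variance ratio: {medians.var() / means.var():.3f}")  # near pi/2
```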

The same thing applies to more elaborate sorts of linear models. For example, suppose we have $$ \text{independent } Y_i \sim \operatorname N(a+bx_i, \sigma^2) \text{ for } i=1,\ldots,n. $$ Then the least-squares estimators of $a$ and $b$ are likewise BLUE.
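
A sketch for this regression case (simulated data with assumed true values $a=1,$ $b=2,$ $\sigma=0.5$): the closed-form least-squares estimates agree with a numerical maximization of the normal likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=x.size)  # assumed a=1, b=2

# Closed-form least-squares estimates of slope and intercept.
b_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_ls = y.mean() - b_ls * x.mean()

# Numerical maximum likelihood under Y_i ~ N(a + b*x_i, sigma^2),
# parameterized by log(sigma) to keep sigma positive.
def neg_log_lik(theta):
    a, b, log_sigma = theta
    resid = y - a - b * x
    return x.size * log_sigma + 0.5 * np.sum(resid**2) / np.exp(2 * log_sigma)

a_ml, b_ml, _ = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0]).x
print(a_ls, b_ls)  # least squares
print(a_ml, b_ml)  # maximum likelihood: same up to solver tolerance
```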

In the situations above, least-squares estimation of $\mu$ or $(a,b)$ coincides with maximum-likelihood estimation.
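
The reason for the coincidence is one line of algebra. Writing $m_i$ for the mean of $Y_i$ (so $m_i = \mu$ in the first example and $m_i = a + bx_i$ in the second), the log-likelihood is $$ \ell = -\frac n2 \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - m_i)^2. $$ The first term does not involve the mean parameters, and the second is a negative multiple of the sum of squared residuals, so maximizing $\ell$ over $\mu$ (or over $a$ and $b$) is exactly minimizing the sum of squares.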

The assertions in the bulleted list above, except for the fourth one, can be proved with far less information than that the $Y\text{s}$ have the normal distributions above. It is enough to assume that

  • $Y_1,\ldots,Y_n$ all have expected value $\mu,$ or that they have respective expected values $a+bx_i,$ and
  • $Y_1,\ldots,Y_n$ all have the same variance (not necessarily the same distribution), and
  • $Y_1,\ldots, Y_n$ are uncorrelated (not necessarily independent).

The Gauss–Markov theorem says that these three assumptions are enough to guarantee that least-squares is BLUE.
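
A proof sketch for the simplest case, using only these three assumptions: any linear unbiased estimator of $\mu$ has the form $\sum_{i=1}^n c_i Y_i$ with $\sum_{i=1}^n c_i = 1$ (that is what unbiasedness for every $\mu$ forces), and uncorrelatedness plus equal variances give $$ \operatorname{Var}\left(\sum_{i=1}^n c_i Y_i\right) = \sigma^2 \sum_{i=1}^n c_i^2 \;\ge\; \sigma^2\,\frac{\left(\sum_{i=1}^n c_i\right)^2}{n} = \frac{\sigma^2}{n}, $$ by the Cauchy–Schwarz inequality, with equality only when every $c_i = 1/n,$ i.e. for the sample mean.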

But with these weaker Gauss–Markov assumptions, it makes no sense to speak of maximum likelihood, since we don't have a parametrized family of probability distributions.
