Regression – OLS is BLUE: What if Unbiasedness and Linearity Don’t Matter?

Tags: regression, unbiased-estimator

The Gauss-Markov theorem tells us that the OLS estimator is the best linear unbiased estimator for the linear regression model.

But suppose I don't care about linearity and unbiasedness. Is there some other (possibly nonlinear and/or biased) estimator for the linear regression model that is most efficient under the Gauss-Markov assumptions, or under some other general set of assumptions?

There is of course one standard result: OLS itself is the best unbiased estimator (linear or not) if, in addition to the Gauss-Markov assumptions, we also assume that the errors are normally distributed. For some other particular distribution of errors I could compute the corresponding maximum-likelihood estimator.
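To make this concrete: with Laplace (double-exponential) errors, the maximum-likelihood location estimator is the sample median, and it beats the sample mean, which is the OLS/BLUE estimator in the intercept-only model. A minimal simulation sketch (the sample size, seed, and scale are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sims = 25, 4000
mu = 0.0  # true location (hypothetical choice)

# Laplace errors: the MLE of the location is the sample median,
# while the sample mean is the Gauss-Markov BLUE.
draws = rng.laplace(mu, 1.0, size=(n_sims, n))

mse_mean = np.mean((draws.mean(axis=1) - mu) ** 2)
mse_median = np.mean((np.median(draws, axis=1) - mu) ** 2)
```

Asymptotically the median has variance $1/n$ here versus $2/n$ for the mean, so the simulated `mse_median` comes out well below `mse_mean`.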

But I was wondering whether there is some estimator that is better than OLS in some relatively general set of circumstances.

Best Answer

Unbiased estimators are typical in introductory statistics courses because they are 1) classic and 2) easy to analyze mathematically. The Cramér-Rao lower bound is one of the main tools for 2). Once we move away from unbiased estimators, improvement is possible. The bias-variance trade-off is an important concept in statistics for understanding how biased estimators can outperform unbiased ones.

Unfortunately, biased estimators are typically harder to analyze. In regression, much of the research of the past 40 years has been about biased estimation. This began with ridge regression (Hoerl and Kennard, 1970). See Frank and Friedman (1996) and Burr and Fry (2005) for reviews and insights.
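To see how the biased ridge estimator can win, here is a small simulation sketch comparing ridge with OLS on a strongly collinear design. The design, true coefficients, noise level, and ridge penalty $\lambda = 1$ are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]  # hypothetical true coefficients

# Strongly correlated predictors: every column is nearly the same
# latent variable z, which makes OLS high-variance.
z = rng.normal(size=(n, 1))
X = z + 0.1 * rng.normal(size=(n, p))

def ridge_fit(X, y, lam):
    # Ridge solution (X'X + lam*I)^{-1} X'y; lam = 0 recovers OLS.
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Estimation MSE of beta-hat, averaged over repeated noise draws.
mse = {0.0: 0.0, 1.0: 0.0}
n_sims = 500
for _ in range(n_sims):
    y = X @ beta + rng.normal(size=n)
    for lam in mse:
        b = ridge_fit(X, y, lam)
        mse[lam] += np.sum((b - beta) ** 2) / n_sims
```

On this design the variance reduction from shrinkage dwarfs the squared bias, so `mse[1.0]` (ridge) comes out far below `mse[0.0]` (OLS); with well-conditioned designs the gap shrinks or reverses.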

The bias-variance trade-off becomes more important in high dimensions, where the number of variables is large. Charles Stein surprised everyone when he proved that, in the normal means problem, the sample mean is no longer admissible if $p \geq 3$ (see Stein, 1956). The James-Stein estimator (James and Stein, 1961) was the first example of an estimator that dominates the sample mean. However, it is itself inadmissible.
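The James-Stein result is easy to check by simulation. A sketch in the normal means setting (the true mean vector and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_sims = 10, 2000
theta = np.ones(p)  # hypothetical true mean vector

# One observation X ~ N(theta, I_p) per replication; the MLE
# (the "sample mean" here) is X itself.
X = rng.normal(theta, 1.0, size=(n_sims, p))

# James-Stein shrinks X toward the origin by a data-dependent factor.
norms2 = np.sum(X ** 2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norms2) * X

mse_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
```

The MLE's risk is exactly $p$, while the James-Stein estimator's simulated risk is strictly smaller for every $\theta$ once $p \geq 3$; the gain is largest when $\|\theta\|$ is small.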

An important part of the bias-variance problem is determining how bias should be traded off against variance. There is no single "best" estimator. Sparsity has been an important part of the research of the past decade. See Hesterberg et al. (2008) for a partial review.
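A simple sparsity-exploiting estimator is soft-thresholding of the observations in the normal means model, which corresponds to the lasso solution under an orthonormal design. A sketch, assuming a sparse truth and the common $\sqrt{2\log p}$ universal threshold (the truth, dimension, and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n_sims = 100, 1000
theta = np.zeros(p)
theta[:5] = 3.0  # hypothetical sparse truth: 5 signals, 95 zeros

X = rng.normal(theta, 1.0, size=(n_sims, p))

# Soft-thresholding: shrink each coordinate toward 0 by lam,
# setting small coordinates exactly to 0.
lam = np.sqrt(2.0 * np.log(p))
soft = np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

mse_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
mse_soft = np.mean(np.sum((soft - theta) ** 2, axis=1))
```

The unbiased estimator pays a variance of 1 in every coordinate (total risk $p$), whereas soft-thresholding pays almost nothing on the 95 null coordinates and a bounded price on the 5 signals, so `mse_soft` comes out well below `mse_mle` whenever the truth is sparse.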

Most of the estimators referenced above are non-linear in $Y$. Even ridge regression is non-linear once the data is used to determine the ridge parameter.