Solved – Is OLS Asymptotically Efficient Under Heteroscedasticity?

Tags: efficiency, heteroscedasticity, least squares

I know that OLS is unbiased but not efficient under heteroscedasticity in a linear regression setting.

From the Wikipedia article on minimum mean square error:

http://en.wikipedia.org/wiki/Minimum_mean_square_error

The MMSE estimator is asymptotically unbiased and it converges in distribution to the normal distribution:
$\sqrt{n}(\hat{x} - x) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(x)\right)$,
where $I(x)$ is the Fisher information of $x$. Thus, the MMSE estimator is asymptotically efficient.

MMSE is claimed to be asymptotically efficient. I am a little confused here.

Does this mean OLS is not efficient in finite samples, but is efficient asymptotically under heteroscedasticity?

Critique of the current answers:
So far the proposed answers don't address the limiting distribution.

Thanks in advance

Best Answer

The article never assumed homoskedasticity in its definition. To put it in the context of the article, homoskedasticity would be saying $$ E\{(\hat x-x)(\hat x-x)^T\}=\sigma I $$ where $I$ is the $n\times n$ identity matrix and $\sigma$ is a positive scalar. Heteroskedasticity allows for

$$ E\{(\hat x-x)(\hat x-x)^T\}=D $$

for any diagonal, positive definite $D$. The article defines the covariance matrix in the most general way possible, as the centered second moment of some implicit multivariate distribution. We must know the multivariate distribution of $e$ to obtain an asymptotically efficient and consistent estimate of $\hat x$. This will come from a likelihood function (which is a mandatory component of the posterior). For example, assume $e \sim N(0,\Sigma)$ (i.e., $E\{(\hat x-x)(\hat x-x)^T\}=\Sigma$). Then the implied likelihood function is $$ \log[L]=\log[\phi(\hat x -x, \Sigma)] $$ where $\phi$ is the multivariate normal pdf.
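
As a concrete sketch of evaluating that likelihood (my own illustration, not from the article; the particular diagonal covariance values and the use of NumPy/SciPy are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: heteroskedastic Gaussian log-likelihood for the error e = xhat - x.
# Sigma is diagonal (the heteroskedastic case D above) but need not be sigma*I.
rng = np.random.default_rng(0)
n = 5
D = np.diag(rng.uniform(0.5, 2.0, size=n))   # arbitrary positive diagonal covariance
e = rng.multivariate_normal(np.zeros(n), D)  # one draw of the error vector

# log L = log phi(e; 0, D), the multivariate normal log-density at e
log_L = multivariate_normal(mean=np.zeros(n), cov=D).logpdf(e)
print(log_L)
```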

The Fisher information matrix may be written as $$ I(x)=E\bigg[\bigg(\frac{\partial}{\partial x}\log[L]\bigg)^2 \,\bigg|\,x \bigg] $$ (see en.wikipedia.org/wiki/Fisher_information for more). It is from here that we can derive $$ \sqrt{n}(\hat x -x) \xrightarrow{d} N(0, I^{-1}(x)) $$ The above uses a quadratic loss function but does not assume homoskedasticity.
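
A quick Monte Carlo sanity check of that limit (my own sketch, not part of the article): for i.i.d. $N(x,\sigma^2)$ data with known $\sigma^2$, the MLE of $x$ is the sample mean and $I(x)=1/\sigma^2$, so the variance of $\sqrt{n}(\hat x - x)$ should be close to $\sigma^2 = I^{-1}(x)$:

```python
import numpy as np

# Sketch: check sqrt(n)*(xhat - x) -> N(0, I^{-1}(x)) for the sample mean
# of N(x, sigma^2) data, where the Fisher information is I(x) = 1/sigma^2.
rng = np.random.default_rng(1)
x_true, sigma, n, reps = 2.0, 1.5, 1000, 20000

draws = rng.normal(x_true, sigma, size=(reps, n))
xhat = draws.mean(axis=1)             # MLE of x in each replication
z = np.sqrt(n) * (xhat - x_true)      # scaled estimation error

print(z.var())     # ~ sigma^2 = 2.25, the inverse Fisher information
print(sigma**2)
```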

In the context of OLS, where we regress $y$ on $x$, we assume $$ E\{y|x\}=x'\beta $$ The implied likelihood is $$ \log[L]=\log[\phi(y-x'\beta, \sigma I)] $$ which may be conveniently rewritten as $$ \log[L]=\sum_{i=1}^n\log[\varphi(y_i-x_i'\beta, \sigma)] $$ where $\varphi$ is the univariate normal pdf. The Fisher information is then $$ I(\beta)=[\sigma (xx')^{-1}]^{-1} $$
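
To check that expression numerically (a sketch of my own, using the answer's convention that the scalar $\sigma$ denotes the error variance; the simulated design and true $\beta$ are arbitrary choices):

```python
import numpy as np

# Sketch: for homoskedastic OLS with design matrix X, the Fisher information
# is X'X/sigma, so the sampling covariance of beta-hat should be its inverse,
# sigma*(X'X)^{-1}. Here sigma is the error *variance*, as in the answer.
rng = np.random.default_rng(2)
n, k, sigma = 500, 3, 0.8
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])

reps = 5000
betahats = np.empty((reps, k))
for r in range(reps):
    y = X @ beta + rng.normal(0, np.sqrt(sigma), size=n)
    betahats[r] = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit

I_beta = X.T @ X / sigma           # Fisher information
print(np.cov(betahats.T))          # Monte Carlo covariance of beta-hat
print(np.linalg.inv(I_beta))       # should be close to the above
```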

If homoskedasticity is not met, then the Fisher information as stated is misspecified (but the conditional expectation function is still correct), so the estimates of $\beta$ will be consistent but inefficient. We could rewrite the likelihood to account for heteroskedasticity, and then the regression is efficient; i.e., we can write $$ \log[L]=\log[\phi(y-x'\beta, D)] $$ This is equivalent to certain forms of generalized least squares, such as weighted least squares, as the simulation below illustrates. However, this will change the Fisher information matrix. In practice we often don't know the form of the heteroskedasticity, so we sometimes prefer to accept the inefficiency rather than risk biasing the regression by misspecifying the weighting scheme. In such cases the asymptotic covariance of $\beta$ is not $\frac{1}{n}I^{-1}(\beta)$ as specified above.
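
To make the efficiency loss concrete (a simulation sketch of my own; the variance pattern $d_i \propto x_i^2$ is an arbitrary assumption, not from the answer):

```python
import numpy as np

# Sketch: OLS vs. correctly weighted least squares (GLS with diagonal D)
# under heteroskedasticity. Both are unbiased; WLS is more efficient.
rng = np.random.default_rng(3)
n, reps, beta = 200, 10000, 1.5
x = rng.uniform(1.0, 3.0, size=n)
d = 0.2 * x**2                # assumed heteroskedastic variances Var(e_i)

ols, wls = np.empty(reps), np.empty(reps)
w = 1.0 / d                   # optimal weights = inverse variances
for r in range(reps):
    y = beta * x + rng.normal(0, np.sqrt(d))
    ols[r] = (x @ y) / (x @ x)              # OLS slope (no intercept)
    wls[r] = ((w * x) @ y) / ((w * x) @ x)  # WLS slope

print(ols.mean(), wls.mean())   # both ~ 1.5: unbiased either way
print(ols.var(), wls.var())     # WLS variance is strictly smaller
```

Both estimators center on the true slope, but the correctly weighted one has a visibly smaller Monte Carlo variance, which is exactly the inefficiency of OLS under heteroskedasticity described above.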