Why Generalized Least Squares

Tags: generalized-least-squares, linear-regression, weighted-regression

It is often advised to use Generalized Least Squares (GLS) when we have a regression model with non-spherical (i.e. heteroskedastic or autocorrelated) errors. We do so with a weighted regression, minimizing
$$
(y-X\beta)^TW(y-X\beta)
$$

with $W = \Sigma^{-1} = Cov(\epsilon)^{-1}$, the inverse of the covariance matrix of errors.

The variance of the estimated $\hat\beta$ is
$$
\begin{aligned}
Var(\hat\beta_{GLS})&=(X^TWX)^{-1}X^TW\Sigma W^TX(X^TWX)^{-1}\\
&=(X^TWX)^{-1}=(X^T\Sigma^{-1}X)^{-1}
\end{aligned}
$$
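This collapse of the sandwich can be checked numerically. The following is a minimal sketch with synthetic data: the design matrix and the diagonal (heteroskedastic) error covariance are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix and a diagonal (heteroskedastic) error covariance,
# both invented purely for illustration.
X = rng.normal(size=(50, 3))
Sigma = np.diag(rng.uniform(0.5, 3.0, size=50))
W = np.linalg.inv(Sigma)  # GLS weight matrix W = Sigma^{-1}

# Sandwich form of Var(beta_GLS)
XtWX_inv = np.linalg.inv(X.T @ W @ X)
sandwich = XtWX_inv @ X.T @ W @ Sigma @ W.T @ X @ XtWX_inv

# Collapsed form: (X' Sigma^{-1} X)^{-1}
collapsed = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X)

print(np.allclose(sandwich, collapsed))  # True: W Sigma = I cancels the middle
```

The cancellation happens because $W\Sigma = I$ when $W = \Sigma^{-1}$; with any other weight matrix the sandwich does not simplify.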

To do GLS, we must know $\Sigma$. But if we already know $\Sigma$, why can't we just do regular OLS and calculate $Var(\hat\beta)$ as
$$
Var(\hat\beta)=(X^TX)^{-1}X^T\Sigma X(X^TX)^{-1}?
$$

Is it because $Var(\hat\beta_{GLS})$ is smaller?

Another question I've always had is that for OLS, $\beta$ is estimated as
$$
\hat\beta_{OLS}=(X^TX)^{-1}X^Ty
$$

For GLS or WLS, $\hat\beta$ is estimated as
$$
\hat\beta_{GLS} = (X^TWX)^{-1}X^TWy
$$

which is unbiased for non-spherical errors. Yet we are told that $\hat\beta_{OLS}$ is also an unbiased estimator of $\beta$, even with non-spherical errors. Does that mean
$(X^TWX)^{-1}X^TWy$ simplifies to $(X^TX)^{-1}X^Ty$?
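A quick simulation can probe this last question; this is a sketch with an invented heteroskedastic setup (intercept-plus-slope design, error variances growing linearly), not a derivation:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
var = np.linspace(0.5, 5.0, n)   # assumed heteroskedastic error variances
W = np.diag(1.0 / var)           # GLS weights = Sigma^{-1}

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def gls(X, y, W):
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# On a single sample the two estimates generally differ...
y = X @ beta + rng.normal(scale=np.sqrt(var))
print(ols(X, y), gls(X, y, W))

# ...but averaged over many replications both are approximately unbiased.
reps = 2000
b_ols = np.mean([ols(X, X @ beta + rng.normal(scale=np.sqrt(var)))
                 for _ in range(reps)], axis=0)
b_gls = np.mean([gls(X, X @ beta + rng.normal(scale=np.sqrt(var)), W)
                 for _ in range(reps)], axis=0)
print(b_ols, b_gls)  # both close to [1, 2]
```

So the two formulas do not simplify to one another: they are different linear functions of $y$ that happen to share the same expectation $\beta$.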

Best Answer

You can indeed do regular OLS and compute the variance of the estimator (and that estimator will be unbiased and consistent).

But GLS is a more efficient estimator: its sampling distribution has lower variance (in fact, out of all linear unbiased estimators, it is the one with the least possible variance).


Example

Let us estimate $\mu$ from the following variables $$X_k \sim N\left(\mu, \sigma^2 \cdot k\right)$$

Then

$$\begin{array}{lclcl} \hat\mu_\text{OLS}& =&\frac{1}{n} \sum_{k=1}^n {X_k} &\sim& N\left(\mu, \sigma^2\cdot{\frac{1+1/n}{2}}\right)\\ \hat\mu_\text{GLS}& =&\frac{1}{H_{n}}\sum_{k=1}^n \frac{1}{k} {X_k} &\sim& N\left(\mu, \frac{\sigma^2}{H_{n}}\right) \end{array}$$

where $H_{n} = \sum_{k=1}^n \frac{1}{k}$ is the $n$-th harmonic number; the GLS weights are the inverse variances, $w_k \propto 1/k$.

The variance of the GLS estimator is smaller than the variance of the OLS estimator. So if you can reasonably guess the covariance matrix $\Sigma$ of the error distribution, GLS may be beneficial.
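A simulation of this example can illustrate the efficiency gap. This is a sketch with arbitrary values of $\mu$, $\sigma^2$, and $n$; the GLS estimate weights each $X_k$ by its inverse variance $1/k$, i.e. $W=\Sigma^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma2, n, reps = 3.0, 1.0, 10, 20000  # arbitrary illustrative values
k = np.arange(1, n + 1)

# reps draws of (X_1, ..., X_n) with X_k ~ N(mu, sigma^2 * k)
X = rng.normal(mu, np.sqrt(sigma2 * k), size=(reps, n))

# OLS estimate of mu: the plain average
mu_ols = X.mean(axis=1)

# GLS estimate: inverse-variance weights w_k = 1/k, normalized by H_n
w = 1.0 / k
mu_gls = (X * w).sum(axis=1) / w.sum()

print(mu_ols.var(), (1 + 1 / n) / 2 * sigma2)  # empirical vs theoretical OLS variance
print(mu_gls.var(), sigma2 / w.sum())          # empirical vs theoretical GLS variance
```

The empirical variances match the two formulas above, with the GLS variance the smaller of the two.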

*Figure: example comparison of OLS and GLS.*
