Solved – Least squares regression when data has error bars

Tags: generalized-least-squares, least-squares, regression, standard-error

Suppose I have some data $(x_i,y_i)$. If we perform ordinary least squares, we can get standard errors of the slope and intercept using estimates like $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$.

However, suppose each $y_i$ came with a standard error $s_i$, possibly from some previous procedure. How would we obtain standard errors of the ordinary least squares slope and intercept that account for the standard errors from $s_i$? Or should we use a different regression technique?

Best Answer

Assume the regression model: $$ y_i = \mathbf{x}'_i \boldsymbol{\beta} + \epsilon_i$$

Let $\boldsymbol{\epsilon}$ be your vector of error terms.

If you know that $ \mathrm{Var}\left(\boldsymbol{\epsilon} \right) = \Omega$, for example the diagonal matrix below (in the question's setting, $\sigma_i = s_i$):

$$ \Omega = \begin{bmatrix} \sigma^2_1 & 0 & 0 & \cdots & 0\\ 0 & \sigma^2_2 & 0 & \cdots & 0\\ 0 & 0 & \sigma^2_3 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma^2_n \end{bmatrix} $$

then you can estimate $\boldsymbol{\beta}$ more efficiently using generalized least squares (GLS).

The estimator $\hat{\mathbf{b}}$ for GLS is given by:

$$\hat{\mathbf{b}} = \left(X'\Omega^{-1} X \right)^{-1}\left(X'\Omega^{-1} \mathbf{y} \right) $$

The basic idea with GLS is to give observations that are more precisely observed higher weight. The danger of this approach of course is that if $\Omega$ is not correct, you can end up with something far worse than the equal weighting of regular OLS.

Note also that weighted least squares, which is what you get with this diagonal $\Omega$ (each observation weighted by $1/\sigma_i^2$), is a special case of GLS.
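As a rough illustration of the GLS formula for a diagonal $\Omega$, here is a minimal NumPy sketch. The function name `wls_fit` and the data are made up for this example; it assumes the error bars `s` are the true standard deviations of the errors.

```python
import numpy as np

def wls_fit(X, y, s):
    """GLS estimate for a diagonal Omega = diag(s**2) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(s, dtype=float) ** 2   # weights 1 / sigma_i^2
    XtWX = X.T @ (w[:, None] * X)               # X' Omega^{-1} X
    XtWy = X.T @ (w * y)                        # X' Omega^{-1} y
    return np.linalg.solve(XtWX, XtWy)

# Made-up data: intercept column plus one regressor
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
s = np.array([0.1, 0.2, 0.1, 0.5, 0.3])         # assumed known error bars
y = 2.0 + 3.0 * x                               # noise-free, so the fit is exact

b = wls_fit(X, y, s)
print(b)
```

Because the weights only enter through $X'\Omega^{-1}X$ and $X'\Omega^{-1}\mathbf{y}$, the same estimate can be obtained by dividing each row of $X$ and each $y_i$ by $s_i$ and running ordinary least squares on the rescaled data.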

If you just want to use OLS estimation but calculate standard errors assuming you know $\mathrm{Var}\left(\boldsymbol{\epsilon}\right) = \Omega$, then:

\begin{align*}
\mathrm{Var}\left( \hat{\mathbf{b}}_{OLS} \right) &= \mathrm{Var}\left( \left(X'X \right)^{-1}X'\left(X\boldsymbol{\beta} + \boldsymbol{\epsilon} \right) \right)\\
&=\left(X'X \right)^{-1}X' \mathrm{Var}\left( \boldsymbol{\epsilon} \right) X \left(X'X \right)^{-1} \\
&=\left(X'X \right)^{-1}X' \Omega X \left(X'X \right)^{-1}
\end{align*}
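The OLS covariance formula above can be sketched in NumPy as follows. This is only an illustration: the helper name `ols_known_variance` and the data are invented, and it assumes $\Omega = \mathrm{diag}(s_i^2)$ with the $s_i$ known.

```python
import numpy as np

def ols_known_variance(X, y, s):
    """OLS coefficients with covariance (X'X)^{-1} X' Omega X (X'X)^{-1},
    where Omega = diag(s**2) is assumed known (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                       # ordinary least squares estimate
    meat = X.T @ ((s ** 2)[:, None] * X)        # X' Omega X
    cov = XtX_inv @ meat @ XtX_inv              # sandwich covariance
    se = np.sqrt(np.diag(cov))                  # standard errors of the coefficients
    return b, cov, se

# Made-up data: intercept column plus one regressor
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
s = np.array([0.1, 0.2, 0.1, 0.5, 0.3])         # assumed known error bars
y = np.array([2.1, 4.8, 8.1, 11.4, 13.8])

b_ols, cov_ols, se_ols = ols_known_variance(X, y, s)
print(b_ols, se_ols)
```

As a sanity check, when all $s_i$ equal a common value $\sigma$, this covariance reduces to the familiar $\sigma^2 (X'X)^{-1}$.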

The variance of the GLS estimator is given by:

\begin{align*}
\mathrm{Var}\left( \hat{\mathbf{b}}_{GLS} \right) &= \mathrm{Var}\left( \left(X'\Omega^{-1} X \right)^{-1}X'\Omega^{-1}\left(X\boldsymbol{\beta} + \boldsymbol{\epsilon} \right) \right)\\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} \mathrm{Var}\left( \boldsymbol{\epsilon} \right) \Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1} \\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} \Omega\Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1} \\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1}\\
&=\left(X'\Omega^{-1}X \right)^{-1}
\end{align*}
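A quick numerical check of this simplification, and a comparison with the OLS covariance under the same $\Omega$, might look like the sketch below. The design matrix and error bars are made up for illustration, and $\Omega$ is assumed to be known and diagonal.

```python
import numpy as np

# Made-up design matrix (intercept plus one regressor) and error bars
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
s = np.array([0.1, 0.2, 0.1, 0.5, 0.3])

Omega = np.diag(s ** 2)
Omega_inv = np.diag(1.0 / s ** 2)

# Short form: (X' Omega^{-1} X)^{-1}
cov_gls = np.linalg.inv(X.T @ Omega_inv @ X)

# Long form from the derivation, before Omega^{-1} Omega Omega^{-1} is simplified
cov_long = cov_gls @ X.T @ Omega_inv @ Omega @ Omega_inv @ X @ cov_gls

# OLS covariance under the same Omega: (X'X)^{-1} X' Omega X (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
cov_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv

se_gls = np.sqrt(np.diag(cov_gls))
se_ols = np.sqrt(np.diag(cov_ols))
print("long form matches short form:", np.allclose(cov_long, cov_gls))
print("GLS standard errors:", se_gls)
print("OLS standard errors:", se_ols)
```

When $\Omega$ is correct, the GLS standard errors are never larger than the OLS ones, which is the efficiency gain mentioned earlier.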
