Regression – Covariance Matrix of the Residuals in Linear Regression Model

Tags: covariance, covariance-matrix, econometrics, regression, residuals

I estimate the linear regression model:

$Y = X\beta + \varepsilon$

where $Y$ is an ($n \times 1$) dependent variable vector, $X$ is an ($n \times p$) matrix of independent variables, $\beta$ is a ($p \times 1$) vector of the regression coefficients, and $\varepsilon$ is an ($n \times 1$) vector of random errors.

I want to estimate the covariance matrix of the residuals. To do so I use the following formula:

$Cov(\hat{\varepsilon}) = \sigma^2 (I-H)$

where $\hat{\varepsilon}=Y-X\hat{\beta}$, $\sigma^2$ is estimated by $\hat{\sigma}^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-p}$, $I$ is the ($n \times n$) identity matrix, and $H = X(X'X)^{-1}X'$ is the hat matrix.
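
This identity is easy to verify numerically. A small Monte Carlo sketch (my own illustration, using numpy): under i.i.d. errors, the empirical covariance of the OLS residuals across many simulated datasets matches $\sigma^2(I-H)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 3, 2.0

# Fixed design with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix H = X(X'X)^{-1}X'
M = np.eye(n) - H                           # residual-maker matrix I - H
beta = np.array([1.0, -2.0, 0.5])

# Simulate many datasets with i.i.d. errors; residuals are e = (I - H) y
reps = 20000
E = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
resids = (X @ beta + E) @ M                 # M is symmetric, so this is (I - H) y rowwise

emp_cov = np.cov(resids, rowvar=False)      # empirical Cov of residuals, (n x n)
theo_cov = sigma2 * M                       # sigma^2 (I - H)
print(np.max(np.abs(emp_cov - theo_cov)))   # small Monte Carlo error
```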

However, in some sources I have seen the covariance matrix of the residuals estimated in a different way.
There, the residuals are assumed to follow an $AR(1)$ process:

$\varepsilon_t = \rho \varepsilon_{t-1} + \eta_t$

where $E(\eta) = 0$ and $Var(\eta) = \sigma^2_0 I$.

The covariance matrix is then estimated as follows:

$Cov(\varepsilon) = \sigma^2 \begin{bmatrix}
1 & \rho & \rho^2 & \dots & \rho^{n-1}\\
\rho & 1 & \rho & \dots & \rho^{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \dots & \rho & 1
\end{bmatrix}$

where $\sigma^2 = \frac{1}{1-\rho^2}\sigma^2_0$
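
The matrix above is just $\sigma^2 \rho^{|i-j|}$ in entry $(i,j)$. A quick numerical check (my own sketch, using numpy) that simulating the $AR(1)$ recursion reproduces this covariance, including the relation $\sigma^2 = \sigma^2_0/(1-\rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, sigma2_0 = 5, 0.6, 1.0
sigma2 = sigma2_0 / (1 - rho**2)            # stationary variance of the AR(1)

# Theoretical covariance: Cov(eps)_{ij} = sigma^2 * rho^{|i - j|}
idx = np.arange(n)
cov = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate many AR(1) paths started from the stationary distribution
reps = 100000
paths = np.empty((reps, n))
paths[:, 0] = rng.normal(scale=np.sqrt(sigma2), size=reps)
for t in range(1, n):
    paths[:, t] = rho * paths[:, t - 1] + rng.normal(scale=np.sqrt(sigma2_0), size=reps)

emp = np.cov(paths, rowvar=False)
print(np.max(np.abs(emp - cov)))            # small Monte Carlo error
```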

My question is: are there two different specifications of the covariance matrix of the residuals, or are they somehow connected?

Best Answer

After some investigation, I think I found a small (but crucial!) imprecision in your post.

The first formula you wrote, $var(\varepsilon) = \sigma^2 (I - H)$, is actually not totally exact. The formula should be $var(\hat \varepsilon) = \sigma^2 (I - H)$, where $\hat\varepsilon = Y - X\hat\beta$ and $\hat\beta = (X^TX)^{-1}X^TY$ is the OLS estimator. Thus $\hat\sigma^2(I - H)$ is an estimator of the covariance matrix of the *estimated* residuals associated with the OLS estimator. This formula does not suppose independence of the $\varepsilon_i$, just that they all have the same variance $\sigma^2$.

But this is not what you want! You want an estimate of the covariance of the true residuals, not of the estimated residuals under OLS. The OLS estimator corresponds to the maximum likelihood estimator under the hypothesis that the residuals are i.i.d. and normal. If these hypotheses are not met, the estimated residuals can be very poor estimates of the true residuals, and their covariance matrix can be very different from the covariance matrix of the true residuals.
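
To make this concrete: since $\hat\varepsilon = (I-H)Y = (I-H)\varepsilon$, the exact covariance of the OLS residuals is $(I-H)\,\Sigma\,(I-H)$ for *any* error covariance $\Sigma$. A small numerical sketch (my own construction) showing that under $AR(1)$ errors this differs markedly from $\sigma^2(I-H)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, sigma2 = 20, 0.8, 1.0

X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                               # residual-maker matrix I - H

# True AR(1) error covariance Sigma_{ij} = sigma^2 * rho^{|i - j|}
idx = np.arange(n)
Sigma = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Exact covariance of the OLS residuals: Cov(e) = M Sigma M (M is symmetric)
cov_resid = M @ Sigma @ M
print(np.max(np.abs(cov_resid - sigma2 * M)))   # clearly nonzero: the formulas disagree
```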

The second formula you wrote does correspond to the covariance matrix of the $\varepsilon_i$ under the hypothesis that they follow an AR(1) process.

Estimating the covariance matrix of the residuals of a linear regression without any assumption cannot easily be done: you would have more unknowns than data points... So you need to specify some form for the covariance matrix of the residuals. Supposing that they follow an AR(1) process (if this is relevant) is one way of doing so. You can also assume that they have a stationary parametrized autocorrelation function, whose parameters you can estimate, and use it to deduce the covariance matrix.
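
A minimal sketch of that idea (my own illustration, not necessarily what your source does): fit OLS, estimate $\rho$ by the lag-1 sample autocorrelation of the residuals, and plug it into the $AR(1)$ pattern:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho_true, sigma2_0 = 500, 0.7, 1.0

# Simulate a regression with AR(1) errors
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
eps = np.empty(n)
eps[0] = rng.normal(scale=np.sqrt(sigma2_0 / (1 - rho_true**2)))
for t in range(1, n):
    eps[t] = rho_true * eps[t - 1] + rng.normal(scale=np.sqrt(sigma2_0))
y = X @ beta + eps

# Step 1: OLS residuals
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols

# Step 2: estimate rho by the lag-1 sample autocorrelation of the residuals
rho_hat = (e[1:] @ e[:-1]) / (e @ e)

# Step 3: plug rho_hat into the AR(1) covariance pattern
sigma2_hat = (e @ e) / (n - X.shape[1])     # rough estimate of the marginal variance
idx = np.arange(n)
cov_hat = sigma2_hat * rho_hat ** np.abs(idx[:, None] - idx[None, :])
print(rho_hat)                              # close to rho_true for large n
```

This is essentially the first step of feasible GLS; iterating (re-fitting $\beta$ with the estimated covariance) refines both estimates.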
