Solved – OLS: Variance-covariance matrix of residuals

econometrics, regression

While reviewing the basics of ordinary least squares (OLS) regression in a matrix context, I stumbled upon the variance-covariance matrix of the residuals. In an OLS context, the population model for a cross-section of data is usually assumed to be:

$
\begin{aligned}
& y_i = \textbf{x}_i' \beta + u_i.
\end{aligned}
$

$y$ is then an $n \times 1$ vector containing the dependent variable, $\textbf{X}$ is an $n \times k$ matrix of exogenous variables, and $u$ is the $n \times 1$ vector of residuals. In order to derive the distribution of the OLS estimator, Cameron & Trivedi make the following proposition about the residuals:
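For concreteness, here is a minimal NumPy sketch of this setup; the coefficient values and distributions are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

n, k = 100, 3                       # sample size and number of regressors
beta = np.array([1.0, 0.5, -2.0])   # hypothetical population coefficients

X = rng.normal(size=(n, k))         # n x k matrix of exogenous variables
u = rng.normal(size=n)              # n x 1 vector of error terms
y = X @ beta + u                    # population model: y_i = x_i' beta + u_i

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                     # should be close to beta
```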

Proposition: Data are independent over $i$ with $\text{E}\left(\textbf{u} \,|\, \textbf{X}\right) = \textbf{0}$ and $\text{E}\left(\textbf{u}\textbf{u}' \,|\, \textbf{X}\right) = \boldsymbol{\Omega} = \text{Diag}\left(\sigma_i^2\right)$.
— Cameron & Trivedi (2005), p. 73

So far, I had only ever thought about homoskedasticity, which implies that $\text{E}(u_i^2) = \sigma^2$ for all $i$. But Cameron & Trivedi state their proposition more generally, in that they also allow for heteroskedastic errors with a different variance $\sigma_i^2$ for each $i$. I knew the concepts of homoskedasticity and heteroskedasticity before, but the way Cameron & Trivedi frame this proposition forces you to think about the distribution of the errors in matrix terms. Clearly, $\boldsymbol{\Omega}$ is an $n \times n$ matrix, but when I thought about it, I had trouble understanding the individual entries of this variance-covariance matrix.

Looking at the diagonal elements of $\boldsymbol{\Omega}$: how does $u_1$, for instance, have a variance? We are in a cross-sectional setting, so $u_1$ is essentially a single observation point. How do I obtain the variance of a variable with only one observation? The same goes for the off-diagonal elements: how can there be a covariance between two single observations?
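To fix ideas about what $\boldsymbol{\Omega}$ looks like as a matrix, here is a small NumPy sketch of both the homoskedastic and the heteroskedastic case; the variance values are arbitrary, chosen for illustration:

```python
import numpy as np

n = 5

# Homoskedastic case: every error shares the same variance sigma^2,
# so Omega = sigma^2 * I_n.
sigma2 = 1.5
Omega_homo = sigma2 * np.eye(n)

# Heteroskedastic case: each observation i gets its own variance sigma_i^2,
# so Omega = Diag(sigma_i^2). The values below are made up for illustration.
sigma2_i = np.array([0.5, 1.0, 2.0, 0.8, 1.7])
Omega_hetero = np.diag(sigma2_i)

# Independence over i puts zeros on all off-diagonal entries in both cases.
print(Omega_homo)
print(Omega_hetero)
```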


Note:

The source is:
Cameron, A. C. & Trivedi, P. K. (2005): Microeconometrics: Methods and Applications.

Best Answer

How does (a point) $u_i$, $i = 1, \dots, n$, have a variance?

Admittedly, this can be really confusing at first.

But the key point to understand is that $u_1$ is a random variable: the value you observe is just one realization out of the many it could have taken, and that is why it has a variance. And I am not talking about the other components of the $n \times 1$ vector $\boldsymbol{u}$, but really about each of its $n$ components having its own distribution.

If you look at the definition of the covariance matrix, you will see that its $(i,j)$ entry is the covariance between the $i$th and the $j$th components of $\boldsymbol{u}$:

$\mathrm{E}(\boldsymbol{u}\boldsymbol{u}') = \left[\mathrm{cov}(u_i,u_j)\right] = \left[\mathrm{E}(u_i u_j) - \mathrm{E}(u_i)\,\mathrm{E}(u_j)\right]$

where $\mathrm{E}(u_i)$ stands for the average of all the values that $u_i$ can take, classically equal to $0$; likewise for $\mathrm{E}(u_j)$. (With these zero means, $\mathrm{E}(\boldsymbol{u}\boldsymbol{u}')$ is exactly the covariance matrix.) And $\mathrm{E}(u_i u_j)$ stands for the average of all the values that their product can take. Incidentally, the variance is the particular case of the covariance in which $i = j$.

Thus, to conclude: you do indeed only observe point values of $u_i$ for $i \in \{1, \dots, n\}$, but each of these points is actually randomly drawn from an underlying distribution, be it empirically estimated (e.g. via bootstrapping or Bayesian techniques) or theoretically derived/assumed.
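This "many possible values" idea can be made tangible with a small Monte Carlo sketch: redraw the whole error vector many times from an assumed data-generating process and compute the empirical variance of its first component across draws. The per-observation standard deviations below are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(123)
n, reps = 50, 10_000

# Assumed per-observation standard deviations (heteroskedastic, for illustration)
sigma_i = np.linspace(0.5, 2.0, n)

# Each row of U is one independent draw of the whole n x 1 error vector u.
U = rng.normal(scale=sigma_i, size=(reps, n))

# Empirical variance of u_1 across the 10,000 draws vs. its true value sigma_1^2
print(U[:, 0].var(), sigma_i[0] ** 2)

# Empirical covariance of u_1 and u_2: close to 0, since the u_i are independent
print(np.cov(U[:, 0], U[:, 1])[0, 1])
```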
