Solved – How to compute robust standard errors of the coefficients in multiple regression

linear-algebra, multiple-regression, regression, robust-standard-error

So I know that the way to find the coefficients of the BLP of some data is to use the formula,

$$\vec{\beta} = [{\bf X}^{T}{\bf X}]^{-1}{\bf X}^{T}{\bf Y}.$$

However, I also want to find the variance, and I see this formula, classified as a "non-parametric" and "robust" one:

$$\hat{V}[\vec{\hat{\beta}}] = [{\bf X}^{T}{\bf X}]^{-1}{\bf X}^{T}\text{diag}[({\bf Y}-{\bf X}\vec{\beta})^{2}]{\bf X}[{\bf X}^{T}{\bf X}]^{-1}.$$

However, the equation that I have comes from some notes that don't explain all of what is meant here. First of all, I don't understand what is meant by $({\bf Y}-{\bf X}\vec{\beta})^{2}$, because the stuff inside the power is a vector. How do you square a vector? Coordinate-wise?

Moreover, I'm not sure what is meant by "diag". Is that the diagonalization of the matrix, i.e. the diagonal matrix of its eigenvalues? I'm computing this in R, and R has a canned command for extracting the diagonal elements of a matrix; is that what this is supposed to be?

I have another confusion about this. If you use this for a BLP with two variable terms and collect 10 data points, for instance, then $\bf X$ will have dimensions 10×3. Since ${\bf X}^{T}$ (which is 3×10) sits to the left of the diag(…) factor, that factor must have 10 rows, and since $\bf X$ occurs to its right, it must have 10 columns. But in that case, the result of the whole computation will be a $3 \times 3$ matrix. Since it's supposed to tell me the variance of the coefficients, I don't see how I would interpret this result.

Best Answer

The expression you have is White's heteroskedasticity-robust variance estimator. The expression in the central parentheses gives the residual vector, and the squaring is done element-wise (so that you get the vector of squared residuals). "diag" creates the diagonal matrix that has the vector of squared residuals along its diagonal and zeroes as the off-diagonal elements.
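On your R question: `diag` there is overloaded, and the overload resolves your ambiguity. Given a vector it *builds* the diagonal matrix; given a matrix it *extracts* the diagonal. In this formula it is the first usage. A minimal sketch in Python/NumPy, whose `np.diag` mirrors R's `diag()` in this respect:

```python
import numpy as np

v = np.array([1.0, 4.0, 9.0])  # e.g. a vector of squared residuals
D = np.diag(v)                 # vector in  -> 3 x 3 diagonal matrix out
back = np.diag(D)              # matrix in  -> diagonal vector out, recovers v
```

So `diag` here has nothing to do with eigenvalue diagonalization; it just places the squared residuals on the diagonal of an otherwise-zero matrix.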

As for your second question, you're right that the variance matrix will be a $k \times k$ matrix, where $k$ is the number of parameters (including the constant). The diagonal elements of this matrix give the variances of the parameter estimates, while the off-diagonal elements give the covariances between the different parameter estimates. This matrix is more properly called a variance-covariance matrix.
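Putting the pieces together, here is a sketch of the whole computation in Python/NumPy on made-up data matching your example (10 observations, intercept plus 2 covariates; the coefficient values are arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # 10 x 3: intercept + 2 covariates
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)    # coefficient estimates
resid = y - X @ beta                        # residual vector, length 10
D = np.diag(resid**2)                       # 10 x 10 diagonal matrix of squared residuals
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ X.T @ D @ X @ XtX_inv         # 3 x 3 variance-covariance matrix
se = np.sqrt(np.diag(V))                    # robust standard errors, one per coefficient
```

Note the second use of `diag` at the end: extracting the diagonal of the 3×3 matrix and taking square roots gives one robust standard error per coefficient, which is how the 3×3 result is interpreted.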

Edit: the standard OLS model requires a homoskedasticity assumption, i.e., that the variance of the error term is independent of the covariates. The robust estimator discussed above relaxes this assumption, allowing for heteroskedastic errors. This is why the robust estimator includes the full vector of squared residuals, while the standard OLS variance estimator simply uses the overall variance of the residuals.
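To make the contrast concrete, here is a sketch (again Python/NumPy, made-up data) computing both estimators side by side; the only difference is whether each observation keeps its own squared residual or they are all replaced by a single pooled variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS: one pooled residual variance (homoskedasticity assumed)
sigma2 = resid @ resid / (n - k)
V_ols = sigma2 * XtX_inv

# Robust: each observation contributes its own squared residual
V_robust = XtX_inv @ X.T @ np.diag(resid**2) @ X @ XtX_inv
```

Under homoskedasticity, `np.diag(resid**2)` is approximately `sigma2` times the identity, and the sandwich collapses to the classical formula; under heteroskedasticity the two can differ substantially.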
