Solved – How to calculate the regression variance for a GLS model

generalized-least-squares, prediction, r, residuals, variance

I need to calculate the regression variance ($\sigma^2$) in order to estimate both the confidence intervals and the prediction intervals in a GLS regression analysis. For the analysis, the covariance matrix ($V$) of the response variable ($y$) is known in advance, so I use its inverse ($V^{-1}$) directly as the weighting matrix in the GLS regression.

The regression variance is a weighted sum of squared residuals:
$\sigma^2 = \frac{(Y - X\beta)^T C^{-1} (Y - X\beta)}{n - p}$

My question/problem is how to determine the weighting matrix $C^{-1}$? $C$ cannot be set equal to $V$ since (according to the above equation) $C$ must be dimensionless while $V$ has the same units as $\sigma^2$.

Based on my reading of the literature and available texts, it seems that $C$ is the correlation matrix, a scaled or normalized form of the covariance matrix $V$, i.e., $V = Var(\epsilon) = \sigma^2 C$. But my problem is that $\sigma^2$ is not yet known, so I need another way to find $C$ from $V$.

R functions such as gls() will compute the regression variance (if I knew how gls() does this, it would answer my question). However I cannot use gls() in this case since I am specifying a user-defined covariance (weighting) matrix, and gls() only accepts a limited set of specific correlation structures.

In fact, a possible solution can be found in this earlier post, where an equation for the SEE (or sigma2) for a GLS regression was cited:

GLS calc of SEE: sqrt( sum( ( residuals from linear model ) ^ 2 * glsWeight ) / sum( glsWeight ) * length( glsWeight ) / residualDegreeFreedom )

However I am unable to ascertain the validity of this equation and cannot find its source reference.

Best Answer

You are a bit off, I think, because the formula you use for the error variance relies on decomposing the error covariance into a constant term ($\sigma^2$) and a correlation term $\Omega$, which is the $C$ in your post. You actually have to use this $C$ as the weight for the regression. Let me explain.

It is assumed that you know $\Omega$; otherwise you have to use FGLS instead. The formula you have is the result of a transformed model. First, let's look at the GLS estimator:
Under the assumption that $V[\epsilon|X]=\sigma^2\Omega$ you can transform the model and estimate $$\widehat{\beta}^{GLS} = [X'\Omega^{-1}X]^{-1}X'\Omega^{-1}y$$ Under these assumptions this estimator is BLUE.
Under the same assumptions its variance is $$V[\widehat{\beta}^{GLS}] = \sigma^2[X'\Omega^{-1}X]^{-1} $$ Since you know $\Omega$, you can now calculate everything and estimate $\sigma^2$. To understand how, have a look at how the GLS estimator is obtained.
You reach this result by applying a matrix $P$ to the model: $$PY = PX\beta + P\epsilon$$ What is $P$? If you know $\Omega$, you can calculate $P$ (for example by a Cholesky decomposition of $\Omega^{-1}$) from $$ P'P = \Omega^{-1}$$ so that $V[P\epsilon|X] = \sigma^2 P\Omega P' = \sigma^2 I$ and OLS on the transformed model yields $\widehat{\beta}^{GLS}$. You then have a consistent estimator in $$ \widehat{\sigma}^2 = \frac{(P\hat{\epsilon})'(P\hat{\epsilon})}{n-k} = \frac{\hat{\epsilon}'\Omega^{-1}\hat{\epsilon}}{n-k}, \qquad \hat{\epsilon} = y - X\widehat{\beta}^{GLS} $$ This is the formula you have listed, but it presupposes the decomposition $V[\epsilon|X]=\sigma^2\Omega$. Note that $P\hat{\epsilon}$ is the vector of OLS residuals from the transformed model, which is why it helps to keep the transformed and the untransformed model apart.
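The steps above can be sketched in R with simulated data. This is only an illustration under stated assumptions: the AR(1)-style $\Omega$, the sample size, the true coefficients, and all variable names are made up here and do not come from the question. The residuals are taken around the GLS fit, so the quadratic form and the residual sum of squares of OLS on the transformed model should agree.

```r
set.seed(1)
n <- 50; k <- 2
X <- cbind(1, seq_len(n) / n)               # design matrix with an intercept
Omega <- 0.6^abs(outer(1:n, 1:n, "-"))      # assumed known correlation matrix
sigma2_true <- 4                            # assumed scale, unknown in practice

# Simulate errors with covariance sigma2_true * Omega
L <- t(chol(Omega))                         # lower factor, Omega = L %*% t(L)
y <- drop(X %*% c(1, 2) + sqrt(sigma2_true) * L %*% rnorm(n))

# GLS estimator [X' Omega^{-1} X]^{-1} X' Omega^{-1} y
Oinv <- solve(Omega)
beta_gls <- solve(t(X) %*% Oinv %*% X, t(X) %*% Oinv %*% y)
res <- drop(y - X %*% beta_gls)

# sigma^2 estimate from the quadratic form in the GLS residuals
sigma2_hat <- drop(t(res) %*% Oinv %*% res) / (n - k)

# Same estimate via the transformed model, with P'P = Omega^{-1}
P <- chol(Oinv)                             # upper factor, t(P) %*% P = Oinv
fit_t <- lm.fit(P %*% X, drop(P %*% y))     # OLS on the transformed data
sigma2_hat_t <- sum(fit_t$residuals^2) / (n - k)

# Estimated covariance matrix of the GLS coefficients
V_beta <- sigma2_hat * solve(t(X) %*% Oinv %*% X)
```

Here `sigma2_hat` and `sigma2_hat_t` are two routes to the same number, and `V_beta` is what would feed into confidence intervals for the coefficients.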

You said you used $V[Y]=V$ as weight for GLS. Can you tell us the exact formula? Is it $$\widehat{\beta}^{GLS} = [X'V^{-1}X]^{-1}X'V^{-1}y$$ ?
In that case you used $V$ instead of $C$. You mentioned that you think $V[\epsilon]=V$ and $V=\sigma^2C$ (I assume this is what you mean).
You could go ahead and plug in $V$ instead of $C$. But of course this is not actually correct because $V$ is not equal to $C$ as you discovered.
This seems to be your problem.
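To see numerically what plugging in $V$ instead of $C$ does, here is a small R sketch. The data, the scale factor `s2`, and the helper `gls_beta` are hypothetical and chosen only for illustration. It relies on one algebraic fact: a scalar factor in the weighting matrix cancels in $\widehat{\beta}^{GLS}$ but not in the residual quadratic form.

```r
set.seed(2)
n <- 30
X <- cbind(1, rnorm(n))
C <- 0.5^abs(outer(1:n, 1:n, "-"))    # hypothetical correlation matrix
s2 <- 9                               # hypothetical scale factor
V <- s2 * C                           # covariance matrix V = s2 * C
y <- drop(X %*% c(1, -1) + t(chol(V)) %*% rnorm(n))

# GLS coefficients for a given weighting matrix W (illustrative helper)
gls_beta <- function(W) {
  Winv <- solve(W)
  drop(solve(t(X) %*% Winv %*% X, t(X) %*% Winv %*% y))
}

b_C <- gls_beta(C)
b_V <- gls_beta(V)                    # same coefficients: the scalar cancels

res <- y - drop(X %*% b_C)
q_C <- drop(t(res) %*% solve(C) %*% res) / (n - 2)   # on the scale of sigma^2
q_V <- drop(t(res) %*% solve(V) %*% res) / (n - 2)   # equals q_C / s2, unit-free
```

So the coefficient estimates do not depend on which of the two matrices is used, while the quadratic form computed with $V$ is the one computed with $C$ divided by the scale factor. If $V$ really is the full error covariance, `q_V` would be expected to be close to 1.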

If you know nothing about the composition of your error terms, you just can't get to that decomposition and the $C$ you need. That is the 'point' of GLS. It takes advantage of a known correlation structure $\Omega$ or $C$ to regain BLUE estimators.
So the equation you cannot validate is the result of said transformed model. For it you also need your $\Omega$ (or $C$, as you called it) and, from that, the $P$ that transforms the model.
If you do not have this, use FGLS instead.

Related Solutions

GLS Prediction – Methods for Prediction with Generalized Least Squares (GLS)

Suppose we have a GLS model:

$$y=X\beta+u,$$

with

$$Euu'=\Omega.$$

Suppose we want to predict $y^*$:

$$y^*=x^*\beta+u^*,$$

Goldberger proved that the best linear unbiased prediction for $y^*$ is the following:

$$\hat{y}=x^*\hat{\beta}+w'\Omega^{-1}\hat{u},$$

where

$$\hat\beta=(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,\quad \hat{u}=y-X\hat\beta$$

and

$$w=Eu^*u$$

So the answer to your first question would be that if you use simple prediction, then your prediction will not be optimal. On the other hand to use this formula you need to know $w$. And for that you need to know more about $\Omega$. Goldberger in his article discusses several special cases.
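As a rough illustration of Goldberger's predictor, the R sketch below assumes an AR(1)-type covariance so that $w$ can be read off directly. The covariance structure, `rho`, the design, and all names are assumptions for the example only. In this special case $\Omega^{-1}w$ has a single non-zero entry, so the correction term reduces to `rho` times the last residual.

```r
set.seed(3)
n <- 40; rho <- 0.7
idx <- 1:(n + 1)
Omega_full <- rho^abs(outer(idx, idx, "-"))   # assumed AR(1)-type covariance
X_full <- cbind(1, idx / n)
y_full <- drop(X_full %*% c(2, 1) + t(chol(Omega_full)) %*% rnorm(n + 1))

# First n points are observed, point n + 1 is to be predicted
X <- X_full[1:n, ]
y <- y_full[1:n]
x_star <- X_full[n + 1, ]
Omega <- Omega_full[1:n, 1:n]
w <- Omega_full[1:n, n + 1]                   # covariances between u and u*

# GLS fit and residuals
Oinv <- solve(Omega)
beta_hat <- drop(solve(t(X) %*% Oinv %*% X, t(X) %*% Oinv %*% y))
u_hat <- y - drop(X %*% beta_hat)

naive <- sum(x_star * beta_hat)               # simple prediction x* beta_hat
correction <- drop(t(w) %*% Oinv %*% u_hat)   # w' Omega^{-1} u_hat
blup <- naive + correction                    # Goldberger's predictor
```

The difference between `blup` and `naive` is exactly the term that the simple prediction ignores; it vanishes when $w = 0$, that is, when the new error is uncorrelated with the sample errors.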

As for your second question, it is a bit unclear to me what you are trying to achieve. The problem with a GLS model is that if we use the OLS standard errors of the coefficients, they are biased. The formulas you give are for calculating the standard error of the error term. But this only makes sense for an OLS model, since in a GLS model the error terms in general do not share a single common variance.

If you are going for the prediction variance, then @whuber's comment holds: you cannot calculate it in this setup. The basic problem is that you predict one observation, so you get one number, and the variance of one number is zero. What you can calculate is the theoretical prediction variance, but this then depends on the model you are trying to test.

If you want to calculate PRESS: the sum of squares of residuals from jackknife procedure and weight them with $\Omega$, I think you will run into the same problem of how to calculate $\Omega$ out of sample.

Solved – Equivalence of the OLS and GLS estimates

Question: In the setup above, are conditions (1) and (2) satisfied?

Answer: No, in general the conditions are not satisfied.

The following example provides a proof of the answer.

\begin{align*} X &= \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ \end{bmatrix}, Y = \begin{bmatrix} 1 \\2 \\3\\4 \end{bmatrix}, \Sigma = \begin{bmatrix} 1 & 0&0&0 \\ 0&5&0&0 \\ 0&0&5&0\\ 0&0&0&5 \end{bmatrix}. \end{align*}

Notice that $\Sigma, X'X$ and $X'\Sigma^{-1}X$ are all diagonal matrices with non-zero, positive, elements on the diagonals. Thus, they are all positive definite and have the standard basis vectors as eigenvectors. That is, they satisfy the setup and condition 1). It is easy to check that the OLS and GLS estimates are different (see code below). Thus, condition 2) must not hold. Let's see why.

In this example, $k=2$ so the columns of $H$ are two eigenvectors of $\Sigma$. Let $A=[a_1, a_2]$. Then $X = HA$ implies that $Ha_1 = x_1 = [1,1,0,0]'$. The eigenvectors of $\Sigma$ are the standard basis vectors, say $e_i$, and, thus, it must be that $H = [e_1, e_2]$ up to reordering of the columns. But then $x_2 =[0,0,1,1]' \notin \mathrm{span}(H)$, i.e. we cannot pick $a_2$ to satisfy the requirement that $X=HA$. We conclude condition 2) is not satisfied.

The following R code snippet shows that the GLS estimates, in this case WLS because of the diagonal covariance matrix, differ from the OLS estimates.

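X <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 2)
Y <- 1:4
E <- diag(c(1, 5, 5, 5))

# OLS estimates
coef(lm(Y ~ X - 1))
#>  X1  X2 
#> 1.5 3.5 

# GLS (here WLS) estimates, with weights equal to the inverse variances
coef(lm(Y ~ X - 1, weights = 1/diag(E)))
#>       X1       X2 
#> 1.166667 3.500000 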
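The same comparison can be reproduced from the closed-form estimators rather than through `lm()`. This is a minimal sketch using the example's `X`, `Y` and covariance matrix; the names `b_ols`, `b_gls` and `b_wls` are introduced here for illustration.

```r
X <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 2)
Y <- 1:4
E <- diag(c(1, 5, 5, 5))

# OLS: (X'X)^{-1} X'Y, i.e. the plain group means
b_ols <- drop(solve(t(X) %*% X, t(X) %*% Y))                  # 1.5 and 3.5

# GLS: (X' E^{-1} X)^{-1} X' E^{-1} Y, i.e. inverse-variance weighted means
Einv <- solve(E)
b_gls <- drop(solve(t(X) %*% Einv %*% X, t(X) %*% Einv %*% Y))  # 7/6 and 3.5

# Weighted least squares through lm() for comparison
b_wls <- unname(coef(lm(Y ~ X - 1, weights = 1 / diag(E))))
```

The first coefficient changes because observations 1 and 2 have different variances, so GLS weights them unequally: (1 * 1 + 0.2 * 2) / 1.2 = 7/6. The second coefficient is unchanged because observations 3 and 4 share the same variance.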
