GLS Prediction – Methods for Prediction with Generalized Least Squares (GLS)

forecasting, generalized-least-squares, regression

Let's say I build a Generalized Least Squares (GLS) model. I follow the standard two-step procedure: first I estimate an LM model, then I create an error-response covariance matrix based on the residuals of this model. Now I fit an LM model again, only this time I specify weights based on the error-response covariance matrix.
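For concreteness, here is a minimal numpy sketch of this two-step (feasible GLS) procedure on simulated data. The simulated design, the crude per-observation variance estimate `resid**2`, and all variable names are illustrative assumptions, not part of the original setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with heteroskedastic errors (illustrative setup)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
sigma = np.exp(X[:, 1])                       # error s.d. varies with a regressor
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=sigma)

# Step 1: ordinary least squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Step 2: estimate error variances from the residuals and refit with
# weights w_i = 1 / sigma_i^2 (weighted least squares, i.e. diagonal-Omega GLS)
w = 1.0 / np.maximum(resid**2, 1e-8)
Xw = X * np.sqrt(w)[:, None]
yw = y * np.sqrt(w)
beta_gls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# Out-of-sample prediction uses only the coefficients -- no weights needed
x_new = np.array([1.0, 0.5, -1.0])
y_hat = x_new @ beta_gls
```

Note that the weights enter only the estimation step; the point prediction `x_new @ beta_gls` requires no weights, which is the behavior the question asks about.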

Now suppose I want to predict out-of-sample with the GLS model to test for model stability. I want to confirm that I can simply perform a prediction using the coefficients estimated by GLS, and that there is no need to furnish weights anymore (especially since in a prediction scenario, where residuals are not available, the error-response covariance matrix cannot be generated).

Follow-up question:

We proceed to score on test data with coefficients from the training data. (The dimension of the test data consists of a cross-section of N individuals and T observations.) We would like to produce consistent standard errors. Therefore, instead of calculating standard error of the estimate in the OLS fashion, we weight the residuals (see below) by a "GLS weights" vector:

OLS calc of SEE: sqrt( sum( ( residuals from linear model ) ^ 2 ) / residualDegreeFreedom )

GLS calc of SEE: sqrt( sum( ( residuals from linear model ) ^ 2 * glsWeight ) / sum( glsWeight ) * length( glsWeight ) / residualDegreeFreedom )

"glsWeight" is a vector calculated in the usual way as the inverse of the variance of the residuals of each cross-section at a date (i.e. a vector of length T). However, here I am using the residuals from the test data as opposed to the training data (indeed this is required; otherwise the dimension of the out-of-time residuals would not match the dimension of the GLS weights vector).
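The two SEE formulas above can be written as short functions. This is only a transcription of the pseudo-formulas, assuming `residuals` and `gls_weight` have already been aligned to the same length (per-observation weights expanded from the length-T vector); with equal weights the GLS version reduces to the OLS one:

```python
import numpy as np

def see_ols(residuals, df):
    # sqrt( sum(residuals^2) / df )
    return np.sqrt(np.sum(residuals**2) / df)

def see_gls(residuals, gls_weight, df):
    # weighted mean of squared residuals, rescaled by n/df:
    # sqrt( sum(residuals^2 * w) / sum(w) * n / df )
    n = len(gls_weight)
    return np.sqrt(np.sum(residuals**2 * gls_weight) / np.sum(gls_weight) * n / df)
```

A quick sanity check is that `see_gls(r, np.ones(len(r)), df)` equals `see_ols(r, df)`.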

What is counter-intuitive is that if I want to measure the SEE of the GLS model out-of-sample on one individual, I am required to score all individuals out-of-sample (otherwise constructing the GLS weights vector would be impossible, since there would be no cross-sectional variance of residuals to compute).

Question is – Am I required to use the GLS weights when calculating the SEE out-of-sample, or can I simply use the OLS calculation of SEE?

Best Answer

Suppose we have a GLS model:

$$y=X\beta+u,$$

with

$$Euu'=\Omega.$$

Suppose we want to predict $y^*$:

$$y^*=x^*\beta+u^*.$$

Goldberger proved that the best linear unbiased prediction for $y^*$ is the following:

$$\hat{y}=x^*\hat{\beta}+w'\Omega^{-1}\hat{u},$$

where

$$\hat\beta=(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,\quad \hat{u}=y-X\hat\beta$$

and

$$w=E[u^*u].$$

So the answer to your first question is that if you use the simple prediction $x^*\hat\beta$, your prediction will not be optimal. On the other hand, to use this formula you need to know $w$, and for that you need to know more about $\Omega$. Goldberger discusses several special cases in his article.
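A small numerical sketch of Goldberger's BLUP formula, using an AR(1)-type $\Omega$ as one of the special cases where $w$ is known in closed form (the AR(1) structure, sample size, and coefficients are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup with a known, non-diagonal Omega: AR(1) correlation structure
n = 50
rho = 0.6
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])

X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(Omega)
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

Omega_inv = np.linalg.inv(Omega)

# GLS estimate: beta_hat = (X' Omega^-1 X)^-1 X' Omega^-1 y
beta_hat = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
u_hat = y - X @ beta_hat

# BLUP for one step beyond the sample: y_hat = x*' beta_hat + w' Omega^-1 u_hat,
# where w_i = E[u* u_i] = rho^(n+1-i) under AR(1)
x_star = np.array([1.0, 0.3])
w = rho ** (n - idx)                  # correlation of u* with each in-sample error
y_blup = x_star @ beta_hat + w @ Omega_inv @ u_hat
```

Under AR(1) the correction term $w'\Omega^{-1}\hat{u}$ collapses to $\rho\,\hat{u}_n$, the familiar one-step-ahead adjustment, which makes the role of $w$ concrete: the simple prediction $x^{*}\hat\beta$ discards this term.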

As for your second question, it is a bit unclear to me what you are trying to achieve. The problem with the GLS model is that if we use OLS standard errors for the coefficients, they are biased. The formulas you give are for calculating the standard error of the error term, but this only makes sense for an OLS model, since in a GLS model the error term will in general not have a unique (constant) variance.

If you are after the prediction variance, then @whuber's comment holds: you cannot calculate it in this setup. The basic problem is that you predict one observation, so you get one number, and the variance of one number is zero. What you can calculate is the theoretical prediction variance, but that depends on the model you are trying to test.

If you want to calculate PRESS (the sum of squared residuals from a jackknife procedure) and weight those residuals with $\Omega$, I think you will run into the same problem of how to calculate $\Omega$ out of sample.