Solved – Example of OLS vs GLS with AR1 residuals for teaching in R

autocorrelationgeneralized-least-squarestime series

I'm looking for an example to show my class. We are covering OLS vs GLS with autocorrelated errors — I've got the class to the point where they understand (some of them) why the the standard errors on the coefficient estimates tend to be underestimated (and the t-scores overestimated) when the autocorrelations of the errors at low lags are positive. But it would be nice to have an example where they can see how inference might change based on fitting a “better'' model with gls.

Thus, I'd like to show an example where the residuals from an OLS model y~x have an AR1 structure that would give spurious results (std errors on the estimates, p-values, etc.). Then I would show them a gls implementation that accounts for correlated residuals. E.g.,

# example of what'd I'd like given the right x and y 
ols1 <- lm(y~x) #ols model
summary(ols1) # would show a biased model
acf(residuals(ols1)) # would show AR1 structure
# vs
gls1 <- gls(y~x,correlation = corAR1())

Any ideas out there?

Edited for clarity.

Best Answer

Here is a little Monte Carlo simulation that illustrates that you will reject the null of a mean of zero way too often if you neglect the AR(1) autocorrelation in the time series generated , i.e., if you estimate the variance of the mean as $s^2/n$. GLS produces rejection rates closer to the nominal level. (I am not so familiar with the nlme package, so I hope I have used it wisely.)

library(nlme)

reps <- 1000
n <- 300

ols.rejections <- gls.rejections <- rep(NA,reps)

for (i in 1:reps){
  y <- arima.sim(n=n,list(ar=0.5))
  ols.reg <- lm(y~1)
  gls.reg <- gls(y~1,correlation = corAR1(form = ~ 1))
  ols.rejections[i] <- summary(ols.reg)$coefficients[1,4] < 0.05
      gls.rejections[i] <- summary(gls.reg)$tTable[,"p-value"] < 0.05
}

The result:

> mean(ols.rejections)
[1] 0.264

> mean(gls.rejections)
[1] 0.046

Related Solutions

GLS Prediction – Methods for Prediction with Generalized Least Squares (GLS)

Suppose we have a GLS model:

$$y=X\beta+u,$$

with

$$Euu'=\Omega.$$

Suppose we want to predict $y^*$:

$$y^*=x^*\beta+u^*,$$

Goldberger proved that the best linear unbiased prediction for $y^*$ is the following:

$$\hat{y}=x^*\hat{\beta}+w'\Omega^{-1}\hat{u},$$

where

$$\hat\beta=(X'\Omega^{-1}X)^{-1}X\Omega^{-1}y,\quad \hat{u}=y-X\hat\beta$$

and

$$w=Eu^*u$$

So the answer to your first question would be that if you use simple prediction, then your prediction will not be optimal. On the other hand to use this formula you need to know $w$. And for that you need to know more about $\Omega$. Goldberger in his article discusses several special cases.

As for your second question it is a bit unclear for me what are you trying to achieve. The problem with GLS model is that if we use OLS standard errors of the coefficients then they are biased. The formulas you give are for calculating the standard error of the error term. But this only makes sense for OLS model, since for GLS model the error term in general will not have unique variance.

If you are going for prediction variance, then @whuber comment holds, you cannot calculate it in this setup. The basic problem for that is you predict one observation, so you get one number. And variance of one number is zero. What you can calculate is theoretical prediction variance, but this then depends on the model you are trying to test.

If you want to calculate PRESS: the sum of squares of residuals from jackknife procedure and weight them with $\Omega$, I think you will run into the same problem of how to calculate $\Omega$ out of sample.

Solved – Non-Correlated errors from Generalized Least Square model (GLS)

The residuals from gls will indeed have the same autocorrelation structure, but that does not mean the coefficient estimates and their standard errors have not been adjusted appropriately. (There's obviously no requirement that $\Omega$ be diagonal, either.) This is because the residuals are defined as $e = Y - X\hat{\beta}^{\text{GLS}}$. If the covariance matrix of $e$ was equal to $\sigma^2\text{I}$, there would be no need to use GLS!

In short, you haven't done anything wrong, there's no need to adjust the residuals, and the routines are all working correctly.

predict.gls does take the structure of the covariance matrix into account when forming standard errors of the prediction vector. However, it doesn't have the convenient "predict a few observations ahead" feature of predict.Arima, which takes into account the relevant residuals at the end of the data series and the structure of the residuals when generating predicted values. arima has the ability to incorporate a matrix of predictors in the estimation, and if you're interested in prediction a few steps ahead, it may be a better choice.

EDIT: Prompted by a comment from Michael Chernick (+1), I'm adding an example comparing GLS with ARMAX (arima) results, showing that coefficient estimates, log likelihoods, etc. all come out the same, at least to four decimal places (a reasonable degree of accuracy given that two different algorithms are used):

# Generating data
eta <- rnorm(5000)
for (j in 2:5000) eta[j] <- eta[j] + 0.4*eta[j-1]
e <- eta[4001:5000]
x <- rnorm(1000)
y <- x + e

> summary(gls(y~x, correlation=corARMA(p=1), method='ML'))
Generalized least squares fit by maximum likelihood
  Model: y ~ x 
  Data: NULL 
       AIC      BIC    logLik
  2833.377 2853.008 -1412.688

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.4229375 

Coefficients:
                 Value  Std.Error  t-value p-value
(Intercept) -0.0375764 0.05448021 -0.68973  0.4905
x            0.9730496 0.03011741 32.30854  0.0000

 Correlation: 
  (Intr)
x -0.022

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-2.97562731 -0.65969048  0.01350339  0.70718362  3.32913451 

Residual standard error: 1.096575 
Degrees of freedom: 1000 total; 998 residual
> 
> arima(y, order=c(1,0,0), xreg=x)

Call:
arima(x = y, order = c(1, 0, 0), xreg = x)

Coefficients:
         ar1  intercept       x
      0.4229    -0.0376  0.9730
s.e.  0.0287     0.0544  0.0301

sigma^2 estimated as 0.9874:  log likelihood = -1412.69,  aic = 2833.38

EDIT: Prompted by a comment from anand (OP), here's a comparison of predictions from gls and arima with the same basic data structure as above and some extraneous output lines removed:

df.est <- data.frame(list(y = y[1:995], x=x[1:995]))
df.pred <- data.frame(list(y=NA, x=x[996:1000]))

model.gls <- gls(y~x, correlation=corARMA(p=1), method='ML', data=df.est)
model.armax <- arima(df.est$y, order=c(1,0,0), xreg=df.est$x)

> predict(model.gls, newdata=df.pred)
[1] -0.3451556 -1.5085599  0.8999332  0.1125310  1.0966663

> predict(model.armax, n.ahead=5, newxreg=df.pred$x)$pred
[1] -0.79666213 -1.70825775  0.81159072  0.07344052  1.07935410

As we can see, the predicted values are different, although they are converging as we move farther into the future. This is because gls doesn't treat the data as a time series and take the specific value of the residual at observation 995 into account when forming predictions, but arima does. The effect of the residual at obs. 995 decreases as the forecast horizon increases, leading to the convergence of predicted values.

Consequently, for short-term predictions of time series data, arima will be better.

Best Answer

Related Solutions

GLS Prediction – Methods for Prediction with Generalized Least Squares (GLS)

Solved – Non-Correlated errors from Generalized Least Square model (GLS)

Related Question