Solved – Fitting a generalized least squares model with correlated data; use ML or REML

Tags: generalized-least-squares, maximum likelihood, r, time series

Reading the Linear Mixed Model (LMM) literature I am aware that fitting a model using REML provides better estimates of variance parameters than fitting via ML. However, we should not compare nested models fitted with REML that have different fixed effects.

Recently, I have been fitting some models using GLS via the gls() function in the nlme package for R. The default fitting method for that function is REML. Do the same principles of REML vs ML for LMM also apply to GLS?

Specifically, I am fitting a model with and without a linear trend with a correlation structure in the residuals:

m1 <- gls(Response ~ Time, data = foo, correlation = corAR1(form = ~ Time))
m0 <- gls(Response ~ 1, data = foo, correlation = corAR1(form = ~ Time))

In the above, I should fit the models using ML as they have different fixed effects. Is this correct?

Secondly, consider two GLS models that differ in the correlation structure:

m1 <- gls(Response ~ Time, data = foo, correlation = corARMA(form = ~ Time, p = 1))
m2 <- gls(Response ~ Time, data = foo, correlation = corARMA(form = ~ Time, p = 2))

What fitting method should ideally be used here? REML or ML? Here my intuition would say fit via REML as we are estimating (co)variance parameters. Is my intuition correct or have I got this all mixed up?

Best Answer

Your intuition is correct: the same principles apply. I looked in Pinheiro and Bates, section 5.4, where gls is introduced, but it doesn't say so explicitly, so you'll just have to trust me, I guess. :)
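As a rough sketch of how the two comparisons from the question could be run, assuming a simulated data frame standing in for foo (the data, seed, and object names below are made up for illustration): models with different fixed effects are fitted by ML before comparing, while models that differ only in the correlation structure are compared on their REML fits.

```r
library(nlme)

set.seed(1)
n <- 200
# Hypothetical stand-in for the question's `foo`: mild linear trend plus AR(1) noise
foo <- data.frame(Time = seq_len(n))
foo$Response <- 0.01 * foo$Time + as.numeric(arima.sim(list(ar = 0.5), n = n))

# Different fixed effects, same correlation structure: fit both by ML, then compare
m0 <- gls(Response ~ 1,    data = foo, correlation = corAR1(form = ~ Time), method = "ML")
m1 <- gls(Response ~ Time, data = foo, correlation = corAR1(form = ~ Time), method = "ML")
cmp_fixed <- anova(m0, m1)   # likelihood ratio test for the trend term

# Same fixed effects, different correlation structures: REML fits are comparable
r1 <- gls(Response ~ Time, data = foo, correlation = corARMA(form = ~ Time, p = 1), method = "REML")
r2 <- gls(Response ~ Time, data = foo, correlation = corARMA(form = ~ Time, p = 2), method = "REML")
cmp_corr <- anova(r1, r2)

print(cmp_fixed)
print(cmp_corr)
```

Once a model has been chosen with ML, a common practice is to refit it with REML before reporting the variance and correlation parameters.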

In Chapter 2 they go through the theory of REML and ML, and you'll notice that none of it depends on there being any random effects. In fact, you could write any random effect model using just a correlation structure and fit it with gls, though for complex random effects this would get quite complicated. The simplest example is that a random intercept model is equivalent to a compound symmetry model.
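The random intercept / compound symmetry equivalence can be checked numerically. This is a minimal sketch on simulated grouped data (the grouping factor g, the covariate x, and all parameter values are assumptions for illustration); it relies on the within-group correlation being positive, which is the case a random intercept can represent.

```r
library(nlme)

set.seed(2)
g <- factor(rep(1:30, each = 5))          # 30 hypothetical groups of 5 observations
b <- rnorm(30, sd = 1)[as.integer(g)]     # one random intercept per group
d <- data.frame(g = g, x = rnorm(150))
d$y <- 1 + 0.5 * d$x + b + rnorm(150, sd = 0.7)

# Random intercept model (REML is the default in both functions)
fit_lme <- lme(y ~ x, random = ~ 1 | g, data = d)
# Same marginal model written as a compound symmetry correlation structure
fit_gls <- gls(y ~ x, correlation = corCompSymm(form = ~ 1 | g), data = d)

# Log-likelihoods and fixed-effect estimates should agree up to optimizer tolerance
print(c(lme = as.numeric(logLik(fit_lme)), gls = as.numeric(logLik(fit_gls))))
print(rbind(lme = fixef(fit_lme), gls = coef(fit_gls)))
```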

Related Solutions

Solved – use generalised least squares with a binomial distribution and a nested structure

By GLS do you mean GLM? The GLM is fitted by iteratively reweighted least squares, which takes the mean-variance relationship into account when estimating the model parameters. Generalized Least Squares will still suffer either from an overfitting issue (infinite weights) or from overprediction (fitted probabilities greater than 1 or less than 0). The logistic regression model is commonly used to test for associations with binary outcomes. It's possible to go further and use Generalized Linear Mixed Models (GLMMs), conditional logistic regression, or Generalized Estimating Equations (GEEs) to account for certain correlation structures in the data. In R, the nlme package has mixed models, survival has clogit, and geepack provides geese for GEEs.
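To illustrate the point about fitted probabilities, here is a small sketch on simulated binary data (the data and coefficients are invented, and an ordinary least squares fit is used as a simple stand-in for a least-squares approach). The logistic GLM keeps fitted values inside (0, 1), whereas a linear fit has no such constraint.

```r
set.seed(3)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))   # simulated binary outcome

fit_glm <- glm(y ~ x, family = binomial)   # logistic regression, fitted by IRLS
fit_lm  <- lm(y ~ x)                       # linear fit to the 0/1 outcome, for contrast

print(range(fitted(fit_glm)))   # always strictly between 0 and 1
print(range(fitted(fit_lm)))    # not constrained to the unit interval
```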

Solved – Non-Correlated errors from Generalized Least Square model (GLS)

The residuals from gls will indeed have the same autocorrelation structure, but that does not mean the coefficient estimates and their standard errors have not been adjusted appropriately. This is because the residuals are defined as $e = Y - X\hat{\beta}^{\text{GLS}}$. If the covariance matrix of $e$ were equal to $\sigma^2\text{I}$, there would be no need to use GLS! (There's obviously no requirement that $\Omega$ be diagonal, either.)
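A short derivation makes this concrete. It assumes the model $Y = X\beta + \varepsilon$ with $\operatorname{Cov}(\varepsilon) = \sigma^2\Omega$ and treats $\Omega$ as known; the symbols $\varepsilon$ and $\sigma^2\Omega$ are notation introduced here for illustration.

```latex
\hat{\beta}^{\text{GLS}} = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} Y

e = Y - X\hat{\beta}^{\text{GLS}}
  = \left( I - X (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} \right) \varepsilon

\operatorname{Cov}(e) = \sigma^2 \left( \Omega - X (X^\top \Omega^{-1} X)^{-1} X^\top \right)
```

The correction term has rank equal to the number of columns of $X$, so for long series $\operatorname{Cov}(e)$ stays close to $\sigma^2\Omega$ and the residuals inherit the autocorrelation of the errors.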

In short, you haven't done anything wrong, there's no need to adjust the residuals, and the routines are all working correctly.
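One way to check that the fitted correlation structure is doing its job is to look at the normalized residuals, which nlme computes by decorrelating the raw residuals with the estimated structure. The sketch below uses simulated AR(1) errors with coefficient 0.4 (an assumption chosen to mirror the example in this answer) and a small helper, acf1, defined here for illustration.

```r
library(nlme)

set.seed(4)
n <- 1000
x <- rnorm(n)
y <- x + as.numeric(arima.sim(list(ar = 0.4), n = n))   # AR(1) errors, phi = 0.4

fit <- gls(y ~ x, correlation = corAR1(), method = "ML")

r_raw  <- residuals(fit, type = "response")     # Y - X beta_hat: still autocorrelated
r_norm <- residuals(fit, type = "normalized")   # decorrelated using the fitted structure

# Lag-1 sample autocorrelation of a residual vector
acf1 <- function(r) acf(as.numeric(r), lag.max = 1, plot = FALSE)$acf[2]
print(c(raw = acf1(r_raw), normalized = acf1(r_norm)))
```

The raw residuals should show a lag-1 autocorrelation near the estimated Phi, while the normalized residuals should look approximately like white noise if the correlation model is adequate.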

predict.gls does take the structure of the covariance matrix into account when forming standard errors of the prediction vector. However, it doesn't have the convenient "predict a few observations ahead" feature of predict.Arima, which takes into account the relevant residuals at the end of the data series and the structure of the residuals when generating predicted values. arima has the ability to incorporate a matrix of predictors in the estimation, and if you're interested in prediction a few steps ahead, it may be a better choice.

EDIT: Prompted by a comment from Michael Chernick (+1), I'm adding an example comparing GLS with ARMAX (arima) results, showing that coefficient estimates, log likelihoods, etc. all come out the same, at least to four decimal places (a reasonable degree of accuracy given that two different algorithms are used):

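# Generating data
library(nlme)  # provides gls()
eta <- rnorm(5000)
# Recursive filter turns the white noise into an AR(1) series with coefficient 0.4
for (j in 2:5000) eta[j] <- eta[j] + 0.4*eta[j-1]
e <- eta[4001:5000]  # keep the last 1000 values, discarding the burn-in
x <- rnorm(1000)
y <- x + e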

> summary(gls(y~x, correlation=corARMA(p=1), method='ML'))
Generalized least squares fit by maximum likelihood
  Model: y ~ x 
  Data: NULL 
       AIC      BIC    logLik
  2833.377 2853.008 -1412.688

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.4229375 

Coefficients:
                 Value  Std.Error  t-value p-value
(Intercept) -0.0375764 0.05448021 -0.68973  0.4905
x            0.9730496 0.03011741 32.30854  0.0000

 Correlation: 
  (Intr)
x -0.022

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-2.97562731 -0.65969048  0.01350339  0.70718362  3.32913451 

Residual standard error: 1.096575 
Degrees of freedom: 1000 total; 998 residual
> 
> arima(y, order=c(1,0,0), xreg=x)

Call:
arima(x = y, order = c(1, 0, 0), xreg = x)

Coefficients:
         ar1  intercept       x
      0.4229    -0.0376  0.9730
s.e.  0.0287     0.0544  0.0301

sigma^2 estimated as 0.9874:  log likelihood = -1412.69,  aic = 2833.38

EDIT: Prompted by a comment from anand (OP), here's a comparison of predictions from gls and arima with the same basic data structure as above and some extraneous output lines removed:

df.est <- data.frame(list(y = y[1:995], x=x[1:995]))
df.pred <- data.frame(list(y=NA, x=x[996:1000]))

model.gls <- gls(y~x, correlation=corARMA(p=1), method='ML', data=df.est)
model.armax <- arima(df.est$y, order=c(1,0,0), xreg=df.est$x)

> predict(model.gls, newdata=df.pred)
[1] -0.3451556 -1.5085599  0.8999332  0.1125310  1.0966663

> predict(model.armax, n.ahead=5, newxreg=df.pred$x)$pred
[1] -0.79666213 -1.70825775  0.81159072  0.07344052  1.07935410

As we can see, the predicted values are different, although they converge as we move farther into the future. This is because gls doesn't treat the data as a time series, so it doesn't take the specific value of the residual at observation 995 into account when forming predictions, whereas arima does. The effect of the residual at obs. 995 decreases as the forecast horizon increases, leading to the convergence of predicted values.

Consequently, for short-term predictions of time series data, arima will be better.
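The gap between the two sets of predictions can be reconstructed by hand. For AR(1) errors, the h-step-ahead forecast is the regression prediction plus Phi^h times the last observed residual. The sketch below uses freshly simulated data (so the numbers will not match the output above) and made-up object names; it adds that carry-over term to the gls predictions and compares the result with arima.

```r
library(nlme)

set.seed(5)
n <- 1000
x <- rnorm(n)
y <- x + as.numeric(arima.sim(list(ar = 0.4), n = n))

est  <- data.frame(y = y[1:995], x = x[1:995])   # estimation sample
newd <- data.frame(x = x[996:1000])              # five periods to forecast

fit_gls   <- gls(y ~ x, data = est, correlation = corAR1(), method = "ML")
fit_armax <- arima(est$y, order = c(1, 0, 0), xreg = est$x, method = "ML")

# Estimated AR(1) coefficient on its natural scale, and the last in-sample residual
phi    <- unname(coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE))
e_last <- as.numeric(residuals(fit_gls, type = "response"))[995]
h      <- 1:5

pred_gls      <- as.numeric(predict(fit_gls, newdata = newd))   # X beta_hat only
pred_adjusted <- pred_gls + phi^h * e_last                      # add AR(1) carry-over
pred_armax    <- as.numeric(predict(fit_armax, n.ahead = 5, newxreg = newd$x)$pred)

print(round(cbind(pred_gls, pred_adjusted, pred_armax), 4))
```

The adjusted column should track the arima forecasts closely, which shows that the two approaches differ only in whether the last residual is carried forward.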
