Solved – ARIMA (with xreg) vs GLS

Tags: arima, generalized-least-squares, time-series

I am fitting both an ARIMA model (with xreg variables) and a GLS model to my data in R. Both have the same ARMA structure and the same regressors, but the ARIMA model fits the data better. Does anyone know what the difference between the two is? I have seen that an ARIMA model with xreg in R is a linear regression with ARMA errors. Is that the same as a linear regression with ARMA error correlation (as used by gls)?

Thanks!

EDIT: The following code was used to create the GLS and ARIMA models:

library(nlme)  # provides gls() and corARMA()

arima3a <- arima(train.all$sv, xreg = train.all[, c(5, 6)], order = c(2, 0, 1))

gls3 <- gls(sv ~ sin + cos, data = train.all, correlation = corARMA(p = 2, q = 1))

Note: the sin and cos variables are columns 5 and 6 of train.all.

Best Answer

The two functions fit somewhat different models.

arima(..., xreg = ...) fits a regression on xreg and models the regression errors as an ARIMA process. Note that this is not the same as an ARIMAX model, and that the same applies to Arima() and auto.arima() from the forecast package.
gls(..., correlation = corARMA(p, q)) fits a linear model by generalized least squares, where the correlation structure of the errors follows an ARMA(p, q) process.
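In symbols, the models being contrasted look roughly like this. This is a sketch in my own notation; both R's arima() and nlme's corARMA() write the MA part with plus signs, and $\Lambda$ denotes the correlation matrix of a stationary ARMA(p, q) process.

```latex
% arima(y, order = c(p, 0, q), xreg = X): regression with ARMA(p, q) errors
\[
y_t = \mathbf{x}_t^\top \boldsymbol\beta + u_t, \qquad
u_t = \sum_{i=1}^{p} \phi_i u_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \qquad
\varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma_\varepsilon^2)
\]
% ARIMAX: lagged y enters the mean equation directly (not what arima() fits)
\[
y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \mathbf{x}_t^\top \boldsymbol\beta + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
\]
% gls(y ~ X, correlation = corARMA(p, q)): ARMA(p, q) correlation among the errors
\[
\mathbf{y} = X\boldsymbol\beta + \mathbf{u}, \qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\ \sigma^2 \Lambda(\boldsymbol\phi, \boldsymbol\theta)\right)
\]
```

The first and third forms share the mean $X\beta$ and an ARMA dependence in the errors; one visible difference is the scale parameter. gls() reports $\sigma$, the marginal standard deviation of $u_t$, while arima() reports the innovation variance $\sigma_\varepsilon^2$; for AR(1) errors, $\sigma^2 = \sigma_\varepsilon^2 / (1 - \phi^2)$. In the ARIMAX form, by contrast, $\beta$ measures the effect of $x_t$ on $y_t$ holding the lagged values of $y$ fixed.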

The ideas are of course similar, but the actual models are somewhat different. I find the arima() model easier to understand. It would be interesting to compare the estimated coefficients of the regressors, as well as the ARMA parameters of the error process (arima) against those of the error correlation structure (gls).
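A minimal sketch of that comparison, using simulated data because train.all is not available here. The series `sv`, the regressors `sin` and `cos`, the period of 12, and all true parameter values are hypothetical. gls() is fitted with method = "ML" rather than its REML default so that both functions maximize the same kind of likelihood.

```r
library(nlme)  # gls() and corARMA(); nlme ships with R as a recommended package

set.seed(1)
n   <- 500
idx <- seq_len(n)

# Hypothetical stand-in for train.all: two harmonic regressors plus ARMA(2,1) errors
train.all <- data.frame(sin = sin(2 * pi * idx / 12),
                        cos = cos(2 * pi * idx / 12))
u <- as.numeric(arima.sim(model = list(ar = c(0.5, -0.2), ma = 0.3), n = n))
train.all$sv <- 2 + 1.5 * train.all$sin - 0.8 * train.all$cos + u

# Regression with ARMA(2,1) errors via arima(); an intercept is added automatically
fit.arima <- arima(train.all$sv, order = c(2, 0, 1),
                   xreg = as.matrix(train.all[, c("sin", "cos")]), method = "ML")

# Same mean model and ARMA(2,1) correlation via gls(), fitted by maximum likelihood
fit.gls <- gls(sv ~ sin + cos, data = train.all,
               correlation = corARMA(p = 2, q = 1), method = "ML")

# Regression coefficients side by side
reg <- rbind(arima = coef(fit.arima)[c("intercept", "sin", "cos")],
             gls   = coef(fit.gls)[c("(Intercept)", "sin", "cos")])
print(reg)

# ARMA parameters: error process (arima) vs. correlation structure (gls)
arma <- rbind(arima = coef(fit.arima)[c("ar1", "ar2", "ma1")],
              gls   = coef(fit.gls$modelStruct$corStruct, unconstrained = FALSE))
print(arma)
```

If both optimizers reach the same maximum, the two rows of each table should agree closely; a large gap would point to a convergence problem rather than to a different model.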

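Given that both models have the same complexity (the same number of parameters, as long as the ARMA orders are the same), I'd go with the better-fitting one. (But remember that you can't generally compare AICs calculated by functions in different packages, as the log-likelihood, and hence the AIC, is only defined up to an additive constant, which can differ between packages. Note also that gls() fits by REML by default, whereas arima() uses maximum likelihood; a REML log-likelihood is not comparable with an ML one, so refit with gls(..., method = "ML") before comparing.)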
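The estimation method matters for that comparison: gls() defaults to method = "REML", while arima() maximizes the ordinary likelihood. A sketch on simulated data (seed and parameter values hypothetical) that puts the log-likelihoods and AICs next to each other:

```r
library(nlme)  # gls() and corARMA()

set.seed(42)
n <- 300
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))  # hypothetical AR(1) errors
y <- 1 + 2 * x + u

fit.arima <- arima(y, order = c(1, 0, 0), xreg = x, method = "ML")
fit.ml    <- gls(y ~ x, correlation = corARMA(p = 1), method = "ML")
fit.reml  <- gls(y ~ x, correlation = corARMA(p = 1))  # gls() default: method = "REML"

ll  <- c(arima = fit.arima$loglik,
         gls.ML = as.numeric(logLik(fit.ml)),
         gls.REML = as.numeric(logLik(fit.reml)))
aic <- c(arima = AIC(fit.arima), gls.ML = AIC(fit.ml), gls.REML = AIC(fit.reml))
print(rbind(logLik = ll, AIC = aic))
```

For this pair of functions the ML log-likelihoods and AICs should line up (both count four parameters here: intercept, slope, AR coefficient, and variance), while the REML value is on a different footing and should not be compared with arima()'s.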

Related Solutions

Solved – How is ARMA/ARIMA related to mixed effects modeling

I think the simplest way to look at it is to note that ARMA and similar models are designed for different purposes than multilevel models, and use different kinds of data.

Time series analysis usually deals with long series (often hundreds or even thousands of time points), and the primary goal is to look at how a single variable changes over time. There are sophisticated methods for many problems: not just autocorrelation, but also seasonality, other periodic changes, and so on.

Multilevel models are extensions of regression. They usually have relatively few time points (although they can have many), and the primary goal is to examine the relationship between a dependent variable and several independent variables. These models are not as good at dealing with complex relationships between a variable and time, partly because they usually have fewer time points (it's hard to look at seasonality if you don't have multiple observations for each season).

Solved – Non-Correlated errors from Generalized Least Square model (GLS)

The residuals from gls will indeed have the same autocorrelation structure, but that does not mean the coefficient estimates and their standard errors have not been adjusted appropriately. (There's obviously no requirement that $\Omega$ be diagonal, either.) This is because the residuals are defined as $e = Y - X\hat{\beta}^{\text{GLS}}$. If the covariance matrix of $e$ were equal to $\sigma^2\text{I}$, there would be no need to use GLS!
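To spell out why, assume (as the notation suggests) $Y = X\beta + u$ with $\operatorname{Var}(u) = \sigma^2\Omega$, and write the Cholesky factorization $\Omega = LL^\top$:

```latex
\[
\hat{\beta}^{\text{GLS}} = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} Y,
\qquad
e = Y - X\hat{\beta}^{\text{GLS}} = \bigl(I - X (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1}\bigr)\, u
\]
\[
\operatorname{Var}(e) = \sigma^2 \bigl(\Omega - X (X^\top \Omega^{-1} X)^{-1} X^\top\bigr)
\]
% GLS is OLS on the whitened data L^{-1}Y, L^{-1}X:
\[
\operatorname{Var}(L^{-1} e) = \sigma^2 \bigl(I - \tilde{X}(\tilde{X}^\top \tilde{X})^{-1}\tilde{X}^\top\bigr),
\qquad \tilde{X} = L^{-1} X
\]
```

So the raw residuals $e$ inherit $\Omega$ up to a low-rank correction, while the whitened residuals $L^{-1}e$ have the usual OLS residual covariance, which is close to $\sigma^2 I$ when $n$ is much larger than the number of regressors.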

In short, you haven't done anything wrong, there's no need to adjust the residuals, and the routines are all working correctly.
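A quick check on simulated data (seed and $\phi = 0.4$ are hypothetical): residuals(fit, type = "response") returns the raw $e = Y - X\hat{\beta}$, which keeps the AR(1) autocorrelation, while type = "normalized" whitens the residuals with the estimated correlation structure. The helper `lag1` is my own, not part of nlme.

```r
library(nlme)  # gls(), corARMA(), residuals.gls()

set.seed(7)
n <- 1000
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.4), n = n))  # hypothetical AR(1) errors
y <- x + u

fit <- gls(y ~ x, correlation = corARMA(p = 1), method = "ML")

# Hypothetical helper: lag-1 sample autocorrelation of a residual vector
lag1 <- function(r) acf(r, lag.max = 1, plot = FALSE)$acf[2]

r.resp <- residuals(fit, type = "response")    # raw residuals: AR(1) structure remains
r.norm <- residuals(fit, type = "normalized")  # whitened with the estimated correlation
print(c(response = lag1(r.resp), normalized = lag1(r.norm)))
```

The response residuals should show a lag-1 autocorrelation near 0.4, and the normalized ones a value near zero, which is the pattern the answer describes.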

predict.gls computes predictions from the GLS coefficient estimates, so the covariance structure enters only through $\hat{\beta}^{\text{GLS}}$; it does not itself return prediction standard errors. Nor does it have the convenient "predict a few observations ahead" feature of predict.Arima, which takes into account the residuals at the end of the data series and the structure of the error process when generating predicted values. arima can incorporate a matrix of predictors in the estimation (via xreg), so if you're interested in predicting a few steps ahead, it may be the better choice.

EDIT: Prompted by a comment from Michael Chernick (+1), I'm adding an example comparing GLS with arima() with xreg (a regression with AR(1) errors), showing that the coefficient estimates, log-likelihoods, and AICs all come out the same to the precision printed, at least four decimal places for the coefficients (a reasonable degree of accuracy given that two different algorithms are used):

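library(nlme)  # provides gls() and corARMA()

# Generating data
eta <- rnorm(5000)
for (j in 2:5000) eta[j] <- eta[j] + 0.4*eta[j-1]  # AR(1) recursion with phi = 0.4
e <- eta[4001:5000]                                # drop the first 4000 values as burn-in
x <- rnorm(1000)
y <- x + e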

> summary(gls(y~x, correlation=corARMA(p=1), method='ML'))
Generalized least squares fit by maximum likelihood
  Model: y ~ x 
  Data: NULL 
       AIC      BIC    logLik
  2833.377 2853.008 -1412.688

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
      Phi 
0.4229375 

Coefficients:
                 Value  Std.Error  t-value p-value
(Intercept) -0.0375764 0.05448021 -0.68973  0.4905
x            0.9730496 0.03011741 32.30854  0.0000

 Correlation: 
  (Intr)
x -0.022

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-2.97562731 -0.65969048  0.01350339  0.70718362  3.32913451 

Residual standard error: 1.096575 
Degrees of freedom: 1000 total; 998 residual
> 
> arima(y, order=c(1,0,0), xreg=x)

Call:
arima(x = y, order = c(1, 0, 0), xreg = x)

Coefficients:
         ar1  intercept       x
      0.4229    -0.0376  0.9730
s.e.  0.0287     0.0544  0.0301

sigma^2 estimated as 0.9874:  log likelihood = -1412.69,  aic = 2833.38

EDIT: Prompted by a comment from anand (OP), here's a comparison of predictions from gls and arima with the same basic data structure as above and some extraneous output lines removed:

# Estimate on the first 995 observations, predict the last 5
df.est <- data.frame(y = y[1:995], x = x[1:995])
df.pred <- data.frame(y = NA, x = x[996:1000])

model.gls <- gls(y ~ x, correlation = corARMA(p = 1), method = 'ML', data = df.est)
model.armax <- arima(df.est$y, order = c(1, 0, 0), xreg = df.est$x)

> predict(model.gls, newdata=df.pred)
[1] -0.3451556 -1.5085599  0.8999332  0.1125310  1.0966663

> predict(model.armax, n.ahead=5, newxreg=df.pred$x)$pred
[1] -0.79666213 -1.70825775  0.81159072  0.07344052  1.07935410

As we can see, the predicted values differ, although they converge as we move farther into the future. This is because gls doesn't treat the data as a time series, so it does not take the specific value of the residual at observation 995 into account when forming predictions, whereas arima does. The effect of the residual at obs. 995 shrinks as the forecast horizon increases (for AR(1) errors, by a factor of $\phi$ per step), which is why the predicted values converge.

Consequently, for short-term predictions of time series data, arima will be better.
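The explanation can be checked numerically. A sketch on freshly simulated data (seed and values hypothetical, so the numbers will not match the output shown): the gls prediction should equal the regression mean $\hat{\alpha} + \hat{\beta}x$, and the arima prediction should equal that mean plus $\hat{\phi}^h u_{995}$, where $u_{995}$ is the last regression residual. The xreg column is named `x` via cbind() only so the coefficient can be looked up by name.

```r
library(nlme)  # gls() and corARMA()

set.seed(3)
n <- 1000
x <- rnorm(n)
y <- x + as.numeric(arima.sim(model = list(ar = 0.4), n = n))  # hypothetical data

df.est <- data.frame(y = y[1:995], x = x[1:995])
new.x  <- x[996:1000]
h      <- 1:5

model.gls   <- gls(y ~ x, correlation = corARMA(p = 1), method = "ML", data = df.est)
model.armax <- arima(df.est$y, order = c(1, 0, 0), xreg = cbind(x = df.est$x))

# gls: the prediction is only the regression mean, intercept + slope * x
pred.gls <- as.numeric(predict(model.gls, newdata = data.frame(x = new.x)))
mean.gls <- as.numeric(coef(model.gls)[1] + coef(model.gls)[2] * new.x)

# arima: regression mean plus phi^h times the last regression residual u_995
cf      <- coef(model.armax)
u.last  <- df.est$y[995] - cf[["intercept"]] - cf[["x"]] * df.est$x[995]
by.hand <- cf[["intercept"]] + cf[["x"]] * new.x + cf[["ar1"]]^h * u.last
pred.ar <- as.numeric(predict(model.armax, n.ahead = 5, newxreg = new.x)$pred)

print(round(cbind(h, gls = pred.gls, gls.mean = mean.gls,
                  arima = pred.ar, arima.by.hand = by.hand), 4))
```

The gap between the two forecasts is (up to small differences in the fitted coefficients) $\hat{\phi}^h u_{995}$, which decays geometrically with the horizon $h$.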
