I need to produce rolling forecasts with an ARIMA-GARCH model using a moving window of size 1000. Given that structural changes in the series might take place at some point in the forecast horizon, is there any conventional method for choosing the lags of the ARIMA(p,q) model when using a rolling window? I'm considering a rolling approach using the `auto.arima` function from the `forecast` package to update the optimal lag length daily by minimizing the AIC criterion. However, this method produces rather volatile results in terms of optimal lags, which seems a bit suspicious. Any advice on this would be highly appreciated!
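For concreteness, the rolling selection I have in mind can be sketched as follows (this assumes the `forecast` package is installed; the simulated series, the step of 50 observations between re-selections, and `seasonal = FALSE` are illustrative choices, not part of my actual setup):

```
library(forecast)

# Re-run auto.arima on each 1000-observation window and record the selected (p,d,q)
set.seed(1)
x  <- arima.sim(model = list(ar = 0.5, ma = 0.2), n = 1200)
T0 <- 1000
sel <- t(sapply(seq(T0, length(x), by = 50), function(end) {
  fit <- auto.arima(x[(end - T0 + 1):end], ic = "aic", seasonal = FALSE)
  arimaorder(fit)  # the (p,d,q) chosen for this window
}))
print(sel)  # rows = windows; the rows change from window to window
```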

# Forecasting – Lag Selection and Model Instability for ARIMA-GARCH in Rolling Windows

arima · forecasting · garch · model-selection · moving-window

#### Related Solutions

I'm wondering if a rolling forecast technique like the ones mentioned in Rob Hyndman's blogs, and the example below, could be used to select the order for an ARIMA model?

Rob J. Hyndman indicates in comments to his blog post "Time series cross-validation: an R example":

You would normally be trying several models and selecting the best based on a cross-validation procedure.

Cross-validation is both a method of measuring accuracy and a method for choosing a model. The model you finally use for forecasting is the one that gives the best cross-validation accuracy.

Also, since cross-validation is often used for model selection with cross-sectional data*, it is quite natural to do something similar for time series data (where regular cross-validation is replaced by rolling-window cross-validation).

*From another post called "Why every statistician should know about cross-validation":

Minimizing a CV statistic is a useful way to do model selection such as choosing variables in a regression or choosing the degrees of freedom of a nonparametric smoother.

I'm wondering how you could use the rolling forecast technique to select the order of the ARIMA model. If anyone has a suggestion or example, that would be great.

First, you choose a set of candidate models. For each model in the set, you evaluate forecasting performance based on rolling-window cross validation. Then you choose the model that delivers the best forecasting performance.

Here is an example I ran at some point to compare model selection based on rolling-window cross validation with AIC-based selection. (I wanted to illustrate that model selection based on rolling-window cross validation is asymptotically equivalent to AIC-based choice.)

```
# Generate a T-long sample of an ARMA(1,1) process
T  = 10^4
# T = 2*10^3 # uncomment for a shorter series (10^3 rolling windows instead of 9*10^3)
T0 = 1*10^3  # the length of the rolling window
set.seed(1); innov1 = rnorm(T); set.seed(2); innov2 = rnorm(T)
x1 = arima.sim(model = list(ar = 0.5, ma = 0.2), n = T, innov = innov1,
               n.start = 10^3, start.innov = innov2)
# Estimate three candidate models (ARMA(1,1), ARMA(2,1), ARMA(0,1)) on T0-long rolling windows,
# get 1-step-ahead mean squared forecast errors (MSFEs).
# (The loop below runs for about 15 minutes on a ThinkPad laptop with a Sandy Bridge i5 processor from 2011.)
err1 = err2 = err3 = rep(NA, T)
print(Sys.time())
for (end in T0:(T-1)) {
  if (end %% 100 == 0) {
    print(paste("end =", end)); print(Sys.time())
  }
  idx = (end-T0+1):end
  model1 = arima(x1[idx], order = c(1,0,1), include.mean = TRUE,
                 method = "CSS-ML", optim.method = "BFGS")
  fcst1 = as.numeric(predict(model1, n.ahead = 1)$pred)
  err1[end] = fcst1 - x1[end+1]
  model2 = arima(x1[idx], order = c(2,0,1), include.mean = TRUE,
                 method = "CSS-ML", optim.method = "BFGS")
  fcst2 = as.numeric(predict(model2, n.ahead = 1)$pred)
  err2[end] = fcst2 - x1[end+1]
  model3 = arima(x1[idx], order = c(0,0,1), include.mean = TRUE,
                 method = "CSS-ML", optim.method = "BFGS")
  fcst3 = as.numeric(predict(model3, n.ahead = 1)$pred)
  err3[end] = fcst3 - x1[end+1]
}
print(Sys.time())
err1_orig = err1; err1 = head(tail(err1, -T0), -1); msfe1 = mean(err1^2); print(paste("MSFE1 =", msfe1))
err2_orig = err2; err2 = head(tail(err2, -T0), -1); msfe2 = mean(err2^2); print(paste("MSFE2 =", msfe2))
err3_orig = err3; err3 = head(tail(err3, -T0), -1); msfe3 = mean(err3^2); print(paste("MSFE3 =", msfe3))
# Estimate the three candidate models on the full sample (the full series x1,
# not just the last rolling window) and obtain their AICs
model1 = arima(x1, order = c(1,0,1), include.mean = TRUE, method = "CSS-ML", optim.method = "BFGS")
AIC1 = AIC(model1); print(paste("AIC1 =", AIC1))
model2 = arima(x1, order = c(2,0,1), include.mean = TRUE, method = "CSS-ML", optim.method = "BFGS")
AIC2 = AIC(model2); print(paste("AIC2 =", AIC2))
model3 = arima(x1, order = c(0,0,1), include.mean = TRUE, method = "CSS-ML", optim.method = "BFGS")
AIC3 = AIC(model3); print(paste("AIC3 =", AIC3))
# The ranking of the models by 1-step-ahead mean squared forecast error
# should ideally match the ranking by AIC.
# Indeed, model1 gets the lowest AIC and the lowest MSFE; model2 follows; model3 comes last.
# Both rankings are consistent.
```

In general, building an ARMA-GARCH model in a stepwise fashion based on diagnostics such as the ACF, PACF and Ljung-Box test is problematic, because these diagnostics do not have their standard null distributions when applied to returns whose conditional variance is nonconstant, or to squared returns whose conditional mean is nonconstant. Thus the following will not work exactly as you expect it to (but hopefully the distortion will not be too large and you can still trust the results to some extent):

I'm fitting an ARIMA-GARCH model to my hedge fund index daily log-return series. I used the ACF, PACF, Ljung-Box test and ARCH test to check for autocorrelation and conditional heteroskedasticity. Since the ACF and PACF of the returns themselves do not show significant autocorrelation (though they do for the squared returns), which is also suggested by the Ljung-Box test with h=0, I exclude autocorrelation in the mean process. To double-check, I fit an ARIMA(1,0,1), and the coefficients on both the AR and MA terms are not statistically significant. So I exclude them and go for a pure GARCH model.

Going forward,

I did check the other similar questions about GARCH lag selection (for example, here), and it seems that when it comes to prediction, it's better to choose the model with the lowest AIC rather than BIC. So I first compare the AICs and then check further using a likelihood ratio test.

AIC, BIC and LR tests all address different questions and serve different goals. You should not expect all of them to point in the same direction, and you should choose the appropriate one based on your modelling goal. If the goal is forecasting, AIC is the most relevant choice.

Regarding your Q1, experience in finance tells us that high-order GARCH models do not tend to beat low-order GARCH models. I would stick to a relatively parsimonious model unless I had reasons to believe the time series is somehow special and unlike other financial time series. I do not see a sound *theoretical* reason to select a model that has higher AIC than another model (when there are not that many models being compared, as in your case), but experience in finance points to a different solution.

Regarding your Q2, see above.

Regarding your Q3, it does matter that you consider the full model. Considering only part of the model does not make sense. (You could construct examples where you choose really poor models over much better models only because you happen to look at part of the picture instead of the whole picture.)

## Best Answer

I would also use `auto.arima`. The fact that the selected model changes frequently from one window to the next may be due not only to frequent structural changes (which is probably unlikely) but to the fact that there are several models that approximate the patterns in the data about equally well, so their AICs are very close. Then changing two data points out of 1000 (dropping the oldest point and adding one new point) can make `auto.arima` switch between these competing models. I would not worry too much about that, as each of these models likely implies very similar time series patterns. They are probably almost equivalent representations of the same thing. (Such a hypothesis could also be assessed by looking at the different models' impulse-response functions or the implied ACFs.)

If the AICs of a few best models are very close, then it should also not matter much which one of them you choose. As far as we know, they are all almost equally good approximations of reality. So you could just pick the one you like and stick to it. That would make the results look much cleaner than having a constantly changing model. To do that, consider obtaining not only the best model from `auto.arima` but, say, the top 5 together with their AICs. Do that in each window and see how different the AICs are. If they are very close, you could just pick one of these models and use it in all the windows.

Or you could decide to change the model from one window to the next only if the difference in AICs between the model from the past window and the best one in the current window is sufficiently large. This should give you more stability from window to window and should not be difficult to program.
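The switching rule just described can be sketched in base R, without `auto.arima`, by fitting a small grid of candidate orders in each window and switching only on a sufficiently large AIC improvement. (The candidate set, window length, re-selection frequency and the AIC threshold of 2 below are illustrative assumptions, not recommendations.)

```
# In each window: fit all candidate ARMA orders, record their AICs,
# and switch away from the previously selected order only when the
# best candidate improves on it by more than a threshold.
set.seed(1)
x       <- arima.sim(model = list(ar = 0.5, ma = 0.2), n = 1500)
orders  <- list(c(1,0,1), c(2,0,1), c(0,0,1))  # candidate (p,d,q) triples
T0      <- 1000                                 # rolling window length
step    <- 100                                  # re-select every 100 observations
current <- 1                                    # index of the currently used model
for (end in seq(T0, length(x), by = step)) {
  aics <- sapply(orders, function(o)
    AIC(arima(x[(end - T0 + 1):end], order = o, method = "CSS-ML")))
  best <- which.min(aics)
  # switch only if the best candidate beats the current model by > 2 AIC units
  if (aics[current] - aics[best] > 2) current <- best
  cat("end =", end, " AICs =", round(aics, 1), " using model", current, "\n")
}
```

The threshold plays the role of a hysteresis band: small AIC fluctuations between nearly equivalent models no longer trigger a model change, while a genuinely better model still gets adopted.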