I'm wondering whether a rolling forecast technique, like the ones described in Rob Hyndman's blog posts and shown in the example below, could be used to select the order of an ARIMA model.
In the examples I've looked at (linked below), the order of the ARIMA model is either specified in advance or determined once by auto.arima, and that single model is then evaluated with the for loop in the rolling forecast.
How could the rolling forecast technique be used to select the order itself? If anyone has a suggestion or an example, that would be great.
Examples:
http://robjhyndman.com/hyndsight/tscvexample/
http://robjhyndman.com/hyndsight/rolling-forecasts/
Code:
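library("fpp")  # loads the forecast package and the hsales data
h <- 5          # forecast horizon in months
train <- window(hsales,end=1989.99)
test <- window(hsales,start=1990)
n <- length(test) - h + 1   # number of rolling forecast origins
fit <- auto.arima(train)    # the order is selected once, on the training set
fc <- ts(numeric(n), start=1990+(h-1)/12, freq=12)
for(i in 1:n)
{
  x <- window(hsales, end=1989.99 + (i-1)/12)  # extend the sample by one month
  refit <- Arima(x, model=fit)                 # apply the fitted model without re-estimating it
  fc[i] <- forecast(refit, h=h)$mean[h]        # keep only the h-step-ahead forecast
}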
Update:
Pseudocode:
library("fpp")
h <- 5
train <- window(hsales,end=1989.99)
test <- window(hsales,start=1990)
n <- length(test) - h + 1
## Create models for all combinations of p 10 to 0, d 2 to 0, q 10 to 0
fit1 <- Arima(train, order=c(10,2,10))
fit2 <- Arima(train, order=c(9,2,10))
fit3 <- Arima(train, order=c(8,2,10))
# ...
fit10 <- Arima(train, order=c(0,2,10))
## One vector of h-step-ahead forecasts per candidate model
fc1 <- ts(numeric(n), start=1990+(h-1)/12, freq=12)
fc2 <- ts(numeric(n), start=1990+(h-1)/12, freq=12)
fc3 <- ts(numeric(n), start=1990+(h-1)/12, freq=12)
# ...
fc10 <- ts(numeric(n), start=1990+(h-1)/12, freq=12)
for(i in 1:n)
{
  x <- window(hsales, end=1989.99 + (i-1)/12)
  refit1 <- Arima(x, model=fit1)
  refit2 <- Arima(x, model=fit2)
  refit3 <- Arima(x, model=fit3)
  # ...
  refit10 <- Arima(x, model=fit10)
  fc1[i] <- forecast(refit1, h=h)$mean[h]
  fc2[i] <- forecast(refit2, h=h)$mean[h]
  fc3[i] <- forecast(refit3, h=h)$mean[h]
  # ...
  fc10[i] <- forecast(refit10, h=h)$mean[h]
}
## Calculate the MAPE of each set of forecasts (fc1, fc2, ... are plain ts objects, so there is no $mean)
accuracy(fc1, test)[, "MAPE"]
accuracy(fc2, test)[, "MAPE"]
accuracy(fc3, test)[, "MAPE"]
# ...
accuracy(fc10, test)[, "MAPE"]
## Return the order of the Arima model that has the lowest MAPE
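The pseudocode above can be written more compactly by looping over a grid of candidate orders instead of naming each model. The following is only a sketch under several assumptions that are not in the question: the grid is deliberately small (p and q from 0 to 2, d from 0 to 1) to keep the run time short, each candidate is re-estimated at every forecast origin rather than reused via model=, the MAPE is computed by hand, and the names orders, ord, actual and best are illustrative.

```r
library("fpp")  # loads the forecast package and the hsales data

h <- 5
test <- window(hsales, start = 1990)
n <- length(test) - h + 1
actual <- as.numeric(test)[h:length(test)]  # observations matching the h-step-ahead forecasts

# Illustrative candidate grid (an assumption; widen it as needed)
orders <- expand.grid(p = 0:2, d = 0:1, q = 0:2)
orders$mape <- NA_real_

for (j in seq_len(nrow(orders))) {
  ord <- c(orders$p[j], orders$d[j], orders$q[j])
  fc <- numeric(n)
  ok <- tryCatch({
    for (i in seq_len(n)) {
      x <- window(hsales, end = 1989.99 + (i - 1) / 12)
      fit <- Arima(x, order = ord)           # re-estimate at each forecast origin
      fc[i] <- forecast(fit, h = h)$mean[h]  # keep the h-step-ahead forecast
    }
    TRUE
  }, error = function(e) FALSE)              # skip orders that fail to estimate
  if (ok) orders$mape[j] <- 100 * mean(abs((actual - fc) / actual))
}

best <- orders[which.min(orders$mape), ]     # which.min ignores NA entries
print(best)
```

Re-estimating at every origin is slower than Arima(x, model=fit), but it evaluates the order itself rather than one particular set of coefficients estimated on the training window.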
Best Answer
Rob J. Hyndman indicates in comments to his blog post "Time series cross-validation: an R example":
Also, since cross validation is often used for model selection for cross sectional data*, it is quite natural to do something similar for time series data (where regular cross validation is replaced by rolling-window cross validation).
*From another post called "Why every statistician should know about cross-validation":
First, you choose a set of candidate models. For each model in the set, you evaluate forecasting performance based on rolling-window cross validation. Then you choose the model that delivers the best forecasting performance.
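As a rough illustration of the procedure just described (and not the example from the answer itself), the sketch below uses tsCV from the forecast package to compute rolling-origin one-step-ahead errors for a few candidate orders and picks the one with the lowest cross-validated RMSE. The candidate set, the choice of 120 initial observations, and the names candidates, cv_rmse and aic are assumptions made for illustration; the AIC of each candidate fitted to the full series is included only for side-by-side comparison.

```r
library("fpp")  # loads the forecast package and the hsales data

# Hypothetical candidate set of (p, d, q) orders
candidates <- list(c(1, 0, 0), c(0, 1, 1), c(2, 1, 2))

# Rolling-origin one-step-ahead RMSE for each candidate
cv_rmse <- sapply(candidates, function(ord) {
  f <- function(x, h) forecast(Arima(x, order = ord), h = h)
  e <- tsCV(hsales, f, h = 1, initial = 120)  # errors are NA where no forecast was made
  sqrt(mean(e^2, na.rm = TRUE))
})

# AIC of each candidate fitted to the whole series, for comparison
aic <- sapply(candidates, function(ord) Arima(hsales, order = ord)$aic)

results <- data.frame(
  order = sapply(candidates, paste, collapse = ","),
  cv_rmse = cv_rmse,
  aic = aic
)
print(results)
print(results$order[which.min(results$cv_rmse)])  # order preferred by cross-validation
```

AIC values are only comparable between models with the same order of differencing, so the aic column should be read with that in mind when the candidates differ in d.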
Here is an example I ran at some point to compare model selection based on rolling-window cross validation with AIC-based selection. (I wanted to illustrate that model selection based on rolling-window cross validation is asymptotically equivalent to AIC-based choice.)