Solved – ARIMA Cross Validation

Tags: arima, forecasting, prediction, time series, validation

I work with R and have some questions regarding my ARIMA model. Specifically, I have yearly data ranging from 1946 to 2019 and would like to do a basic two-step-ahead ARIMA forecast for 2020 and 2021. Plotting the time series and the ACF/PACF, as well as performing a KPSS test, reveals a trend and non-stationary data (see plots below). After taking first differences, the KPSS test no longer rejects stationarity. I therefore assume first differencing (d = 1) is required in the model.
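The stationarity check described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: the series here is simulated (a random walk with drift standing in for the real data), and the function names are from the tseries and forecast packages.

```r
library(tseries)   # kpss.test()
library(forecast)  # ndiffs()

# Simulated stand-in for the 1946-2019 yearly series (74 observations)
set.seed(1)
y <- ts(cumsum(rnorm(74, mean = 1)), start = 1946)

kpss.test(y)        # small p-value: reject level stationarity
kpss.test(diff(y))  # large p-value: first differences look stationary

# ndiffs() automates this and should agree that d = 1
ndiffs(y, test = "kpss")
```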

[Plots: time series with trend, ACF and PACF]

Applying the auto.arima() function to the full sample (1946-2019) gives a first intuition about the functional form of the time series. auto.arima() suggests an ARIMA(1,1,0) with drift. However, the model with the lowest AICc does not necessarily have to be a good model for forecasting, as Hyndman and Athanasopoulos (2018) write. 1
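The full-sample model selection step might look like this sketch (again on simulated stand-in data; `trace = TRUE` is what lets you see every candidate model that auto.arima() considers, which is relevant for the comparison below):

```r
library(forecast)

set.seed(1)
y <- ts(cumsum(rnorm(74, mean = 1)), start = 1946)  # stand-in for the real series

# Default selection by AICc; on trending data this typically includes a drift term
fit <- auto.arima(y)
summary(fit)

# Print every candidate model and its AICc during the search
auto.arima(y, trace = TRUE)
```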

Hence I do cross validation: I split the data into a training set (1946-2014) and a test set (2015-2019), run auto.arima() on the training set, and record all the suggested models, including the one with the lowest AICc, which again is an ARIMA(1,1,0) with drift. Next I produce forecasts with all of these models (9 in total) and compare their test-set MAE/RMSE. The model with the lowest MAE/RMSE is an ARIMA(1,1,1) with drift, but it still gives a very inaccurate (yet best?) forecast of the test set. I assume this is because it is not really trained on the strong upward trend at the upper end of the series. In the plot, the orange line shows the real data and the blue line with prediction intervals shows the forecast. The residual plot looks fine to me, and the Ljung-Box test with p = 0.31 does not indicate residual autocorrelation.

[Plot: test-set forecast (blue, with prediction intervals) vs. actual data (orange)]
[Plot: residual diagnostics]
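The train/test comparison could be sketched like this (simulated stand-in data; only two of the nine candidate models are shown, and `Arima()`, `accuracy()`, and `checkresiduals()` are from the forecast package):

```r
library(forecast)

set.seed(1)
y <- ts(cumsum(rnorm(74, mean = 1)), start = 1946)

train <- window(y, end = 2014)
test  <- window(y, start = 2015)

# Two of the candidate specifications from the question
fit110 <- Arima(train, order = c(1, 1, 0), include.drift = TRUE)
fit111 <- Arima(train, order = c(1, 1, 1), include.drift = TRUE)

# Forecast over the test horizon and compare training- vs. test-set MAE/RMSE
h <- length(test)
accuracy(forecast(fit110, h = h), test)
accuracy(forecast(fit111, h = h), test)

# Residual diagnostics, including the Ljung-Box test
checkresiduals(fit111)
```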

Here is the thing I struggle with, I separated it into two questions:

1) I am not quite sure whether the ARIMA(1,1,1) with drift is a good model to fit to the whole series (1946-2019) and use for the 2020 and 2021 forecasts, since it is still so inaccurate in cross validation. Would it be better to simply stick with the ARIMA(1,1,0) with drift (as suggested, based on AICc, by auto.arima() on both the full data set and the training set)? Or do I need an entirely different approach?

2) auto.arima() suggests different ARIMA specifications depending on the size of the training set. Since there are changes in the trend behavior of the series, it is not intuitive to me how many observations to include in the training set. If I make the training set smaller, the test-set forecasts become even more inaccurate, because the trend changes at the upper end of the series. If I make it larger, there are fewer forecast values to compare against. What would be a good approach in my case?

Thank you for your help!

Best, Theo Ruben


1 Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice, 2nd edition. OTexts: Melbourne, Australia

Best Answer

Your test set is very small: model comparisons based on five observations are unlikely to be trustworthy. You could instead do time series cross validation based on rolling forecast origins.* Even so, I would lean towards AIC rather than time series cross validation, given the discussion in the thread "AIC versus cross validation in time series: the small sample case", though the case is not clear cut.

*See the last section of Rob J. Hyndman's blog post "Why every statistician should know about cross-validation".
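Rolling-origin evaluation is implemented in the forecast package as tsCV(). A minimal sketch, again on simulated stand-in data (the order c(1,1,0), the horizon, and `initial = 30` are illustrative choices, not recommendations):

```r
library(forecast)

set.seed(1)
y <- ts(cumsum(rnorm(74, mean = 1)), start = 1946)

# Forecast function refit at every origin; order held fixed here for speed
fc <- function(x, h) {
  forecast(Arima(x, order = c(1, 1, 0), include.drift = TRUE), h = h)
}

# Rolling-origin forecast errors; initial = 30 keeps at least
# 30 observations in every training window
e <- tsCV(y, fc, h = 2, initial = 30)

# RMSE at each forecast horizon (columns of e are h = 1, 2)
sqrt(colMeans(e^2, na.rm = TRUE))
```

Because every origin contributes an error, this uses far more of the data for evaluation than a single 5-observation test set does.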