I have a weekly time series for which I would like to find the best-fitting model. So far I've tried ARIMA, harmonic regression with ARIMA errors and a neural network, and in the end I would like to decide which of them fits my raw data best. The time series has a strong seasonal and cyclic pattern.
Below are the Ljung-Box test results for each model:
```r
# ARIMA
library(forecast)
fit <- training %>% auto.arima(lambda = 0)
fit %>% checkresiduals()
```

```
Ljung-Box test

data: Residuals from ARIMA(3,1,0)
Q* = 23.619, df = 23, p-value = 0.4252

Model df: 3. Total lags used: 26
```
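In this output, `df` equals the total number of lags minus the number of estimated model parameters (26 - 3 = 23 for ARIMA(3,1,0)). A minimal base-R sketch of the same bookkeeping with `stats::Box.test()`, whose `fitdf` argument performs that subtraction; the simulated white noise is a hypothetical stand-in for the residuals, which are not available here:

```r
# Simulated white noise stands in for the model residuals (hypothetical data)
set.seed(1)
res <- rnorm(200)

# Ljung-Box test over 26 lags; fitdf = 3 removes the three AR coefficients
# of an ARIMA(3,1,0) from the degrees of freedom of the chi-squared reference
lb <- Box.test(res, lag = 26, type = "Ljung-Box", fitdf = 3)
lb$parameter  # df = 26 - 3 = 23
lb$p.value    # a large p-value gives no evidence of remaining autocorrelation
```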
```r
# Harmonic regression with ARIMA errors
fit2 <- auto.arima(training, lambda = 0, seasonal = TRUE, xreg = fourier(training, K = 4))
fit2 %>% checkresiduals()
```

```
Ljung-Box test

data: Residuals from Regression with ARIMA(2,1,1) errors
Q* = 21.642, df = 15, p-value = 0.1175

Model df: 11. Total lags used: 26
```
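The harmonic-regression regressors are sine/cosine pairs at the seasonal frequency and its harmonics, so K = 4 adds 8 coefficients; together with the 3 ARMA coefficients of ARIMA(2,1,1) this is consistent with `Model df: 11` above. The sketch below rebuilds such terms in base R with a hypothetical helper `fourier_terms()` (not the forecast package's code), assuming a seasonal period of m = 52 and a hypothetical training length of 130 weeks. It also shows that forecasting needs the terms evaluated at the future time indices n + 1, ..., n + h, which is what `fourier()` returns when its `h` argument is set.

```r
# Hypothetical helper (not forecast::fourier itself): K sine/cosine pairs
# for seasonal period m, evaluated at the time indices t
fourier_terms <- function(t, m, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "-", m)
  out
}

n <- 130  # hypothetical training length
m <- 52   # weekly data with an annual cycle (assumed)
h <- 16   # forecast horizon

X_train  <- fourier_terms(1:n, m, K = 4)             # regressors used for fitting
X_future <- fourier_terms(n + seq_len(h), m, K = 4)  # regressors for the forecast period

dim(X_train)   # 130 rows, 8 columns (4 sine + 4 cosine)
dim(X_future)  # 16 rows, 8 columns
```

Because the terms repeat every m periods, the first future row (t = 131) matches training row 79 (131 - 52), so the seasonal phase carries on from the end of the training data.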
```r
# Neural network
fit3 <- nnetar(training, lambda = 0)
fit3
```
They all seem fine based on the Ljung-Box test, but none of them captures the wiggly shape of the time series, and I don't know how much that matters. My main question is this: when I compare accuracy on the test set, RMSE tells me to pick the harmonic regression model, while MAPE tells me to pick the neural network. I would also like to know why the RMSE and MAPE values are so different here.
```r
# ARIMA
accuracy(forecast(fit, h = 16), test)
```

```
                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1 Theil's U
Training set  1.948693 27.56683 19.09467 -4.402578 25.87164 0.5790763 0.21495069        NA
Test set     43.293579 61.02374 46.31065 32.745528 39.26652 1.4044442 0.09636865  1.158678
```
```r
# Harmonic regression
# fourier(training, K = 4, h = 16) returns the Fourier terms for the 16 forecast periods
accuracy(forecast(fit2, xreg = fourier(training, K = 4, h = 16)), test)
```
```
                    ME    RMSE      MAE         MPE      MAPE      MASE      ACF1 Theil's U
Training set  4.323546 24.4800 16.05035   -1.464388  21.89874 0.4867525 0.1751586        NA
Test set     -2.495049 42.1323 33.03114 -171.095704 194.16485 1.0017220 0.2349017  4.288442
```
```r
# Neural network
accuracy(forecast(fit3, h = 16), test)
```

```
                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1 Theil's U
Training set  3.414448 22.63083 14.31504 -2.615375 16.93870 0.4341265 0.2253450        NA
Test set     40.095160 58.90628 44.16645 28.908539 37.72779 1.3394181 0.1107563  1.119875
```
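The measures in these tables follow the definitions in the forecast package's `accuracy()`, with errors computed as actual minus forecast. Because MAPE (and MPE) divide each error by the actual value, a single small actual can dominate them while barely affecting RMSE; a test-set MPE of -171% next to a MAPE of 194%, as for the harmonic regression, may point to such values. The numbers below are hypothetical and only illustrate how the two measures can rank two forecasts in opposite order:

```r
# Error measures following accuracy()'s definitions (error = actual - forecast)
rmse <- function(y, f) sqrt(mean((y - f)^2))
mape <- function(y, f) mean(abs(100 * (y - f) / y))
mpe  <- function(y, f) mean(100 * (y - f) / y)

# Hypothetical test-set values; the last actual value is small
y  <- c(100, 120, 90, 5)
fA <- c(105, 115, 95, 30)  # small absolute errors, but over-forecasts the small value
fB <- c(70, 90, 60, 5)     # large absolute errors, exact on the small value

c(A = rmse(y, fA), B = rmse(y, fB))  # A about 13.2, B about 26.0: A wins on RMSE
c(A = mape(y, fA), B = mape(y, fB))  # A about 128.7, B about 22.1: B wins on MAPE
mpe(y, fA)                           # about -126.6, dominated by the -500% error on the last point
```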
Thank you very much in advance for your help.
Best Answer
Use the RMSE.
Note that the (R)MSE and the MAPE are minimized by quite different point forecasts (see my answer to "Higher RMSE lower MAPE"). You should first decide which functional of the unknown future distribution you want to elicit, and then choose the corresponding error measure.
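A small simulation makes this concrete. Assuming a hypothetical right-skewed (lognormal) predictive distribution, the expected squared error is minimized by the mean, while the expected absolute percentage error is minimized by a point below the median; for a lognormal with parameters mu and sigma that point is exp(mu - sigma^2). The sketch evaluates both losses over a grid of candidate point forecasts:

```r
set.seed(42)
# Hypothetical right-skewed predictive distribution: lognormal(meanlog = 0, sdlog = 0.5)
y <- rlnorm(1e5, meanlog = 0, sdlog = 0.5)

# Candidate point forecasts
grid <- seq(0.5, 1.5, by = 0.01)

# Average loss of each candidate over the simulated outcomes
mse  <- sapply(grid, function(f) mean((y - f)^2))
mape <- sapply(grid, function(f) mean(abs(y - f) / y))

best_mse  <- grid[which.min(mse)]   # close to the mean, exp(0.125), about 1.13
best_mape <- grid[which.min(mape)]  # close to exp(-0.25), about 0.78, below the median of 1
c(best_mse = best_mse, median = median(y), best_mape = best_mape)
```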
Note also that an ARIMA model outputs a conditional expectation forecast, i.e., the functional that minimizes the (R)MSE. It makes little sense to train a model to minimize the (R)MSE and then assess its forecasts with a different error measure (Kolassa, 2020, IJF). If you truly want a MAPE-optimal forecast, you should also use the MAPE to fit your model. I am not aware of any off-the-shelf forecasting software that does this (if you use an ML pipeline, you may be able to specify an arbitrary fitting criterion, including the MAPE), and I have major doubts about the usefulness of a MAPE-minimal forecast.
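Fitting with the MAPE as the criterion can be sketched in base R for a simple linear trend model rather than for ARIMA: `lm()` chooses the coefficients that minimize squared error, while `optim()` searches for coefficients that minimize in-sample MAPE, starting from the least-squares solution. The data are simulated and the model is hypothetical:

```r
set.seed(123)
# Hypothetical positive series: linear trend times multiplicative lognormal noise
idx <- 1:100
y <- (50 + 0.5 * idx) * rlnorm(100, meanlog = 0, sdlog = 0.3)

# In-sample losses for intercept par[1] and slope par[2]
mape_loss <- function(par) mean(abs(100 * (y - (par[1] + par[2] * idx)) / y))
rmse_loss <- function(par) sqrt(mean((y - (par[1] + par[2] * idx))^2))

# Least-squares fit: minimizes (R)MSE
fit_ls <- lm(y ~ idx)
par_ls <- unname(coef(fit_ls))

# Same linear model, coefficients chosen to minimize in-sample MAPE (Nelder-Mead)
fit_mape <- optim(par_ls, mape_loss)
par_mape <- fit_mape$par

rbind(
  least_squares = c(RMSE = rmse_loss(par_ls),   MAPE = mape_loss(par_ls)),
  mape_fit      = c(RMSE = rmse_loss(par_mape), MAPE = mape_loss(par_mape))
)
```

By construction, the least-squares coefficients have the lower RMSE and the MAPE-fitted coefficients a MAPE no higher than the starting point; with right-skewed noise like this, the MAPE fit tends to sit below the least-squares line.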