I have a seasonal time series (with a frequency of 30). I am fitting ARIMA models using the R package forecast.
My first ARIMA model would be (1,0,1)(1,1,0)[30] with fitted parameters:
Coefficients:
        ar1     ma1     sar1   drift
     0.6957  0.2992  -0.4496  0.8204
s.e. 0.0266  0.0398   0.0252  0.0597
sigma^2 estimated as 512.7: log likelihood=-5943.27
AIC=11896.53 AICc=11896.58 BIC=11922.42
Then I fit a second model, which is the same model except that I perform the seasonal differencing manually before fitting. The second model, ARIMA(1,0,1)(1,0,0)[30] fitted to the seasonally differenced data, gives:
Coefficients:
        ar1     ma1     sar1  intercept
     0.6959  0.2990  -0.4497    25.3671
s.e. 0.0266  0.0398   0.0252     1.8514
sigma^2 estimated as 500.8: log likelihood=-5943.27
AIC=11896.54 AICc=11896.58 BIC=11922.54
Then I compute forecasts from both models (the forecasts from the second model are transformed back via "inverse differencing") and compare them.
The prediction interval for the first model is very narrow, while the prediction interval for the second model is "wide open". Why do the prediction intervals differ so much when it is the "same" model?
The primary goal of the task I am working on is to deliver a prediction interval, but I am confused about which prediction interval is correct.
The R code to fit both models for time series Y1[30] and to forecast 10 seasons ahead:
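library(forecast)  # provides Arima() and forecast()
# Model 1: seasonal differencing is handled inside Arima()
m1 <- Arima(y = Y1, order = c(1, 0, 1), seasonal = c(1, 1, 0), include.drift = TRUE)
f1 <- forecast(m1, level = c(.99), h = frequency(Y1) * 10)
# Model 2: seasonal differencing is done manually before fitting
sY1 <- diff(Y1, lag = frequency(Y1))
m2 <- Arima(y = sY1, order = c(1, 0, 1), seasonal = c(1, 0, 0), include.mean = TRUE)
f2 <- forecast(m2, level = c(.99), h = frequency(Y1) * 10)
# Bounds from model 1 (black)
plot(f1$lower[, 1], type = "l", ylim = c(1e3, 5e3))
lines(f1$upper[, 1])
# Bounds from model 2 passed through diffinv() (red); the first season holds the starting values and is dropped
lines(diffinv(f2$lower[, 1], lag = frequency(Y1), xi = tail(Y1, frequency(Y1)))[-(1:frequency(Y1))], col = "red")
lines(diffinv(f2$upper[, 1], lag = frequency(Y1), xi = tail(Y1, frequency(Y1)))[-(1:frequency(Y1))], col = "red")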
To rephrase my question more generally:
Having two alternative models:
- Model M1 is an ARIMA(p,1,q) on time series Y
- Model M2 is an ARIMA(p,0,q) on time series Z=differenced(Y)
The AR and MA coefficients are equal for both models.
Forecasting over a future horizon H then gives two forecast series:
- F1=forecasts(Y)
- F2=back_differenced(forecasts(Z))
The mean forecasts are equal, F1=F2, over the whole horizon H. But while the prediction interval of F1 is very narrow, the prediction interval of F2 is much wider.
Which prediction interval is correct?
Best Answer
The prediction intervals from ARIMA(p,1,q) for the original data as produced by the function Arima will be correct, while those from ARIMA(p,0,q) for differenced data, obtained by manually undifferencing the forecasts the way you do, will be incorrect.
Illustration
Suppose the last observed value is $x_t=100$. Suppose the point forecasts for $t+1$, $t+2$ and $t+3$ from ARIMA(p,0,q) for differenced data are
\begin{aligned} \widehat{\Delta x}_{t+1}^{point} &= 0.0, \\ \widehat{\Delta x}_{t+2}^{point} &= 0.5, \\ \widehat{\Delta x}_{t+3}^{point} &= 0.0. \\ \end{aligned}
Suppose the lower end of the 80% prediction interval is
\begin{aligned} \widehat{\Delta x}_{t+1}^{0.1} &= -1.0, \\ \widehat{\Delta x}_{t+2}^{0.1} &= -0.5, \\ \widehat{\Delta x}_{t+3}^{0.1} &= -1.0; \\ \end{aligned}
and the upper end is
\begin{aligned} \widehat{\Delta x}_{t+1}^{0.9} &= 1.0, \\ \widehat{\Delta x}_{t+2}^{0.9} &= 1.5, \\ \widehat{\Delta x}_{t+3}^{0.9} &= 1.0. \\ \end{aligned}
(I assume symmetric prediction intervals here, but they could as well be asymmetric.)
To obtain forecasts for the original data (the data in levels), you need to undifference. Undifferencing is done by cumulatively summing the forecasts for the differenced data, starting from the last observed level. That yields the point forecasts
\begin{aligned} \hat x_{t+1}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} &= 100+0.0 &= 100.0, \\ \hat x_{t+2}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} &= 100+0.0+0.5 &= 100.5, \\ \hat x_{t+3}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{point} &= 100+0.0+0.5+0.0 &= 100.5. \\ \end{aligned}
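The undifferencing step above can be sketched in base R. This is only a minimal illustration using the toy numbers from this answer; the variable names (x_t, d_point, x_point) are made up for the example and are not part of any package.

```r
# Toy values from the illustration (not real data).
x_t     <- 100                 # last observed level
d_point <- c(0.0, 0.5, 0.0)    # point forecasts of the differences for t+1, t+2, t+3

# Undifference: last level plus the cumulative sum of forecasted differences.
x_point <- x_t + cumsum(d_point)
print(x_point)                 # 100.0 100.5 100.5

# stats::diffinv() performs the same operation; it returns the starting
# value xi as its first element, so that element is dropped here.
x_point_alt <- diffinv(d_point, xi = x_t)[-1]
stopifnot(isTRUE(all.equal(x_point, x_point_alt)))
```

Both routes give the same levels, which is why the point forecasts of the two models in the question coincide.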
Now what about the prediction intervals?
The correct way
The lower and upper forecasts are obtained in the same way as the point forecasts -- by summing up the forecasted differences:
\begin{aligned} \hat x_{t+1}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} &= 100-1.0 &= 99.0, \\ \hat x_{t+2}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} + \widehat{\Delta x}_{t+2}^{0.1} &= 100-1.0-0.5 &= 98.5, \\ \hat x_{t+3}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} + \widehat{\Delta x}_{t+2}^{0.1} + \widehat{\Delta x}_{t+3}^{0.1} &= 100-1.0-0.5-1.0 &= 97.5; \\ \end{aligned}
and
\begin{aligned} \hat x_{t+1}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} &= 100+1.0 &= 101.0, \\ \hat x_{t+2}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} + \widehat{\Delta x}_{t+2}^{0.9} &= 100+1.0+1.5 &= 102.5, \\ \hat x_{t+3}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} + \widehat{\Delta x}_{t+2}^{0.9} + \widehat{\Delta x}_{t+3}^{0.9} &= 100+1.0+1.5+1.0 &= 103.5. \\ \end{aligned}
As you see, the uncertainty has effectively been cumulatively summed this way: the uncertainty over $x_{t+3}$ ($\pm 3$) is greater than that for $x_{t+2}$ ($\pm 2$), which in turn is greater than that for $x_{t+1}$ ($\pm 1$). This is natural, as the further into the future we look, the less sure we can be.
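The arithmetic of this subsection can be reproduced with a short base R sketch. It only replays the toy numbers of the illustration; the variable names are assumptions made for the example.

```r
# Toy values from the illustration (not real data).
x_t     <- 100
d_point <- c( 0.0,  0.5,  0.0)   # point forecasts of the differences
d_lower <- c(-1.0, -0.5, -1.0)   # lower ends (10% quantiles) for the differences
d_upper <- c( 1.0,  1.5,  1.0)   # upper ends (90% quantiles) for the differences

# Cumulatively sum each series of differences from the last observed level.
x_point <- x_t + cumsum(d_point)  # 100.0 100.5 100.5
x_lower <- x_t + cumsum(d_lower)  #  99.0  98.5  97.5
x_upper <- x_t + cumsum(d_upper)  # 101.0 102.5 103.5

# Half-width of the interval around the point forecast at each horizon.
half_width <- x_upper - x_point   # 1 2 3
print(rbind(x_lower, x_point, x_upper, half_width))
```

The half-width grows by one unit per step, matching the $\pm 1$, $\pm 2$, $\pm 3$ pattern described above.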
The incorrect way
One may incorrectly try to obtain the lower and upper forecasts without cumulative summation, by adding only the latest lower or upper value of $\widehat {\Delta x}_{t+h}$ to the previous point forecast $\hat x_{t+h-1}^{point}$ instead, which produces wrongly narrow prediction intervals:
\begin{aligned} \hat x_{t+1}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} & &= 100-1.0 &= 99.0, \\ \hat x_{t+2}^{0.1} &= \hat x_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.1} &= 100.0-0.5 &= 99.5, \\ \hat x_{t+3}^{0.1} &= \hat x_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.1} &= 100.5-1.0 &= 99.5; \\ \end{aligned}
and
\begin{aligned} \hat x_{t+1}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} & &= 100+1.0 &= 101.0, \\ \hat x_{t+2}^{0.9} &= \hat x_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.9} &= 100.0+1.5 &= 101.5, \\ \hat x_{t+3}^{0.9} &= \hat x_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.9} &= 100.5+1.0 &= 101.5. \\ \end{aligned}
You can see explicitly that the wrong elements are summed here. Also, the outcome is counterintuitive: the prediction interval for $t+3$ is just as narrow as for $t+1$ or $t+2$. Just think about it: can we be equally certain over what will happen at time $t+3$ (the distant future) as at $t+2$ (medium distant future) and at $t+1$ (the near future)?
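For comparison, the flawed calculation of this subsection can be replayed in base R as well. Again, this is just a sketch on the toy numbers of the illustration, with variable names invented for the example.

```r
# Toy values from the illustration (not real data).
x_t     <- 100
d_point <- c( 0.0,  0.5,  0.0)
d_lower <- c(-1.0, -0.5, -1.0)
d_upper <- c( 1.0,  1.5,  1.0)

# Point forecasts in levels, and the level each step starts from:
# x_t for the first step, the previous point forecast afterwards.
x_point    <- x_t + cumsum(d_point)          # 100.0 100.5 100.5
prev_point <- c(x_t, head(x_point, -1))      # 100.0 100.0 100.5

# Flawed bounds: only the latest lower/upper difference is added,
# so uncertainty from earlier steps is discarded.
x_lower_wrong <- prev_point + d_lower        #  99.0  99.5  99.5
x_upper_wrong <- prev_point + d_upper        # 101.0 101.5 101.5

# The half-width stays constant instead of growing with the horizon.
half_width_wrong <- (x_upper_wrong - x_lower_wrong) / 2   # 1 1 1
print(rbind(x_lower_wrong, x_upper_wrong, half_width_wrong))
```

The constant half-width of 1 at every horizon is the symptom to look for: bounds built this way do not widen as the forecast moves further into the future.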