Solved – “ARIMA” versus “ARMA on differenced data” gives different prediction interval

Tags: arima, forecasting, prediction-interval, r

I have a seasonal time series (with frequency 30), and I am fitting ARIMA models using the R package forecast.

My first ARIMA model is (1,0,1)(1,1,0)[30], with fitted parameters:

Coefficients:
         ar1     ma1     sar1   drift
      0.6957  0.2992  -0.4496  0.8204
s.e.  0.0266  0.0398   0.0252  0.0597

sigma^2 estimated as 512.7:  log likelihood=-5943.27
AIC=11896.53   AICc=11896.58   BIC=11922.42

Then I try a second model, which is the same model except that I perform the seasonal differencing manually before calling Arima. This second model, fitted to the seasonally differenced data as (1,0,1)(1,0,0), gives:

Coefficients:
         ar1     ma1     sar1  intercept
      0.6959  0.2990  -0.4497    25.3671
s.e.  0.0266  0.0398   0.0252     1.8514

sigma^2 estimated as 500.8:  log likelihood=-5943.27
AIC=11896.54   AICc=11896.58   BIC=11922.54

Then I compute future forecasts from both models (the forecasts from the second model are transformed back via "inverse differencing") and compare them.

The prediction interval from the first model is very narrow, while the prediction interval from the second model is "wide open". Why do the prediction intervals differ so much when it is the "same" model?

The primary goal of the task I am working on is to deliver a prediction interval, but I am confused about which prediction interval is correct.

The R code to fit both models to the time series Y1 (frequency 30) and to forecast 10 seasons ahead:

m1 <- Arima(y = Y1, order = c(1, 0, 1), seasonal = c(1, 1, 0), include.drift = TRUE)
f1 <- forecast(m1, level = c(.99), h = frequency(Y1) * 10)

sY1 <- diff(Y1, lag = frequency(Y1))   # manual seasonal differencing
m2  <- Arima(y = sY1, order = c(1, 0, 1), seasonal = c(1, 0, 0), include.mean = TRUE)
f2  <- forecast(m2, level = c(.99), h = frequency(Y1) * 10)

plot(f1$lower[, 1], t = "l", ylim = c(1e3, 5e3))
lines(f1$upper[, 1])
# undifference the bounds of the second model
lines(diffinv(f2$lower[, 1], lag = frequency(Y1),
              xi = tail(Y1, frequency(Y1)))[-(1:frequency(Y1))], col = "red")
lines(diffinv(f2$upper[, 1], lag = frequency(Y1),
              xi = tail(Y1, frequency(Y1)))[-(1:frequency(Y1))], col = "red")
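The same phenomenon can be reproduced without the forecast package. Below is a self-contained sketch in base R; the simulated series y, the smaller period of 12, and the simplified seasonal order (0,1,0) are all assumptions for illustration, not the original Y1 setup:

```r
# Self-contained sketch, base R only; 'y', 'period', and seasonal order (0,1,0)
# are illustrative assumptions, not the original setup.
set.seed(1)
period <- 12
y <- ts(cumsum(rnorm(240)) + rep(5 * sin(2 * pi * (1:period) / period), 20),
        frequency = period)
h <- 3 * period   # forecast several seasons ahead so undifferencing cumulates

# Model 1: seasonal differencing handled inside arima()
m1 <- arima(y, order = c(1, 0, 1),
            seasonal = list(order = c(0, 1, 0), period = period))
p1 <- predict(m1, n.ahead = h)
lower1 <- p1$pred - 1.96 * p1$se
upper1 <- p1$pred + 1.96 * p1$se

# Model 2: seasonal differencing done by hand, bounds undifferenced with diffinv()
z  <- diff(y, lag = period)
m2 <- arima(z, order = c(1, 0, 1))
p2 <- predict(m2, n.ahead = h)
xi <- tail(y, period)   # last observed season initializes the undifferencing
lower2 <- diffinv(p2$pred - 1.96 * p2$se, lag = period, xi = xi)[-(1:period)]
upper2 <- diffinv(p2$pred + 1.96 * p2$se, lag = period, xi = xi)[-(1:period)]
```

Plotting lower1/upper1 against lower2/upper2 typically shows the mismatch in question: the two pairs of bounds are close over the first season but drift apart at longer horizons.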

To rephrase my question more generally:

Having two alternative models:

  1. Model M1 is an ARIMA(p,1,q) on time series Y
  2. Model M2 is an ARIMA(p,0,q) on time series Z=differenced(Y)

The AR and MA coefficients are equal for both models.

Then forecasting over a future horizon H gives two forecast series:

  1. F1=forecasts(Y)
  2. F2=back_differenced(forecasts(Z))

The mean forecasts are equal, F1 = F2, over the whole horizon H. But while the prediction intervals of F1 are very narrow, the prediction intervals of F2 are much wider.

Which prediction interval is correct?

Best Answer

The prediction intervals produced by Arima for the ARIMA(p,1,q) model on the original data will be correct, while those obtained from the ARIMA(p,0,q) model on the differenced data, with the forecast bounds manually undifferenced the way you do it, will be incorrect.


Illustration

Suppose the last observed value is $x_t=100$. Suppose the point forecasts for $t+1$, $t+2$ and $t+3$ from ARIMA(p,0,q) for differenced data are

\begin{aligned} \widehat{\Delta x}_{t+1}^{point} &= 0.0, \\ \widehat{\Delta x}_{t+2}^{point} &= 0.5, \\ \widehat{\Delta x}_{t+3}^{point} &= 0.0. \\ \end{aligned}

Suppose the lower end of the 80% prediction interval is

\begin{aligned} \widehat{\Delta x}_{t+1}^{0.1} &= -1.0, \\ \widehat{\Delta x}_{t+2}^{0.1} &= -0.5, \\ \widehat{\Delta x}_{t+3}^{0.1} &= -1.0; \\ \end{aligned}

and the upper end is

\begin{aligned} \widehat{\Delta x}_{t+1}^{0.9} &= 1.0, \\ \widehat{\Delta x}_{t+2}^{0.9} &= 1.5, \\ \widehat{\Delta x}_{t+3}^{0.9} &= 1.0. \\ \end{aligned}

(I assume symmetric prediction intervals here, but they could just as well be asymmetric.)

To obtain forecasts for the original data (the data in levels), you need to undifference. Undifferencing is done by cumulatively summing the forecasts for the differenced data. That yields the point forecasts

\begin{aligned} \hat x_{t+1}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} &= 100+0.0 &= 100.0, \\ \hat x_{t+2}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} &= 100+0.0+0.5 &= 100.5, \\ \hat x_{t+3}^{point} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{point} &= 100+0.0+0.5+0.0 &= 100.5. \\ \end{aligned}
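The cumulative summation above is exactly what R's diffinv() performs; a minimal sketch with the illustration's numbers:

```r
x_t   <- 100                      # last observed value
d_hat <- c(0.0, 0.5, 0.0)         # point forecasts of the differences

x_hat <- x_t + cumsum(d_hat)      # undifferenced point forecasts: 100.0 100.5 100.5

# diffinv() gives the same result, with the initial value xi prepended
x_hat_alt <- diffinv(d_hat, xi = x_t)[-1]
```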

Now what about the prediction intervals?


The correct way

The lower and upper forecasts are obtained in the same way as the point forecasts -- by summing up the forecasted differences:

\begin{aligned} \hat x_{t+1}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} &= 100-1.0 &= 99.0, \\ \hat x_{t+2}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} + \widehat{\Delta x}_{t+2}^{0.1} &= 100-1.0-0.5 &= 98.5, \\ \hat x_{t+3}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} + \widehat{\Delta x}_{t+2}^{0.1} + \widehat{\Delta x}_{t+3}^{0.1} &= 100-1.0-0.5-1.0 &= 97.5; \\ \end{aligned}

and

\begin{aligned} \hat x_{t+1}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} &= 100+1.0 &= 101.0, \\ \hat x_{t+2}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} + \widehat{\Delta x}_{t+2}^{0.9} &= 100+1.0+1.5 &= 102.5, \\ \hat x_{t+3}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} + \widehat{\Delta x}_{t+2}^{0.9} + \widehat{\Delta x}_{t+3}^{0.9} &= 100+1.0+1.5+1.0 &= 103.5. \\ \end{aligned}
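The same cumulation applied to the interval bounds, again with the illustration's numbers:

```r
x_t  <- 100
d_lo <- c(-1.0, -0.5, -1.0)       # 10% quantiles of the forecasted differences
d_hi <- c( 1.0,  1.5,  1.0)       # 90% quantiles of the forecasted differences

lo <- x_t + cumsum(d_lo)          # 99.0  98.5  97.5
hi <- x_t + cumsum(d_hi)          # 101.0 102.5 103.5
half_width <- (hi - lo) / 2       # 1, 2, 3: the interval widens with the horizon
```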

As you can see, the uncertainty has effectively accumulated: the uncertainty over $x_{t+3}$ ($\pm 3$) is greater than that for $x_{t+2}$ ($\pm 2$), which in turn is greater than that for $x_{t+1}$ ($\pm 1$). This is natural: the further into the future we look, the less certain we can be.


The incorrect way

One may incorrectly try to obtain the lower and upper forecasts without cumulative summation, placing only the last-step lower and upper values of $\widehat {\Delta x}_{t+h}$ around the point forecast $\hat x_{t+h}$ instead. This produces wrongly narrow prediction intervals:

\begin{aligned} \hat x_{t+1}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{0.1} & &= 100-1.0 &= 99.0, \\ \hat x_{t+2}^{0.1} &= \hat x_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.1} &= 100.0-0.5 &= 99.5, \\ \hat x_{t+3}^{0.1} &= \hat x_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.1} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.1} &= 100.5-1.0 &= 99.5; \\ \end{aligned}

and

\begin{aligned} \hat x_{t+1}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{0.9} & &= 100+1.0 &= 101.0, \\ \hat x_{t+2}^{0.9} &= \hat x_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{0.9} &= 100.0+1.5 &= 101.5, \\ \hat x_{t+3}^{0.9} &= \hat x_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.9} &= x_t + \widehat{\Delta x}_{t+1}^{point} + \widehat{\Delta x}_{t+2}^{point} + \widehat{\Delta x}_{t+3}^{0.9} &= 100.5+1.0 &= 101.5. \\ \end{aligned}

You can see explicitly that the wrong terms are summed here. The outcome is also counterintuitive: the prediction interval for $t+3$ is just as narrow as for $t+1$ or $t+2$. Just think about it: can we really be equally certain about what will happen at time $t+3$ (the distant future) as at $t+2$ (the medium-term future) and at $t+1$ (the near future)?
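For contrast, the incorrect construction coded up with the same numbers: each bound hangs off the previous point forecast, so the half-width never grows.

```r
x_t  <- 100
d_pt <- c(0.0, 0.5, 0.0)          # point forecasts of the differences
d_lo <- c(-1.0, -0.5, -1.0)       # 10% quantiles of the differences
d_hi <- c( 1.0,  1.5,  1.0)       # 90% quantiles of the differences

# previous point forecasts: x_t, xhat_{t+1}, xhat_{t+2}
pt_prev <- c(x_t, x_t + cumsum(d_pt))[1:3]     # 100.0 100.0 100.5

lo_bad <- pt_prev + d_lo                       # 99.0  99.5  99.5
hi_bad <- pt_prev + d_hi                       # 101.0 101.5 101.5
half_width_bad <- (hi_bad - lo_bad) / 2        # 1 1 1: stuck, wrongly narrow
```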
