The usual approach to obtaining point forecasts is to cumulatively add the forecasts of the differences to the last observation of the original series. If $z$ is the differenced data and $y$ the original series, then:
$\hat{y}_{t+1}=y_t+\hat{z}_{t+1}$
$\hat{y}_{t+2}=\hat{y}_{t+1}+\hat{z}_{t+2}=y_t+\hat{z}_{t+1}+\hat{z}_{t+2}$
and so on. The cumulative sums of the $\hat{z}$ values are easy to compute, and adding $y_t$ is also easy. The details in R depend on the specific model: e.g., if you fitted an ARMA to the differences, just specify the corresponding ARIMA for the original series and predict from that.
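The two equations above are just a running sum, which can be sketched in a few lines of plain Python (the numbers are made up for illustration, not taken from any fitted model):

```python
from itertools import accumulate

def undifference_forecasts(y_last, z_hats):
    """Turn forecasts of the differenced series back into level forecasts.

    y_last -- last observed value y_t of the original series
    z_hats -- forecasts [z_hat_{t+1}, z_hat_{t+2}, ...] of the differences

    Returns [y_hat_{t+1}, y_hat_{t+2}, ...], where
    y_hat_{t+h} = y_t + z_hat_{t+1} + ... + z_hat_{t+h}.
    """
    return [y_last + s for s in accumulate(z_hats)]

# Toy numbers:
print(undifference_forecasts(100.0, [2.0, 1.5, -0.5]))  # [102.0, 103.5, 103.0]
```

This is exactly what `predict` on the corresponding ARIMA does for you internally, so in practice you rarely need to code it by hand.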
On your second question: you have more data, so you would calculate new parameter estimates anyway. With those new estimates in hand, you recalculate forecasts the same way as before, just using the 7 additional observations. In some circumstances the parameter updates can be computed from the old estimates and the new data (e.g., in state-space models), but for many models you simply recompute everything as you did for the previous forecasts.
No, we don't re-train the model. Here is what the help page ?Arima says for the model parameter:
If model is passed, this same model is fitted to ‘x’ without re-estimating any parameters.
Here is an example:
# library(forecast)
# model.1 <- auto.arima(AirPassengers[1:24])
# model.1
Series: AirPassengers[1:24]
ARIMA(1,0,1) with non-zero mean
Coefficients:
         ar1     ma1  intercept
      0.4137  0.6353   133.3991
s.e.  0.2091  0.1479     6.2032
sigma^2 estimated as 129.4: log likelihood=-93
AIC=193.99 AICc=196.1 BIC=198.7
# model.2 <- Arima(AirPassengers[1:48],model=model.1)
# model.2
Series: AirPassengers[1:48]
ARIMA(1,0,1) with non-zero mean
Coefficients:
         ar1     ma1  intercept
      0.4137  0.6353   133.3991
s.e.  0.0000  0.0000     0.0000
sigma^2 estimated as 385.6: log likelihood=-211.61
AIC=425.22 AICc=425.31 BIC=427.09
We note:
- The estimated coefficients are the same. (No surprise, since they are not re-estimated.)
- The standard errors are all zero. (I'd assume they are manually set this way, since they don't make any sense and would not be connected to the new data.)
- The estimated residual variance $\sigma^2$ has changed. This makes sense: the non-re-estimated parameters do not fit the new data as well as freshly re-estimated ones would, so the residual variance, and hence the prediction intervals, will be larger than with re-estimated parameters.
- The log-likelihood and information criteria change, since they are all related to $\sigma^2$.
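The combination of fixed coefficients and a changing $\sigma^2$ can be mimicked in a toy example. This is a plain-Python sketch with a simulated AR(1) series and made-up coefficients, not a reproduction of what Arima does internally: the coefficients are held fixed, so there is no estimation uncertainty to report (hence s.e. = 0), but the residual variance is recomputed from whatever data the model is applied to.

```python
import random

def ar1_residuals(y, phi, mu):
    """One-step residuals of a FIXED AR(1): y_t = mu + phi*(y_{t-1} - mu) + e_t."""
    return [y[t] - (mu + phi * (y[t - 1] - mu)) for t in range(1, len(y))]

def residual_variance(res):
    return sum(e * e for e in res) / len(res)

random.seed(0)
phi, mu = 0.4, 130.0            # coefficients "estimated" on the old sample, now held fixed
y = [mu]
for _ in range(200):            # simulate a series; the first 50 points play the "old" sample
    y.append(mu + phi * (y[-1] - mu) + random.gauss(0, 10))

old_var = residual_variance(ar1_residuals(y[:50], phi, mu))   # sigma^2 on the old sample
new_var = residual_variance(ar1_residuals(y, phi, mu))        # sigma^2 on old + new data
print(old_var, new_var)         # same coefficients, two different variances
```

The log-likelihood and the information criteria depend on this recomputed variance, which is why they change as well.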
Now, if we forecast, we of course get different values, since in each case the last observations we autoregress on are different:
# forecast(model.1,h=6)$mean
Time Series:
Start = 25
End = 30
Frequency = 1
[1] 150.1410 140.3254 136.2646 134.5846 133.8896 133.6020
# forecast(model.2,h=6)$mean
Time Series:
Start = 49
End = 54
Frequency = 1
[1] 187.6868 155.8583 142.6906 137.2431 134.9894 134.0570
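The shape of both forecast paths follows from the AR recursion alone: with fixed coefficients, the $h$-step forecast decays geometrically from the last observation toward the process mean. Here is a plain-Python sketch using only the AR part (the MA(1) term of the fitted ARMA(1,1) shifts the first step, so this approximates rather than reproduces forecast()); the two starting values are hypothetical, merely chosen to lie on either side of the mean:

```python
def ar1_forecasts(y_last, phi, mu, h):
    """h-step point forecasts of an AR(1) with mean mu: mu + phi^k * (y_last - mu)."""
    return [mu + phi ** k * (y_last - mu) for k in range(1, h + 1)]

phi, mu = 0.4137, 133.3991      # coefficients from model.1 above
# Different last observations give different early forecasts,
# but both paths decay toward the same mean mu:
print(ar1_forecasts(118.0, phi, mu, 6))   # starting below the mean
print(ar1_forecasts(170.0, phi, mu, 6))   # starting above the mean
```

This is why the two forecast sequences above start far apart but both settle near the intercept of about 133.4.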
As to why one would not re-estimate the model after getting new data... I don't see a really good reason either. Perhaps in specific situations you might have performance issues. You might assume that a few more data points won't change the parameters much, especially if you already have a long time series with thousands of observations - in which case re-estimating would take some time, too.