Difference a time series and then subtract the mean of the differenced series within Arima

arima, r, time series

This question is similar to a related question, in that I am currently differencing the time series and removing its mean outside the Arima function in R, and I do not know how to do these steps within Arima itself. This is the procedure I am trying to perform (the data dowj_ts can be found at the bottom):

library(forecast) # provides Arima() and forecast()
dowj_ts_d1 <- diff(dowj_ts) # differencing at lag 1 (1-B)
drift <- mean(dowj_ts_d1) # sample mean of the differenced series
dowj_ts_d1_demeaned <- dowj_ts_d1 - drift # mean removal
# Maximum likelihood AR(1) for the mean-corrected differences X_t
fit <- Arima(dowj_ts_d1_demeaned, order=c(1,0,0), include.mean=FALSE, transform.pars=TRUE)

Note that drift evaluates to 0.1336364, and summary(fit) gives the table below:

Series: dowj_ts_d1_demeaned 
ARIMA(1,0,0) with zero mean     

Coefficients:
         ar1
      0.4471
s.e.  0.1051

sigma^2 estimated as 0.1455:  log likelihood=-35.16
AIC=74.32   AICc=74.48   BIC=79.01

Training set error measures:
                       ME     RMSE       MAE       MPE     MAPE      MASE
Training set -0.004721362 0.381457 0.2982851 -9.337089 209.6878 0.8477813
                    ACF1
Training set -0.04852626

Ultimately, I want a 2-step-ahead forecast of the original series, and this is where it starts to become ugly:

 tail(c(dowj_ts[1], dowj_ts[1] + cumsum(c(dowj_ts_d1_demeaned, forecast(fit, h=2)$mean) + drift)), 2)

Currently, all of this is done outside the Arima function from the forecast package. I know I can do the differencing within Arima like this:

 Arima(dowj_ts, order=c(1,1,0), include.drift=TRUE, transform.pars=FALSE)

This gives:

Series: dowj_ts 
ARIMA(1,1,0) with drift         

Coefficients:
         ar1   drift
      0.4478  0.1204
s.e.  0.1059  0.0786

sigma^2 estimated as 0.1474:  log likelihood=-34.69
AIC=75.38   AICc=75.71   BIC=82.41

But the drift term computed by R is different from the drift = 0.1336364 that I computed manually.

So my question is: how can I difference the series and then remove the mean of the differenced series within the Arima function?

Second question: why is the drift term estimated by Arima different from the drift term I computed? In fact, what does the mathematical model look like when include.drift = TRUE? This really confuses me.

Data can be found below:

structure(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 
110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02, 
108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 
110.56, 110.72, 111.23, 111.48, 111.58, 111.9, 112.19, 112.06, 
111.96, 111.68, 111.36, 111.42, 112, 112.22, 112.7, 113.15, 114.36, 
114.65, 115.06, 115.86, 116.4, 116.44, 116.88, 118.07, 118.51, 
119.28, 119.79, 119.7, 119.28, 119.66, 120.14, 120.97, 121.13, 
121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02, 
122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67, 
122.73, 122.86, 122.67, 122.09, 122, 121.23), .Tsp = c(1, 78, 
1), class = "ts")

Best Answer

The code

Arima(dowj_ts, order=c(1,1,0), include.drift=TRUE, transform.pars=FALSE)

is fine. You should be able to call forecast on this without a problem.
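As a minimal sketch of that, assuming the forecast package is installed and dowj_ts is rebuilt from the data in the question, the 2-step-ahead forecast on the original scale can be requested directly. The names fit_drift and fc are made up for illustration:

```r
library(forecast)  # provides Arima() and forecast()

# Data copied from the question (78 observations)
dowj_ts <- ts(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
  110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02,
  108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56,
  110.56, 110.72, 111.23, 111.48, 111.58, 111.9, 112.19, 112.06,
  111.96, 111.68, 111.36, 111.42, 112, 112.22, 112.7, 113.15, 114.36,
  114.65, 115.06, 115.86, 116.4, 116.44, 116.88, 118.07, 118.51,
  119.28, 119.79, 119.7, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02,
  122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67,
  122.73, 122.86, 122.67, 122.09, 122, 121.23), start = 1, frequency = 1)

# Differencing (d = 1) and the drift term are both handled inside Arima()
fit_drift <- Arima(dowj_ts, order = c(1, 1, 0), include.drift = TRUE,
                   transform.pars = FALSE)

# Forecasts come back on the scale of the original series, so no manual
# cumsum() or re-adding of the mean is needed
fc <- forecast(fit_drift, h = 2)
print(fc$mean)
```

The point forecasts in fc$mean play the role of the tail(...) expression in the question, although they will not match it exactly because the drift estimate differs.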

The reason your drift estimate is different is that Arima estimates its parameters by maximum likelihood, and your sample mean is not the maximum likelihood estimate of the drift. There is no closed-form expression for the MLEs of these parameters; they have to be found with an iterative algorithm.
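To make this concrete, here is a rough sketch in base R. It relies on two observations. First, the sample mean of the differences telescopes to (last value - first value)/(n - 1), so it only uses the two endpoints of the series. Second, the forecast documentation describes include.drift as fitting a linear regression with ARIMA errors, which for this model can be written as (1 - phi B)(y_t - y_{t-1} - c) = e_t, with c the drift. Fitting an AR(1) with a mean to the differenced series by ML is my assumption of a close stand-in for that model; the names d, fit_ml and so on are illustrative only:

```r
# Data copied from the question (78 observations)
dowj_ts <- ts(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
  110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02,
  108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56,
  110.56, 110.72, 111.23, 111.48, 111.58, 111.9, 112.19, 112.06,
  111.96, 111.68, 111.36, 111.42, 112, 112.22, 112.7, 113.15, 114.36,
  114.65, 115.06, 115.86, 116.4, 116.44, 116.88, 118.07, 118.51,
  119.28, 119.79, 119.7, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02,
  122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67,
  122.73, 122.86, 122.67, 122.09, 122, 121.23), start = 1, frequency = 1)

d <- diff(dowj_ts)
n <- length(dowj_ts)

# Sample mean of the differences: the value computed by hand in the question
sample_mean <- mean(d)

# The same number from the endpoints alone (the differences telescope)
endpoint_slope <- (dowj_ts[n] - dowj_ts[1]) / (n - 1)

# ML estimate of the mean of the differences under AR(1) errors
# (assumed stand-in for the drift in ARIMA(1,1,0) with drift)
fit_ml <- arima(d, order = c(1, 0, 0), include.mean = TRUE, method = "ML")
ml_mean <- unname(coef(fit_ml)["intercept"])

print(c(sample_mean = sample_mean,
        endpoint_slope = endpoint_slope,
        ml_mean = ml_mean))
```

The first two numbers should agree at about 0.1336, the value from the question. I would expect ml_mean to land near the drift reported by Arima rather than at the sample mean, though not necessarily identical to it, since the two fits are set up slightly differently.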