Solved – Difference time series and then minus the mean of the differenced series within Arima

arima, r, time series

This question is similar to the following question, in that I am currently differencing the time series and removing the mean of the differenced series outside the Arima function in R, and I do not know how to perform these steps within Arima itself. The procedure I am following is shown below (the data dowj_ts can be found at the bottom):

library(forecast)  # provides Arima() and forecast()

dowj_ts_d1 <- diff(dowj_ts)                # differencing at lag 1: (1 - B)
drift <- mean(dowj_ts_d1)                  # sample mean of the differenced series
dowj_ts_d1_demeaned <- dowj_ts_d1 - drift  # mean removal
# Maximum likelihood AR(1) for the mean-corrected differences X_t
fit <- Arima(dowj_ts_d1_demeaned, order = c(1, 0, 0), include.mean = FALSE, transform.pars = TRUE)

Note that drift evaluates to 0.1336364, and summary(fit) gives the output below:

Series: dowj_ts_d1_demeaned 
ARIMA(1,0,0) with zero mean     

Coefficients:
         ar1
      0.4471
s.e.  0.1051

sigma^2 estimated as 0.1455:  log likelihood=-35.16
AIC=74.32   AICc=74.48   BIC=79.01

Training set error measures:
                       ME     RMSE       MAE       MPE     MAPE      MASE
Training set -0.004721362 0.381457 0.2982851 -9.337089 209.6878 0.8477813
                    ACF1
Training set -0.04852626

Ultimately, I want a 2-step-ahead forecast of the original series, and this is where it starts to become ugly:

tail(c(dowj_ts[1], dowj_ts[1] + cumsum(c(dowj_ts_d1_demeaned, forecast(fit, h = 2)$mean) + drift)), 2)

Currently, all of this is done outside the Arima function from the forecast package. I know I can do the differencing within Arima like this:

Arima(dowj_ts, order = c(1, 1, 0), include.drift = TRUE, transform.pars = FALSE)

This gives:

Series: dowj_ts 
ARIMA(1,1,0) with drift         

Coefficients:
         ar1   drift
      0.4478  0.1204
s.e.  0.1059  0.0786

sigma^2 estimated as 0.1474:  log likelihood=-34.69
AIC=75.38   AICc=75.71   BIC=82.41

But the drift term computed by R is different from the drift = 0.1336364 that I computed manually.

So my question is: how can I difference the series and then remove the mean of the differenced series within the Arima function?

Second question: why is the drift term estimated by Arima different from the drift term I computed? In fact, what does the mathematical model look like when include.drift = TRUE? This really confuses me.

Data can be found below:

structure(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 
110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02, 
108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 
110.56, 110.72, 111.23, 111.48, 111.58, 111.9, 112.19, 112.06, 
111.96, 111.68, 111.36, 111.42, 112, 112.22, 112.7, 113.15, 114.36, 
114.65, 115.06, 115.86, 116.4, 116.44, 116.88, 118.07, 118.51, 
119.28, 119.79, 119.7, 119.28, 119.66, 120.14, 120.97, 121.13, 
121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02, 
122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67, 
122.73, 122.86, 122.67, 122.09, 122, 121.23), .Tsp = c(1, 78, 
1), class = "ts")

Best Answer

The code

Arima(dowj_ts, order = c(1, 1, 0), include.drift = TRUE, transform.pars = FALSE)

is fine. You should be able to call forecast on this without a problem.
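
As a rough illustration of forecasting on the original scale without any manual undifferencing, here is a minimal sketch that uses only base R. It rests on one assumption: that the drift is modelled as the coefficient of a linear time index in a regression with ARIMA(1,1,0) errors, so stats::arima with an xreg column stands in for Arima(..., include.drift = TRUE). The names drift_index, future_index, fit_drift and fc are hypothetical and chosen for this example only.

```r
# Data from the question: 78 observations.
dowj_ts <- ts(c(
  110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
  110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02,
  108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56,
  110.56, 110.72, 111.23, 111.48, 111.58, 111.9, 112.19, 112.06,
  111.96, 111.68, 111.36, 111.42, 112, 112.22, 112.7, 113.15, 114.36,
  114.65, 115.06, 115.86, 116.4, 116.44, 116.88, 118.07, 118.51,
  119.28, 119.79, 119.7, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02,
  122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67,
  122.73, 122.86, 122.67, 122.09, 122, 121.23
))

n <- length(dowj_ts)

# Assumption: drift enters as the coefficient of a linear time index,
# so the regressor is simply 1, 2, ..., n in a column named "drift".
drift_index <- cbind(drift = seq_len(n))

# ARIMA(1,1,0) errors plus the time-index regressor.
fit_drift <- arima(dowj_ts, order = c(1, 1, 0), xreg = drift_index)

# Two-step-ahead forecasts on the original (undifferenced) scale;
# the regressor is extended to times n + 1 and n + 2.
future_index <- cbind(drift = n + 1:2)
fc <- predict(fit_drift, n.ahead = 2, newxreg = future_index)

print(coef(fit_drift))  # AR(1) coefficient and drift estimate
print(fc$pred)          # point forecasts for the next two time points
```

With the forecast package loaded, the analogous call on the fitted Arima object would be forecast(fit, h = 2), which avoids building the regressor by hand.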

Your drift estimate is different because Arima estimates the parameters by maximum likelihood, and the sample mean of the differenced series is not the maximum likelihood estimate of the drift. There is no closed-form expression for the maximum likelihood estimates of the parameters; they have to be found with an iterative algorithm.
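
To make the model behind include.drift more concrete, here is a sketch of the ARIMA(1,1,0)-with-drift specification. It assumes that the drift $\mu$ enters as the coefficient of a linear time trend with ARIMA errors; the symbols $\mu$, $\phi$, $\eta_t$, $\varepsilon_t$ and $n$ are notation introduced here, not taken from the question.

$$X_t = \mu t + \eta_t, \qquad (1 - \phi B)(1 - B)\,\eta_t = \varepsilon_t,$$

which is equivalent to

$$(X_t - X_{t-1}) - \mu = \phi\,\bigl[(X_{t-1} - X_{t-2}) - \mu\bigr] + \varepsilon_t.$$

Under this reading, $\mu$ is the mean of the differenced series, the same quantity the manual approach targets. The sample mean of the differences telescopes:

$$\frac{1}{n-1}\sum_{t=2}^{n}(X_t - X_{t-1}) = \frac{X_n - X_1}{n-1} = \frac{121.23 - 110.94}{77} \approx 0.1336.$$

It therefore depends only on the first and last observations, whereas the likelihood-based estimate (0.1204 in the output above) is obtained jointly with $\phi$, so the two need not coincide in a finite sample.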

Related Solutions

ARIMA – Understanding Relationship Between Two Time Series

Matt, you are quite right to be concerned about using unnecessary differencing structure. To identify an appropriate model for your data, one that yields significant structure while leaving a Gaussian error process (the model summary and the residual ACF were shown as images that are not reproduced here), the transfer function identification process requires, in this case, suitable differencing to create surrogate series that are stationary and thus usable to IDENTIFY the relationship. Here the differencing required for IDENTIFICATION was double differencing for X and single differencing for Y. Additionally, the ARIMA filter for the doubly differenced X was found to be an AR(1). Applying this ARIMA filter (for identification purposes only!) to both stationary series yielded a cross-correlation structure (image not reproduced here) suggesting a simple contemporaneous relationship.

Note that while the original series exhibit non-stationarity, this does not necessarily imply that differencing is needed in a causal model. The final model and the final ACF support this. In closing, the final equation, aside from the one empirically identified level shift (really an intercept change), is

 Y(t)=-4.78 + .192*X(t) - .177*X(t-1) which is NEARLY equal to 

 Y(t)=-4.78 + .192*[X(t)-X(t-1)] which means that changes in X affect the level of Y

Finally, note the characteristics of the suggested model (shown in an image that is not reproduced here).

The level shift series (0,0,0,0,0,0,0,0,0,1,1,...,1) suggests that, if left untreated, the model residuals would exhibit a level shift at or around time period 10. Thus a test of the hypothesis of a common residual mean between the first 10 residuals and the last 42 would be significant at alpha=.0002, based upon a t statistic of -4.10. Note that the inclusion of a constant guarantees that the overall mean of the residuals does not differ significantly from zero, BUT this does not necessarily hold for every subset of the time interval. The graph that followed (image not reproduced here) shows this clearly, given that you were told to look! The Actual/Fit/Forecast plot (also not reproduced) is quite illuminating. Statistics are like lampposts: some use them to lean on, others use them for illumination.

Solved – Difference time series before Arima or within Arima

There are several issues here.

1. If you difference first, then Arima() will fit a model to the differenced data. If you let Arima() do the differencing as part of the estimation procedure, it will use a diffuse prior for the initialization. This is explained in the help file for arima(). So the results will be different due to the different ways the initial observation is handled. I don't think it makes much difference in terms of the quality of the estimation. However, it is much easier to let Arima() handle the differencing if you want forecasts or fitted values on the original (undifferenced) data.
2. Apart from differences in estimation, your two models are not equivalent because modB includes a constant while modA does not. By default, Arima() includes a constant when $d=0$ and no constant when $d>0$. You can override these defaults with the include.mean argument.
3. Fitted values for the original data are not equivalent to the undifferenced fitted values on the differenced data. To see this, note that the fitted values on the original data are given by $$\hat{X}_t = X_{t-1} + \phi(X_{t-1}-X_{t-2})$$ whereas the fitted values on the differenced data are given by $$\hat{Y}_t = \phi (X_{t-1}-X_{t-2})$$ where $\{X_t\}$ is the original time series and $\{Y_t\}$ is the differenced series. Thus $$\hat{X}_t - \hat{X}_{t-1} \ne \hat{Y}_t.$$
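
To spell out that last inequality, here is a short derivation that uses the same AR(1) coefficient $\phi$ and the two fitted-value formulas above (a sketch that ignores any constant term). Differencing the fitted values gives

$$\hat{X}_t - \hat{X}_{t-1} = (X_{t-1} - X_{t-2}) + \phi\,(X_{t-1} - 2X_{t-2} + X_{t-3}),$$

whereas subtracting the previous observation gives

$$\hat{X}_t - X_{t-1} = \phi\,(X_{t-1} - X_{t-2}) = \hat{Y}_t.$$

In other words, the fitted values on the differenced data are recovered by subtracting the previous observation from the fitted value, not the previous fitted value.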
