Solved – Jack-knife with time series models

cross-validation, forecasting, r, regression, time series

Introduction

I am aiming to forecast the annual growth rates for a number of macroeconomic indicators (denote one by $Y_t$).
One of the tasks is to test the forecasting performance of rival time series models with and without exogenous variables ($X_t$, a $T\times k$ matrix). The list of rival models includes:

  1. AR(I)MA model (annual growth rates are unlikely to have a "unit root", though the latter is either assumed or tested) $$A(L)Y_t =\mu+ B(L)\varepsilon_t$$
  2. linear regression model with ARMA errors $$Y_t = X_t\beta + \eta_t, \ \ A(L)\eta_t = B(L)\varepsilon_t $$
  3. lagged dependent variable model (autoregressive model with exogenous variables)
    $$A(L)Y_t = X_t\beta + \varepsilon_t$$
  4. linear regression model
    $$Y_t = X_t\beta + \varepsilon_t$$

where $\varepsilon_t$ is assumed to be strong white noise (a zero-mean, constant-variance i.i.d. process); $A(L)$ and $B(L)$ are autoregressive (of order $p$) and moving average (of order $q$) polynomials in $L$, the back-shift (lag) operator.

Note that the primary and only goal is forecasting performance, so any "good" properties of the parameter estimates are of secondary concern. All I need is to find the most parsimonious forecaster that is robust to starting conditions. The decision will be made with one of the accuracy() options, but first I need to obtain the material for the comparison.

Models 1. and 2. are estimated by auto.arima() with the default "CSS-ML" estimation method. Models 3. and 4. are estimated by ordinary least squares (lm()). $T$ is about $40$ quarters.
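A minimal sketch of this setup (y, a quarterly ts of growth rates, and X, a $T\times k$ regressor matrix, are placeholders for the real data; the lag order in model 3. is illustrative):

    library(forecast)

    m1 <- auto.arima(y)                    # 1. AR(I)MA, default CSS-ML
    m2 <- auto.arima(y, xreg = X)          # 2. regression with ARMA errors

    d  <- data.frame(y = as.numeric(y), X,
                     ylag1 = c(NA, head(as.numeric(y), -1)))
    m3 <- lm(y ~ ., data = d)              # 3. lagged dependent variable (p = 1)
    m4 <- lm(y ~ . - ylag1, data = d)      # 4. plain linear regression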

Approaches tried so far

To obtain the jack-knifed residuals, a first approach, denoted "rolling", has been implemented. Starting from a feasibly large sub-sample of the time series data, the parameters are estimated and an $h$-step-ahead forecast is made with the predict() function (EDIT: this is the same suggestion as in the first part of Rob's answer to the second question). After that, one point is added and the estimation/prediction steps are repeated.
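A sketch of this rolling scheme (the horizon h and the initial sample size t0 are assumptions):

    library(forecast)

    h  <- 4                       # forecast horizon; 4 quarters is an assumption
    t0 <- 24                      # first "feasibly large" sample size (assumed)
    n  <- length(y)
    err <- matrix(NA, nrow = n - h - t0 + 1, ncol = h)

    for (i in t0:(n - h)) {
      fit <- auto.arima(window(y, end = time(y)[i]))   # expanding window
      fc  <- predict(fit, n.ahead = h)$pred
      err[i - t0 + 1, ] <- y[(i + 1):(i + h)] - fc     # h-step-ahead errors
    }
    # models 2.-4. additionally need the matching newxreg/newdata in predict()

The collected errors (or forecasts) are then the input for the accuracy() comparison.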

A weak point of such an experiment is that the number of time points (the sample size) used to estimate the parameters differs from iteration to iteration, whereas I would like to test the robustness to starting conditions while keeping the estimation sample size fixed.

Bearing this in mind, I tried setting several consecutive values of $Y_t$ (EDIT: for the interval $k+p+q<t_0<t_1<T-h+1$) to missing values (NA). In models 2.-4. this also implies dropping the corresponding rows of the data matrix $X_t$. Prediction for 3. and 4. is straightforward (the same predict() with the omitted rows of $X_t$ works well). All my concerns are about models 1. and 2.
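A sketch of this fixed-sample variant (the gap indices are an assumption; d is the data frame from the earlier sketch):

    y_na <- y
    gap  <- 25:28                  # held-out interval t0..t1 inside the sample
    y_na[gap] <- NA

    fit1 <- auto.arima(y_na)               # model 1: internal NAs are allowed
    fit2 <- auto.arima(y_na, xreg = X)     # model 2: X stays aligned with y

    d_na <- d
    d_na$y[gap] <- NA
    fit4  <- lm(y ~ . - ylag1, data = d_na)    # lm() drops the NA rows itself
    pred4 <- predict(fit4, newdata = d[gap, ]) # models 3.-4.: plain predict()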

With just the AR($p$) part, the predictions can be done sequentially, $Y_{t+1|t} = \hat A(L)Y_t$. But in the presence of the MA($q$) part one cannot (?) use the estimated parameters directly. From Brockwell and Davis, "Introduction to Time Series and Forecasting", Chapter 3.3, it follows that one needs the innovations algorithm to recursively compute the $\theta_{n,j}$ from a specific system of equations involving the estimated autoregressive and moving average parameters. EDIT: it is these $\theta_{n,j}$ coefficients that are used to make the ARMA prediction, not the originally estimated parameters $\theta_{j}$. However, it is remarked in the same chapter that $\theta_{n,j}$ approaches $\theta_{j}$ asymptotically if the process is invertible. It is not evident that 30-40 points are enough for this asymptotic result to be used, even if the process is invertible.
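For reference, the one-step predictor in that chapter has (if I read Brockwell and Davis correctly, for $n \ge m = \max(p,q)$) the form $$\hat Y_{n+1} = \sum_{i=1}^{p}\hat\phi_i Y_{n+1-i} + \sum_{j=1}^{q}\theta_{n,j}\left(Y_{n+1-j}-\hat Y_{n+1-j}\right),$$ so it is only the moving average weights $\theta_{n,j}$, not the autoregressive ones, that the innovations algorithm computes recursively.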

Notes: I don't want to restrict $q$ to zero, since I am not doing so in true out-of-sample forecasting. EDIT: also note that this is not a missing-value imputation problem but a forecasting experiment; the trajectory is not supposed to bridge the two sub-samples by imputing the missing values.

Questions

  1. Does auto.arima() perform correctly in the presence of missing values inside the sample? [Already answered by Rob.]
  2. (The actually crucial part of this post.) How does one correctly forecast (NOT impute) these missing points from the ARMA model when both $p>0$ and $q>0$? (I hope the ways are already implemented in R and I am simply missing something.)

EDIT: since the parameters of the ARMA parts are estimated correctly, could I legitimately rearrange the arima object to include the estimated parameters and only the data for the first sub-sample, and then use a predict function?
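One route that avoids hand-editing the object: the forecast package's Arima() accepts a model argument that applies an already-estimated model to other data without re-estimating any parameters. A sketch, reusing fit1 and gap from above:

    library(forecast)

    sub   <- window(y, end = time(y)[gap[1] - 1])  # first sub-sample only
    refit <- Arima(sub, model = fit1)              # coefficients kept fixed
    fc    <- forecast(refit, h = length(gap))$mean # forecast into the gap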

EDIT2: I have tried modifying the estimated mod structure: the resulting forecast from predict.Arima is identical (up to double-precision differences) to the forecast where I use the estimated MA and AR coefficients to predict $Y_{t+1|t}$ directly as $\hat A(L)(Y_t-X_t\hat \beta)+ X_t\hat \beta+\hat B(L)\hat \varepsilon_t$, without KalmanForecast(). This was expected, since the state space representation is supplied with the same estimated $\theta_j$, not $\theta_{n,j}$. So the only remaining question is whether the difference between $\theta_j$ and $\theta_{n,j}$ is significant enough to influence the point forecasts. I hope the answer is negative.
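For the pure ARMA case the check can be reproduced directly from the fitted object (a sketch; the orders are placeholders, and the "intercept" reported by arima() is the process mean, which is how it is treated below):

    fit   <- arima(y, order = c(2, 0, 1))     # placeholder orders (p, 0, q)
    co    <- coef(fit)
    phi   <- co[grep("^ar", names(co))]
    theta <- co[grep("^ma", names(co))]
    mu    <- if ("intercept" %in% names(co)) co[["intercept"]] else 0
    e <- residuals(fit)                       # the Kalman-filter innovations
    n <- length(y)

    # one-step forecast from the ARMA recursion with the estimated theta_j
    y1 <- mu + sum(phi   * (y[n - seq_along(phi)   + 1] - mu)) +
               sum(theta *  e[n - seq_along(theta) + 1])
    all.equal(as.numeric(y1), as.numeric(predict(fit, n.ahead = 1)$pred))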

Best Answer

I don't understand why you think $q>0$ is a problem for prediction. It is easy enough to forecast using an ARIMA model with MA terms and you don't need to use the innovations algorithm of Brockwell and Davis. That algorithm is useful for estimation; in particular, in getting starting values when optimizing the likelihood.

To answer your specific questions:

  1. auto.arima() calls arima() which uses a state space representation for computing the likelihood. Missing values are handled naturally in a state space format. So, yes, they are handled correctly.

  2. Missing historical values are not estimated by arima(). If you want to forecast them (i.e., using only past data), just fit a model up to the start of the missing sequence and then forecast from it. If you want to estimate them (using data before and afterwards), you would need to use a Kalman smoother based on the equivalent state space model. An alternative fudge that gives almost the same results is to average the forecasts based on the data up to the last non-missing observation before the gap with the backcasts based on the data from the first non-missing observation after the missing sequence onwards, as sketched below.
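A minimal sketch of both options (continuing with the gap notation from the question; reversing the series is one standard way to backcast):

    library(forecast)

    # forecast the gap using only the data before it
    pre   <- window(y, end = time(y)[gap[1] - 1])
    f_fwd <- forecast(auto.arima(pre), h = length(gap))$mean

    # backcast: reverse time, forecast, reverse the result, then average
    post  <- ts(rev(window(y, start = time(y)[gap[length(gap)] + 1])))
    f_bwd <- rev(forecast(auto.arima(post), h = length(gap))$mean)

    est <- (as.numeric(f_fwd) + as.numeric(f_bwd)) / 2   # the averaging fudge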
