Solved – Rolling analysis with out-of-sample forecasts

out-of-sample, r, time-series

I have a model that looks like this:

lm(y ~ lag(x, -1) + lag(z, -1))
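One likely reason a formula like this does not do what is intended, assuming lag here is stats::lag and x, z are plain vectors or ts objects: lm() does not align series by their time stamps, so stats::lag() only changes the time attribute and leaves the values in the same rows. The sketch below checks this on made-up, noise-free series (the data and the names x1, z1 are hypothetical) and shows one base-R way to align the series with ts.intersect() before fitting.

```r
# Toy series (hypothetical, no noise): y[t] = 2 * x[t-1] + 3 * z[t-1]
x <- ts(sin(1:20))
z <- ts(cos(2 * (1:20)))
y <- ts(c(0, 2 * x[-20] + 3 * z[-20]))

# lm() ignores ts time stamps, so stats::lag() has no effect on the fit:
naive <- lm(y ~ stats::lag(x, -1) + stats::lag(z, -1))
same  <- lm(y ~ x + z)

# ts.intersect() aligns the series by time (keeping only the common
# periods) and, with dframe = TRUE, returns a data frame for lm()
d <- ts.intersect(y, x1 = stats::lag(x, -1), z1 = stats::lag(z, -1),
                  dframe = TRUE)
aligned <- lm(y ~ x1 + z1, data = d)
print(coef(aligned))  # approximately 0, 2, 3
```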

So basically, this is a time-series regression with exogenous variables, and I want to carry out a rolling analysis of out-of-sample forecasts, meaning that:
I first use a subsample (e.g., 1990-1995) for estimation, then perform a one-step-ahead forecast, then add one observation and make another one-step-ahead forecast, and so on.
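The expanding-window scheme described above can be pictured by listing which rows each forecast uses. This is a minimal sketch with hypothetical sizes (n observations in total, the first M used for the initial estimation); it only prints index ranges and fits no model.

```r
# Hypothetical sizes: n observations in total, the first M used for the
# initial estimation (e.g. the 1990-1995 subsample in the question)
n <- 10
M <- 6

# Expanding window: forecast i is estimated on rows 1..(M + i - 1)
# and predicts row M + i
schemes <- lapply(seq_len(n - M), function(i) {
  list(train = seq_len(M + i - 1), target = M + i)
})

for (s in schemes) {
  cat("estimate on rows", min(s$train), "to", max(s$train),
      "-> forecast row", s$target, "\n")
}
```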

I have tried to use rollapply, defining the model as arima(0,0,0) with xreg set to the lags of the other variables, but that doesn't work.

Your help would be much appreciated!

Best Answer

Here's a brute-force method, which I generally prefer if a) I can't find an appropriate R function in about 3 minutes, and b) I can see that the brute-force function is going to be easy to write.

First, realign the variables in a data frame so you don't need to use the lag function:

N <- length(y)
# Pair y at time t with x and z at time t-1
df <- data.frame(y = y[2:N], x = x[1:(N-1)], z = z[1:(N-1)])
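To see what the realignment does, here is a small check on hypothetical toy vectors: each row pairs y at time t with x and z at time t-1, and one row is lost at the start. As a cross-check, base R's embed() builds the same pairing; with a three-column matrix and dimension 2, columns 1-3 hold y, x, z at time t and columns 4-6 hold them at time t-1.

```r
# Toy series (hypothetical values) to show what the realignment does
y <- c(10, 11, 12, 13, 14)
x <- c(1, 2, 3, 4, 5)
z <- c(50, 40, 30, 20, 10)

N <- length(y)
df <- data.frame(y = y[2:N], x = x[1:(N - 1)], z = z[1:(N - 1)])
print(df)  # df$y is 11:14, df$x is 1:4, df$z is 50, 40, 30, 20

# Cross-check with embed(): take y at time t (column 1) and
# x, z at time t-1 (columns 5 and 6)
e <- embed(cbind(y, x, z), 2)
df2 <- data.frame(y = e[, 1], x = e[, 5], z = e[, 6])
```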

Then, define a prediction function:

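# Start with the first M observations and collect 1-step-ahead predictions
# f:  model formula, e.g. y ~ x + z
# df: realigned data frame (regressors already lagged)
# M:  number of observations in the initial estimation sample
predict.1 <- function(f, df, M)
{
  P <- max(0, nrow(df) - M)   # number of out-of-sample forecasts
  results <- numeric(P)

  for (i in seq_len(P)) {
    df.pred <- df[M+i, ]            # row to forecast
    df.est <- df[1:(M+i-1), ]       # all rows before it (expanding window)
    results[i] <- predict(lm(f, data=df.est), newdata=df.pred)
  }
  results
}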

Of course, you could make it more terse, but I've written it to be a little clearer than a compact version would be. Using the function looks like this:

> # Create sample data
> # Pretend we've "realigned" lagged variables so we don't need to refer to them as lagged.
> x <- rnorm(50)
> z <- rnorm(50)
> y <- x + z + rnorm(50)
> df <- as.data.frame(cbind(y, x, z))
> colnames(df) <- c("y","x","z")
> 
> pred.vals <- predict.1(y~x+z, df, 40)
> pred.vals
 [1] -0.33967757  2.30165856 -0.40084611  0.31978776 -1.75524544
 [6] -0.21552467  0.09107069  0.53836453  0.19864094  2.09003861
> 
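As mentioned above, predict.1 could be written more tersely. The sketch below is one such version using vapply() with the same expanding-window logic; the name predict.1.terse is hypothetical. The usage part also compares the forecasts with the held-out values to get out-of-sample errors, which is an assumed next step rather than part of the original answer. Because the data are random, no output is shown.

```r
# Hypothetical terser version of predict.1: same expanding window,
# forecast for row M + i estimated on rows 1..(M + i - 1)
predict.1.terse <- function(f, df, M) {
  vapply(seq_len(max(0, nrow(df) - M)), function(i) {
    fit <- lm(f, data = df[seq_len(M + i - 1), ])
    unname(predict(fit, newdata = df[M + i, ]))
  }, numeric(1))
}

# Usage with simulated data (regressors assumed already realigned)
set.seed(1)
x <- rnorm(50)
z <- rnorm(50)
y <- x + z + rnorm(50)
df <- data.frame(y, x, z)

pred.vals <- predict.1.terse(y ~ x + z, df, 40)
errors <- df$y[41:50] - pred.vals   # out-of-sample forecast errors
sqrt(mean(errors^2))                # root mean squared forecast error
```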

It is straightforward to change the function to accept a parameter for the forecast horizon. If you want each one-step-ahead forecast to use the same number of data points in the history (a fixed rolling window, instead of growing the history by one observation at each step), that's a simple change too. I pass the regression formula in "f" so that, when comparing different models, I don't have to change the body of predict.1 each time.
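Both changes described above can be sketched in one function. The name predict.h and the arguments h and window are hypothetical, not from the original answer. With h = 1 and window = NULL it reproduces predict.1. For h > 1 it assumes the regressor values in the target row are known when the forecast is made (a conditional forecast); if they are not, the regressors would need to be lagged by h instead of 1.

```r
# Hypothetical generalisation of predict.1
# h      : forecast horizon in rows
# window : NULL for an expanding window, or a fixed number of rows
predict.h <- function(f, df, M, h = 1, window = NULL) {
  n.fc <- max(0, nrow(df) - M - h + 1)   # forecasts that fit in the data
  results <- numeric(n.fc)
  for (k in seq_len(n.fc)) {
    end <- M + k - 1                      # last row available at this origin
    start <- if (is.null(window)) 1 else max(1, end - window + 1)
    fit <- lm(f, data = df[start:end, ])
    # Assumes the regressors in row end + h are known at the origin
    results[k] <- predict(fit, newdata = df[end + h, ])
  }
  results
}
```

A fixed window keeps the estimation sample size constant, which can help if the relationship drifts over time; the expanding window uses all available history.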
