ARIMA Model Selection – How to Use Auto.arima to Choose Between Many Regressors

arima, forecasting, regression

I have to forecast data with two seasonal periods using ARIMA.
I found that I need code like this:

myForecaster <- function(parameters, training_df_set, testing_df_set, fourier_order = 3) {
  library(forecast)
  # Multi-seasonal series with a daily and a weekly period
  y    <- msts(training_df_set$bikes_mean,
               seasonal.periods = c(parameters$seasonal_period_day, parameters$seasonal_period_week))
  K    <- c(fourier_order, fourier_order)
  # Fourier terms carry the seasonality, so the ARIMA part is non-seasonal
  seas <- fourier(y, K = K)
  fit  <- auto.arima(y, xreg = cbind(seas, regr = training_df_set$regr), seasonal = FALSE)

  prediction_horizon_length <- length(testing_df_set$bikes_mean)
  # fourier(..., h = ) replaces the deprecated fourierf()
  seas.f     <- fourier(y, K = K, h = prediction_horizon_length)
  # The forecast horizon is taken from the number of rows of xreg
  forecaster <- forecast(fit, xreg = cbind(seas.f, regr = testing_df_set$regr))

  list(fit = fit, forecaster = forecaster,
       h_error = forecaster[["mean"]] - testing_df_set$bikes_mean)
}

The questions are:

1) If I have many candidate regressors, how can I choose the best subset of them for the ARIMA model?

2) Do the predictor series have to be stationary already when passed to auto.arima, or does auto.arima automatically transform the predictors to stationarity before estimating the parameters?

Best Answer

Here are some ideas for choosing among several regressors to include in an ARIMA model. They are based on the function remove.outliers in the package tsoutliers, which selects among a set of potential outliers detected in a previous stage.

  • Approach 1 (labelled "en-masse" in remove.outliers):

    1. Fit the ARIMA model including all the exogenous regressors and obtain the t-statistic for the significance of each regressor.
    2. Remove those regressors whose t-statistic is, in absolute value, below a threshold chosen beforehand.
    3. Refit the ARIMA model with the remaining regressors, check the t-statistics again, and remove any regressors that are not significant. Repeat until all the regressors that have not been discarded (if any) are significant.
  • Approach 2 (labelled "bottom-up" in remove.outliers):

    1. Fit the ARIMA model including all the exogenous regressors and get the corresponding t-statistics. Sort the regressors by the absolute value of the t-statistic, from largest to smallest.
    2. Fit the ARIMA model including only the regressor with the largest t-statistic in absolute value. If it is significant, keep it in your final set of regressors; otherwise discard it.
    3. Repeat the previous step, fitting the ARIMA model with the regressors selected so far plus the next one (the regressor with the next largest t-statistic in absolute value). Continue until every regressor has been added and tested.
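The two approaches above can be sketched in base R as follows. This is only an illustration, not the code used in tsoutliers: the data are simulated, the helper names (fit_fun, xreg_tstats, en_masse, bottom_up) are made up for this example, the threshold of 2 is an arbitrary choice, and the ARIMA order is fixed beforehand at (1,0,0).

```r
# Simulated data (illustrative only): y depends on x1 and x2, not on x3 or x4.
set.seed(123)
n <- 300
X <- matrix(rnorm(n * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 * X[, "x1"] - 1.5 * X[, "x2"] + arima.sim(list(ar = 0.5), n = n)

# Hypothetical helper: fit a model with an ARIMA order chosen beforehand.
fit_fun <- function(y, X) arima(y, order = c(1, 0, 0), xreg = X)

# Hypothetical helper: t-statistics of the named regression coefficients.
xreg_tstats <- function(fit, vars) {
  est <- coef(fit)[vars]
  se  <- sqrt(diag(fit$var.coef))[vars]
  est / se
}

# Approach 1 ("en-masse"): start with all regressors and repeatedly drop
# every regressor whose |t| is below the threshold.
en_masse <- function(y, X, threshold = 2) {
  keep <- colnames(X)
  while (length(keep) > 0) {
    tstat <- xreg_tstats(fit_fun(y, X[, keep, drop = FALSE]), keep)
    weak  <- abs(tstat) < threshold
    if (!any(weak)) break
    keep <- keep[!weak]
  }
  keep
}

# Approach 2 ("bottom-up"): rank the regressors by |t| in the full model,
# then add them one at a time and keep each one only if it is significant.
bottom_up <- function(y, X, threshold = 2) {
  vars       <- colnames(X)
  tstat      <- xreg_tstats(fit_fun(y, X), vars)
  candidates <- vars[order(abs(tstat), decreasing = TRUE)]
  keep <- character(0)
  for (v in candidates) {
    trial <- c(keep, v)
    t_v   <- xreg_tstats(fit_fun(y, X[, trial, drop = FALSE]), v)
    if (abs(t_v) >= threshold) keep <- trial
  }
  keep
}

selected_en_masse  <- en_masse(y, X)
selected_bottom_up <- bottom_up(y, X)
print(selected_en_masse)
print(selected_bottom_up)
```

To rerun the order selection at every fit instead, fit_fun could wrap forecast::auto.arima(y, xreg = X), whose result exposes the same coef and var.coef components.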

You can run either of these iterative processes with an ARIMA model chosen beforehand, or rerun the model selection procedure of auto.arima each time a model is fitted. Alternatively, you can run auto.arima after choosing a set of regressors and repeat either process above if a new ARIMA model is proposed.

As for your second question, you don't need to transform the regressors before passing them to arima, unless you have differenced the dependent variable yourself, in which case you should apply the same filter to the regressors. Alternatively, let auto.arima choose the differencing order and simply pass the original regressors; the differencing is then applied to the regressors as well.
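A small check of this point, using stats::arima on simulated data (the series, the coefficient 3 and the order (1,1,0) are assumptions made for illustration): fitting in levels with d = 1 should give essentially the same estimates as differencing both the response and the regressor by hand.

```r
# Simulated data (illustrative only): random-walk regressor, ARIMA(1,1,0) errors.
set.seed(42)
n   <- 200
x   <- cumsum(rnorm(n))
eta <- cumsum(arima.sim(list(ar = 0.5), n = n))
y   <- 3 * x + eta

# (a) Pass the original series; arima() applies the differencing internally.
fit_levels <- arima(y, order = c(1, 1, 0), xreg = x)

# (b) Difference y by hand; the same filter must then be applied to x.
fit_diff <- arima(diff(y), order = c(1, 0, 0), xreg = diff(x),
                  include.mean = FALSE)

# The AR and regression coefficients of the two fits should nearly coincide.
print(coef(fit_levels))
print(coef(fit_diff))
```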
