Solved – Variable selection and forecasting with regARIMA models

arima, model selection, time series

I have a couple of questions about regARIMA models:

  1. What is the underlying principle of the R function auto.arima when xreg is different from NULL? Does it first perform a regression of the time series on the explanatory variables and then select the best ARIMA model for the residuals of that regression? If I am not mistaken, this would mean that the regression coefficients are the same whether the error terms are assumed to be i.i.d. normal with zero mean and constant variance or to follow an ARIMA process (in other words, whether you fit a simple regression model or a regARIMA model). Thus, fitting a regARIMA model would have no impact on the coefficients of the explanatory variables…

  2. I read somewhere (maybe on this forum) that it is not appropriate to look at the t-statistics or p-values to decide which variables should be included in the model (in particular the dummy variables representing potential seasonality). I have not managed to find the link again. Could anyone remind me why?

  3. I could not find any details about forecasting with regARIMA models. My guess is that the predicted value of the time series at (say) t+h is the sum of the value given by the fitted regression model at t+h and the predicted value of the error term at t+h (obtained via the usual ARIMA forecasting methods).

Thank you in advance for your help.

Best Answer

  1. The regression coefficients are estimated jointly with the ARMA coefficients, so you will not get the same answer as when you fit the regression and the ARMA model separately (see the first sketch below).
  2. You probably read that selecting variables for prediction is not the same as testing their significance. In other words, if prediction is your goal, then significance tests will not necessarily give you the best results. Alternatives are the AIC (or AICc), or some form of cross-validation or a validation sample (see the second sketch below). By default, auto.arima() uses the AICc to select the ARMA error structure, but it does no variable selection on the regression variables -- it uses all the ones you provide.
  3. Yes, the forecast is the sum of the regression part and the forecast of the ARMA error (see the third sketch below). This is discussed, for example, in chapter 9 of my new book.
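
To illustrate point 1, here is a minimal R sketch using the forecast package and simulated data (the AR(1) error, the 0.5 slope, and the series length are arbitrary choices for the example). The slope estimated jointly with the ARMA errors will generally differ from the one obtained by a plain regression, because the two fits use different likelihoods.

```r
library(forecast)

set.seed(123)
n <- 200
x <- rnorm(n)                           # explanatory variable
e <- arima.sim(list(ar = 0.7), n = n)   # AR(1) errors
y <- ts(2 + 0.5 * x + e)                # series generated by a regARIMA-type process

# Plain regression: errors treated as i.i.d.
fit_lm <- lm(y ~ x)

# Regression with ARMA errors: regression and ARMA coefficients
# are estimated jointly by maximum likelihood.
fit_regarima <- auto.arima(y, xreg = cbind(x = x))

coef(fit_lm)["x"]        # slope from the plain regression
coef(fit_regarima)["x"]  # slope from the regARIMA fit -- generally not identical
```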
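For point 2, a rough sketch of the idea, with made-up regressors x1 and a deliberately irrelevant x2: fit the candidate regression specifications with the same ARMA order and compare them by AICc, rather than dropping variables based on t-statistics.

```r
library(forecast)

set.seed(42)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)                                  # irrelevant regressor
y  <- ts(1 + 0.8 * x1 + arima.sim(list(ar = 0.5), n = n))

# Keep the ARMA order fixed so the AICc comparison only reflects the regressors.
fit1 <- Arima(y, order = c(1, 0, 0), xreg = cbind(x1))
fit2 <- Arima(y, order = c(1, 0, 0), xreg = cbind(x1, x2))

c(x1_only = fit1$aicc, x1_and_x2 = fit2$aicc)   # prefer the smaller AICc
```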
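And for point 3, a minimal forecasting sketch: you must supply future values of the regressors (here they are simply drawn at random for illustration; in practice they are known or forecast separately), and forecast() returns the regression part plus the ARMA-error forecast.

```r
library(forecast)

set.seed(1)
n <- 100
x <- rnorm(n)
y <- ts(3 + 0.6 * x + arima.sim(list(ar = 0.6), n = n))

fit <- auto.arima(y, xreg = cbind(x = x))

h     <- 10
x_new <- cbind(x = rnorm(h))   # future regressor values (known or forecast separately)
fc    <- forecast(fit, xreg = x_new)
fc$mean                        # point forecasts: regression part + ARMA-error forecast
```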