Regression – What are the Stationarity Requirements for Using Regression with ARIMA Errors?

arima, regression, stationarity, time series

What are the stationarity requirements of using regression with ARIMA errors (dynamic regression) for inference?

Specifically, I have a non-stationary continuous outcome variable $y$, a non-stationary continuous predictor variable $x_a$ and a dummy-variable treatment series $x_b$. I would like to know if the treatment was correlated with a change in the outcome variable that is more than two standard errors away from zero change.

I am unsure if I need to difference these series before performing the regression with ARIMA errors modelling. In an answer to another question, IrishStat states that "while the original series exhibit non-stationarity this does not necessarily imply that differencing is needed in a causal model". He then goes on to add that "unwarranted usage [of differencing] can create statistical/econometric nonsense".

The SAS User Guide suggests that it is fine to fit regression models with ARIMA errors to non-stationary series without differencing, so long as the noise (residual) series is stationary:

Note that the requirement of stationarity applies to the noise series. If there are no input variables, the response series (after differencing and minus the mean term) and the noise series are the same. However, if there are inputs, the noise series is the residual after the effect of the inputs is removed.

There is no requirement that the input series be stationary. If the inputs are nonstationary, the response series will be nonstationary, even though the noise process might be stationary.

When nonstationary input series are used, you can fit the input variables first with no ARMA model for the errors and then consider the stationarity of the residuals before identifying an ARMA model for the noise part.

On the other hand, Rob Hyndman & George Athanasopoulos assert:

An important consideration in estimating a regression with ARMA errors is that all variables in the model must first be stationary. So we first have to check that $y_t$ and all the predictors $(x_{1,t},\dots,x_{k,t})$ appear to be stationary. If we estimate the model while any of these are non-stationary, the estimated coefficients can be incorrect.

One exception to this is the case where non-stationary variables are co-integrated. If there exists a linear combination between the non-stationary $y_t$ and predictors that is stationary, then the estimated coefficients are correct.

Are these pieces of advice mutually exclusive? How is the applied analyst to proceed?

Best Answer

My reading of the SAS text corresponds with Hyndman and Athanasopoulos.

In short: go with Hyndman and Athanasopoulos.

The first two paragraphs of the SAS text seem to be talking only about regression without any ARMA model for the errors.

The last paragraph of the SAS text seems to correspond to the last paragraph of Hyndman and Athanasopoulos.

Regarding the comment: "unwarranted usage [of differencing] can create statistical/econometric nonsense"

I am guessing that this refers to differencing when there is no unit root, i.e. over-differencing.

Regarding the comment: "while the original series exhibit non-stationarity this does not necessarily imply that differencing is needed in a causal model."

I think that this is in line with the second paragraph of Hyndman and Athanasopoulos.

Note that so far, we have just discussed non-seasonal differencing. There also exists seasonal differencing. There are tests for this such as OCSB, HEGY and Kunst (1997). I recall that D. Osborne once wrote that it is better to seasonally difference when a time series is "on the cusp".
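For a seasonal period $m$, seasonal differencing replaces $y_t$ with $y_t - y_{t-m}$. A minimal numpy sketch on a hypothetical monthly series:

```python
import numpy as np

# Hypothetical monthly series: seasonal cycle plus linear trend plus noise.
rng = np.random.default_rng(1)
t = np.arange(120)
y = 10 * np.sin(2 * np.pi * t / 12) + 0.5 * t + rng.normal(size=120)

m = 12                      # seasonal period
y_sdiff = y[m:] - y[:-m]    # seasonal difference: y_t - y_{t-m}

# The seasonal cycle cancels; what remains is roughly the constant
# m-step trend increment (0.5 * 12 = 6) plus noise.
```

Formal tests such as OCSB or HEGY decide whether this step is warranted, rather than applying it by default.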

So in summary, this should be your approach:

  1. Are any of the variables co-integrated?
    • If yes, then those should not be differenced.
  2. Make the non-co-integrated variables stationary (e.g., by differencing where a unit root is present).