Regression – What are the Stationarity Requirements for Using Regression with ARIMA Errors?

arima, regression, stationarity, time series

What are the stationarity requirements of using regression with ARIMA errors (dynamic regression) for inference?

Specifically, I have a non-stationary continuous outcome variable $y$, a non-stationary continuous predictor variable $x_a$ and a dummy-variable treatment series $x_b$. I would like to know if the treatment was correlated with a change in the outcome variable that is more than two standard errors away from zero change.

I am unsure if I need to difference these series before performing the regression with ARIMA errors modelling. In an answer to another question, IrishStat states that "while the original series exhibit non-stationarity this does not necessarily imply that differencing is needed in a causal model". He then goes on to add that "unwarranted usage [of differencing] can create statistical/econometric nonsense".

The SAS User Guide suggests that it is fine to fit regression models with ARIMA errors to non-stationary series without differencing, so long as the noise (residual) series is stationary:

Note that the requirement of stationarity applies to the noise series. If there are no input variables, the response series (after differencing and minus the mean term) and the noise series are the same. However, if there are inputs, the noise series is the residual after the effect of the inputs is removed.

There is no requirement that the input series be stationary. If the inputs are nonstationary, the response series will be nonstationary, even though the noise process might be stationary.

When nonstationary input series are used, you can fit the input variables first with no ARMA model for the errors and then consider the stationarity of the residuals before identifying an ARMA model for the noise part.

On the other hand, Rob Hyndman & George Athanasopoulos assert:

An important consideration in estimating a regression with ARMA errors is that all variables in the model must first be stationary. So we first have to check that $y_t$ and all the predictors $(x_{1,t},\dots,x_{k,t})$ appear to be stationary. If we estimate the model while any of these are non-stationary, the estimated coefficients can be incorrect.

One exception to this is the case where non-stationary variables are co-integrated. If there exists a linear combination between the non-stationary $y_t$ and predictors that is stationary, then the estimated coefficients are correct.

Are these pieces of advice mutually exclusive? How is the applied analyst to proceed?

Best Answer

My reading of the SAS text corresponds with Hyndman and Athanasopoulos.

In short: go with Hyndman and Athanasopoulos.

The first two paragraphs of the SAS text seem to be talking only about regression without any ARMA model for the errors.

The last paragraph of the SAS text seems to correspond to the last paragraph of Hyndman and Athanasopoulos.

Regarding the comment: "unwarranted usage [of differencing] can create statistical/econometric nonsense"

I am guessing that this refers to differencing when there is no unit root, i.e. over-differencing.

Regarding the comment: "while the original series exhibit non-stationarity this does not necessarily imply that differencing is needed in a causal model."

I think that this is in line with the second paragraph of Hyndman and Athanasopoulos.

Note that so far, we have just discussed non-seasonal differencing. There also exists seasonal differencing. There are tests for this such as OCSB, HEGY and Kunst (1997). I recall that D. Osborne once wrote that it is better to seasonally difference when a time series is "on the cusp".
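For a seasonal period $m$, seasonal differencing replaces $y_t$ with $y_t - y_{t-m}$. A minimal numpy sketch on a hypothetical monthly series:

```python
import numpy as np

# Hypothetical monthly series: seasonal cycle plus linear trend plus noise.
rng = np.random.default_rng(1)
t = np.arange(120)
y = 10 * np.sin(2 * np.pi * t / 12) + 0.5 * t + rng.normal(size=120)

m = 12                      # seasonal period
y_sdiff = y[m:] - y[:-m]    # seasonal difference: y_t - y_{t-m}

# The seasonal cycle cancels; what remains is roughly the constant
# m-step trend increment (0.5 * 12 = 6) plus noise.
```

Formal tests such as OCSB or HEGY decide whether this step is warranted, rather than applying it by default.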

So in summary, this should be your approach:

  1. Are any of the variables co-integrated?
    • If yes, then those should not be differenced.
  2. Make the non-co-integrated variables stationary (e.g., by differencing where a unit root is present).