The Wikipedia page states the following:
The testing procedure for the ADF test is the same as for the
Dickey–Fuller test but it is applied to the model
$$ \Delta y_t =
\alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots +
\delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t
$$
As you rightly note, there are variations of the test that involve restricting $\alpha$ and/or $\beta$ to 0. Imposing the restriction on $\alpha$ corresponds to omitting the constant, while restricting $\beta$ corresponds to omitting the time trend.
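Written out explicitly, the two restricted variants are (same notation as above):
$$ \Delta y_t =
\alpha + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots +
\delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t \qquad (\beta = 0)
$$
$$ \Delta y_t =
\gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \cdots +
\delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t \qquad (\alpha = \beta = 0)
$$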
To understand what you're doing when using the adf.test()
function from the tseries package
in R, we should first consult the documentation provided by the package authors. To do this, we execute ?adf.test
in the R console. This brings up details about the function: what it does, how to use it, and so on. For present purposes, we just need to be aware that the documentation states:
The general regression equation which incorporates a constant and a
linear trend is used and the t-statistic for a first order
autoregressive coefficient equals one is computed.
(Do we need more information than that?)
Coupled with that fact, if we look at the usage of the function, namely,
adf.test(x, alternative = c("stationary", "explosive"),
k = trunc((length(x)-1)^(1/3)))
one begins to suspect that the function has limited capabilities with regard to the restricted variations of the ADF test. Reading all of the documentation makes it clear that the function only runs one variation of the test: the unrestricted version, which includes both a constant and a trend.
Since you're using R, we don't have to be left wondering if the function somehow imposes the restrictions internally without us knowing! To really be sure what's going on behind the scenes, we can look at the source code of the adf.test()
function. Below, I step through the code, which I have shortened, and I hope it's instructive to you.
# Import some toy data
data(sunspots)
# Set arguments that are normally function inputs
x <- sunspots
alternative <- "stationary"
k <- trunc((length(x) - 1)^(1/3))
# Let the function go to work! (short version)
k <- k + 1 # One more than the number of lagged difference terms
y <- diff(x) # First differences
n <- length(y) # Length of first differenced series
z <- embed(y, k) # Used for creating lagged series
# Things get interesting here as variables are prepared for the regression
yt <- z[, 1] # First differences
xt1 <- x[k:n] # Series in levels - the first k-1 observations are dropped
tt <- k:n # Time-trend
yt1 <- z[, 2:k] # Lagged differenced series - there are k-1 of them
# Next, the key pieces of code.
# Regression 1: if k > 0
# The augmented Dickey-Fuller test (with constant and time-trend)
res <- lm(yt ~ xt1 + 1 + tt + yt1)
# Regression 2: if k = 0
# The standard Dickey-Fuller test (with constant and time-trend)
res <- lm(yt ~ xt1 + 1 + tt)
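To complete the walkthrough: after either regression is fitted, the source computes the ADF test statistic as the t-ratio on the coefficient of xt1, and the critical values are then interpolated from a stored table. A self-contained sketch reproducing this:

```r
# Reproduce the regression from the walkthrough and extract the
# test statistic the way the source code does
data(sunspots)
x <- as.vector(sunspots, mode = "double")
k <- trunc((length(x) - 1)^(1/3)) + 1
y <- diff(x)
n <- length(y)
z <- embed(y, k)
yt <- z[, 1]; xt1 <- x[k:n]; tt <- k:n; yt1 <- z[, 2:k]
res <- lm(yt ~ xt1 + 1 + tt + yt1)
# ADF statistic: t-ratio on the xt1 coefficient
STAT <- coef(summary(res))[2, 1] / coef(summary(res))[2, 2]
```

STAT should match the Dickey-Fuller statistic reported by tseries::adf.test(sunspots).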
By my count, the adf.test()
function is, in fact, made up of 57 lines of code, which I encourage you to inspect. The rest of the function code is not important in the context of this question. All that needs to be known is that the function does do what it says on the tin. Importantly, there does not seem to be a high level way of using the function to run a restricted variation of the ADF test and retrieve the associated critical values.
What to do? Your first instinct should be to check out the CRAN Task View: Time Series Analysis page. In doing so, you'll learn that the urca package
provides an alternative implementation of the ADF test. Indeed, as I mentioned in the comments, the ur.df()
function should be able to meet your needs. Inspecting the function usage is quite informative!
ur.df(y, type = c("none", "drift", "trend"), lags = 1,
selectlags = c("Fixed", "AIC", "BIC"))
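For example, here is a sketch of running all three variants (this assumes the urca package is installed; the lag selection settings are purely illustrative):

```r
library(urca)
data(sunspots)
x <- as.numeric(sunspots)
# type = "none":  alpha = beta = 0 (no constant, no trend)
# type = "drift": beta = 0 (constant only)
# type = "trend": unrestricted (constant and trend)
summary(ur.df(x, type = "none",  selectlags = "BIC"))
summary(ur.df(x, type = "drift", selectlags = "BIC"))
summary(ur.df(x, type = "trend", selectlags = "BIC"))
```

Each summary reports the test statistic(s) together with the critical values that match that specification, which is exactly what adf.test() could not give us.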
The urca package can be found here and I recommend consulting the package documentation and the source code if you need to. I suspect that you should be able to use the function and not worry about issues regarding critical values; the authors of the package will have taken care of that so you can concentrate on using it as a high-level function and doing your research.
In terms of applying the ADF test (knowing which tests to run and in which order), I would suggest the Dolado et al. procedure. The reference is:
Dolado, J. J., Jenkinson, T., and Sosvilla-Rivero, S. (1990).
Cointegration and unit roots. Journal of Economic Surveys, 4(3), 249-273.
A final note on matching the R code to the mathematical equation. You can roughly read the regressors as follows (strictly speaking, the Greek parameters should be omitted, since lm() estimates them rather than taking them as inputs, but the mapping conveys the idea):
yt = $\Delta y_{t}$
xt1 = $\gamma y_{t-1}$
+ 1 = $\alpha$
tt = $\beta t$
yt1 = $\delta_1 \Delta y_{t-1} + \cdots + \delta_{p-1} \Delta y_{t-p+1}$
I think the best test specification would be neither ADF with no trend nor ADF with a linear trend, because clearly neither alternative adequately reflects the actual trend in the data.
You may consider using covariate-augmented Dickey-Fuller (CADF) test proposed in Hansen "Rethinking the Univariate Approach to Unit Root Testing: Using Covariates to Increase Power" (1995). Hansen's own R code for the test is available here. There is also an R package "CADFtest" by Claudio Lupi with a vignette and a reference manual which may be more readily usable than Hansen's code.
For the CADF test you would supply two regressors, t1 = c(1:br, rep(0, T - br)) and t2 = c(rep(0, br), 1:(T - br)), to account for the two linear components of the trend, where br is the last point of the upward-trending period and T is the length of the data sample.
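Concretely, the two regressors can be built like this (the values of br and T here are placeholders; use your own break point and sample length):

```r
T  <- 100   # sample length (placeholder)
br <- 60    # last point of the upward-trending period (placeholder)
t1 <- c(1:br, rep(0, T - br))    # linear trend up to the break, zero after
t2 <- c(rep(0, br), 1:(T - br))  # zero before the break, linear trend after
```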
However, I am unsure how the use of t1 and t2 fits the stationarity requirement for the regressors. Since the trend components in t1 and t2 are nonstationary, they might mess up the null distribution of the parameter of interest in the CADF test regression. That could be a good argument against using the CADF test in this situation.
If so, you could perhaps just split your sample into two parts and use the regular ADF test with a trend for each of them. It should be better than using the ADF test for the whole sample regardless of inclusion or exclusion of a linear trend. Doing the latter might well induce the ADF test to suggest presence of a unit root even if the process around this "broken trend" is actually stationary.
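A minimal sketch of that split-sample approach, assuming the break point br is known (the series here is simulated purely for illustration):

```r
library(tseries)
set.seed(1)
T  <- 200
br <- 120
# Simulated stationary noise around a broken linear trend:
# upward slope before the break, downward slope after it
trend <- c(0.5 * (1:br), 0.5 * br - 0.3 * (1:(T - br)))
x <- trend + rnorm(T)
# ADF test (with constant and trend) on each sub-sample
adf.test(x[1:br])
adf.test(x[(br + 1):T])
```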
The last option for someone who is good at unit root asymptotics would be to derive the appropriate null distribution of the CADF test in this "broken trend" setting.
(Here is a somewhat related post.)
Best Answer
No, you should not difference your series before running an ADF test. You would difference a series that contains a unit root. To find that out, you can run an ADF or some other unit-root test. The test result will give you an indication of whether you need to difference your time series. So you do the ADF first rather than last.
Your particular series appears to have a deterministic linear trend. A sound way to deal with that is either (1) to fit a linear trend and work with the residuals or (2) to include a linear trend in a model for the series. Differencing is not warranted, because a linear trend does not imply a unit root.
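A minimal sketch of the two options, using the sunspots data purely as placeholder input:

```r
data(sunspots)
x  <- as.numeric(sunspots)
tt <- seq_along(x)
# Option (1): fit a linear trend and keep the residuals
detrended <- residuals(lm(x ~ tt))
# The residuals can then be tested and modelled without the trend.
# Option (2) instead includes the trend in the model or test itself,
# e.g. urca::ur.df(x, type = "trend")
```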
(You may look up the term overdifferencing to see what can go wrong when differencing a time series that does not have a unit root.)