Solved – Choosing the maximum lag length in the augmented Dickey-Fuller test


I have a question regarding how to choose the maximum lag length in the augmented Dickey-Fuller test using the "urca" package in R.

I want to perform the ADF test on the daily prices of a stock index over 12 years. I used the AIC in the command to choose the optimal number of lags. The problem is that I don't know what to set as the maximum lag length. If I set the maximum lag length to 1, 75, 100, 250 and 365, the test statistic is -1.5088, -2.2627, -3.0098, -3.4081 and -3.6462, respectively. These statistics will clearly lead to different results and interpretations…
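To see why the maximum lag matters, here is a minimal base-R sketch of an ADF regression (constant, no trend) in which the lag order is chosen by AIC from 0 up to `max_lag`. It is a stand-in under stated assumptions, not the code of `urca::ur.df(..., selectlags = "AIC")`, although that function performs a similar search; the helper name `adf_aic()` and the simulated series are hypothetical. All candidate models are fitted on the same rows, and those rows depend on `max_lag`, so raising the maximum both widens the search and shortens the estimation sample.

```r
# Hypothetical helper: ADF "drift" regression with an AIC-selected lag order.
#   diff(y)_t = a + g * y_{t-1} + b_1 * diff(y)_{t-1} + ... + b_p * diff(y)_{t-p} + e_t
# Returns the chosen p and the t-statistic on g (the ADF tau statistic).
adf_aic <- function(y, max_lag) {
  dy <- diff(y)
  # Column 1 is diff(y)_t; columns 2..(max_lag + 1) are its lags 1..max_lag.
  X <- embed(dy, max_lag + 1)
  n <- nrow(X)
  d_y <- X[, 1]
  y_lag1 <- y[(max_lag + 1):(max_lag + n)]  # lagged level aligned with the rows of X
  # Every candidate uses the same n rows, so their AIC values are comparable.
  fits <- lapply(0:max_lag, function(p) {
    if (p == 0) lm(d_y ~ y_lag1)
    else lm(d_y ~ y_lag1 + X[, 2:(p + 1), drop = FALSE])
  })
  best <- which.min(vapply(fits, AIC, numeric(1)))
  list(lag = best - 1,
       stat = coef(summary(fits[[best]]))["y_lag1", "t value"])
}

set.seed(1)
y <- cumsum(rnorm(3000))  # simulated random walk standing in for a daily log index
for (m in c(1, 10, 30)) {
  r <- adf_aic(y, m)
  cat(sprintf("max lag %2d -> chosen lag %2d, tau = %.3f\n", m, r$lag, r$stat))
}
```

The tau statistic has to be compared with Dickey-Fuller critical values (about -2.86 at the 5% level for the constant-only case in large samples), not with Student-t quantiles.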

I have read that it is common to set the maximum lag length to 1 for annual data, 4 for quarterly data and 12 for monthly data, but I found nothing for daily data. Given this, could you please give me any suggestions? (I know it is really silly to use such large numbers as the maximum lag length…)

Also, for the example above I could try a maximum lag length of 365, because there is a lot of data. However, if I test a single year, there are fewer than 250 observations in total. What should I do if the maximum lag length you suggest for the first question exceeds 250?
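One widely cited rule of thumb for the maximum lag, due to Schwert (1989), scales it with the sample size T as p_max = floor(12 (T/100)^(1/4)). The sketch below implements only that heuristic (the name `schwert_max_lag()` is hypothetical); it is a starting point for the lag search, not a substitute for checking the residuals. With roughly 3,000 trading days in 12 years it gives 28, and with about 250 observations in a single year it gives 15, so it stays far below the sample size.

```r
# Schwert (1989) rule of thumb for the maximum ADF lag length.
schwert_max_lag <- function(n_obs) floor(12 * (n_obs / 100)^(1 / 4))

schwert_max_lag(3000)  # about 12 years of daily trading data
schwert_max_lag(250)   # about one year of daily trading data
```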

Another question: would it be better to test the log of the stock index?

Thank you very much for your kind help!

Best Answer

This can be a tricky one. These Zivot notes discuss a slightly more advanced way to select lags for the ADF test. That said, it is good to remember that the purpose of including lags is to control for serial correlation. Consequently, you'll want to examine the residuals to make sure that no serial correlation remains. Even a good model fit (i.e., a low information criterion value) does not ensure the absence of serial correlation.
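A minimal sketch of that residual check in base R, assuming the constant-only ADF regression: fit it with `p` lagged differences and apply a Ljung-Box test (`stats::Box.test`) to the residuals. The helper name `adf_resid_check()`, the choice of 20 autocorrelations, and the `fitdf = p` degrees-of-freedom adjustment are illustrative assumptions.

```r
# Hypothetical helper: fit the ADF "drift" regression with p lagged differences
# and run a Ljung-Box test on its residuals (null: no serial correlation).
adf_resid_check <- function(y, p, lb_lag = 20) {
  dy <- diff(y)
  X <- embed(dy, p + 1)           # diff(y)_t and its lags 1..p
  n <- nrow(X)
  y_lag1 <- y[(p + 1):(p + n)]    # lagged level aligned with the rows of X
  fit <- if (p == 0) lm(X[, 1] ~ y_lag1) else lm(X[, 1] ~ y_lag1 + X[, -1, drop = FALSE])
  # fitdf = p is a rough adjustment for the estimated lag coefficients (assumption)
  Box.test(residuals(fit), lag = lb_lag, type = "Ljung-Box", fitdf = p)
}

set.seed(2)
# Unit-root series whose differences follow an AR(1) with coefficient 0.5
y <- cumsum(as.numeric(arima.sim(list(ar = 0.5), n = 1000)))
adf_resid_check(y, p = 0)$p.value  # too few lags: residual autocorrelation expected
adf_resid_check(y, p = 1)$p.value  # one lag matches the simulated dynamics
```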

It is essential to include as few lags as possible, that is, no more than are needed to remove the serial correlation. Including unnecessary lags will greatly reduce the test's power. This is especially problematic because ADF tests are well known to have low power, particularly for near-unit-root processes.
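The power cost of extra lags can be illustrated with a small Monte Carlo sketch. It assumes a stationary but persistent AR(1) with coefficient 0.95 and 250 observations (so the unit-root null is false) and counts how often the constant-only ADF test rejects at the 5% level, using the large-sample Dickey-Fuller critical value of about -2.86. Both lag orders are applied to the same simulated series; the helper names and settings are hypothetical, and the exact rates depend on the seed, but the version with twelve unnecessary lags typically rejects less often.

```r
# Hypothetical helper: ADF tau statistic (constant, no trend) with p lagged differences.
adf_tau <- function(y, p) {
  dy <- diff(y)
  X <- embed(dy, p + 1)
  n <- nrow(X)
  y_lag1 <- y[(p + 1):(p + n)]
  fit <- if (p == 0) lm(X[, 1] ~ y_lag1) else lm(X[, 1] ~ y_lag1 + X[, -1, drop = FALSE])
  coef(summary(fit))["y_lag1", "t value"]
}

# Share of simulated stationary AR(1) series for which the unit-root null is rejected.
compare_power <- function(lags = c(1, 12), reps = 200, n_obs = 250, rho = 0.95, crit = -2.86) {
  draws <- replicate(reps, {
    y <- as.numeric(arima.sim(list(ar = rho), n = n_obs))
    vapply(lags, function(p) adf_tau(y, p) < crit, logical(1))
  })
  rejects <- matrix(draws, nrow = length(lags))  # one row per lag order
  setNames(rowMeans(rejects), paste0("p = ", lags))
}

set.seed(3)
compare_power()
```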

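Additionally, you should consider other tests for stochastic trends, such as the Phillips-Perron (PP) or KPSS tests. Note that the KPSS test reverses the null hypothesis: its null is stationarity rather than a unit root.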
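Base R already ships a Phillips-Perron test as `stats::PP.test()`; KPSS tests are available in add-on packages, for example `urca::ur.kpss()` or `tseries::kpss.test()`. A minimal sketch with simulated series standing in for the index: PP, like ADF, has a unit root under the null, whereas KPSS has level stationarity under the null, so the two are usually read together. The KPSS part only runs if urca is installed.

```r
set.seed(4)
rw <- cumsum(rnorm(1000))  # random walk: unit root present
wn <- rnorm(1000)          # white noise: stationary

# Phillips-Perron (null: unit root). PP.test() uses a regression with a constant
# and a linear trend; its p-value is interpolated from a table, so it is
# truncated at the ends of that table.
PP.test(rw)$p.value
PP.test(wn)$p.value

# KPSS (null: level stationarity), only if the urca package is available.
if (requireNamespace("urca", quietly = TRUE)) {
  kpss_rw <- urca::ur.kpss(rw, type = "mu", lags = "short")
  print(kpss_rw@teststat)  # compare with the critical values in kpss_rw@cval
}
```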

Finally, never lose sight of the bigger picture. Stock prices, especially at a daily frequency, almost always follow a stochastic trend. If they did not follow a random walk, you could forecast prices with a relatively high level of certainty, because the prediction intervals of a stationary process stay bounded, whereas those of a stochastic trend widen without limit as the horizon grows. If that were the case, you could easily forecast stock prices and make billions of dollars. But stock markets are efficient, really efficient, and there are not billions of dollars lying around to be picked up. At least, that is what Eugene Fama says.
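The point about prediction intervals can be made concrete with the standard h-step-ahead forecast-error variances, assuming shocks with variance sigma2: for a random walk the variance is sigma2 * h, which grows without bound, while for a stationary AR(1) with coefficient phi it is sigma2 * (1 - phi^(2h)) / (1 - phi^2), which levels off at sigma2 / (1 - phi^2). The function names below are hypothetical.

```r
# h-step-ahead forecast-error variance with shock variance sigma2
rw_fevar  <- function(h, sigma2 = 1) sigma2 * h                                     # random walk
ar1_fevar <- function(h, phi, sigma2 = 1) sigma2 * (1 - phi^(2 * h)) / (1 - phi^2)  # AR(1), |phi| < 1

h <- c(1, 10, 100, 1000)
cbind(h, random_walk = rw_fevar(h), ar1_phi_0.9 = ar1_fevar(h, phi = 0.9))
```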

Related Solutions

Augmented Dickey Fuller – Understanding k Lag in R’s Test

It's been a while since I looked at ADF tests, but I do remember at least two versions of the ADF test in R:

http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/tseries/html/adf.test.html

http://cran.r-project.org/web/packages/fUnitRoots/

The fUnitRoots package has a function called adfTest(). I think the "trend" issue is handled differently in those packages.
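For reference, the `tseries::adf.test()` documentation gives the default lag order as `k = trunc((length(x) - 1)^(1/3))`, so it grows only with the cube root of the sample size. A quick base-R sketch of that default (the helper name `default_adf_k()` is hypothetical) shows how small it stays even for long daily series: 6 lags for 250 observations, 9 for 1,000 and 14 for 3,000.

```r
# Default lag order used by tseries::adf.test(), per its documented signature
default_adf_k <- function(n_obs) trunc((n_obs - 1)^(1 / 3))

default_adf_k(c(250, 1000, 3000))
```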

Edit: page 14 of the following document lists four versions of the ADF test in R (uroot has since been discontinued):

http://math.uncc.edu/~zcai/FinTS.pdf

One more link: read section 6.3 of the following page. It does a far better job than I could of explaining the lag term:

http://www.yats.com/doc/cointegration-en.html

Also, I would be careful with any seasonal model. Unless you're sure some seasonality is present, I would avoid using seasonal terms. Why? Any series can be broken down into seasonal terms, even if it has no seasonality. Here are two examples:

```r
set.seed(123)  # make the simulated series reproducible

# First example: white noise
x <- rnorm(200)

# Use stl() to separate the trend and seasonal terms
x.ts <- ts(x, frequency = 4)
x.stl <- stl(x.ts, s.window = "periodic")
plot(x.stl)

# Use decompose() to separate the trend and seasonal terms
x.dec <- decompose(x.ts)
plot(x.dec)

#===========================================

# Second example: random walk (cumulative sum of the white noise above)
x1 <- cumsum(x)

# Use stl() to separate the trend and seasonal terms
x1.ts <- ts(x1, frequency = 4)
x1.stl <- stl(x1.ts, s.window = "periodic")
plot(x1.stl)

# Use decompose() to separate the trend and seasonal terms
x1.dec <- decompose(x1.ts)
plot(x1.dec)
```

Running plot(x.stl) on the white-noise example shows that stl() finds a small seasonal term even in pure noise. You might say that term is so small that it's not really an issue. The problem is that with real data you don't know whether such a term is a problem or not. In the decomposition plots, also notice that the trend series has segments where it looks like a filtered version of the raw data, and other segments where it might be considered significantly different from the raw data.
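One rough way to quantify this, sketched in base R under the same white-noise setup: measure the share of variance that `stl()` assigns to the seasonal component, and compare it with a joint F-test of quarter dummies (`lm()` plus `anova()`), which asks whether the seasonal pattern is distinguishable from noise. The variable names and the use of the F-test as a screening step are illustrative choices, not part of the original answer; the dummy regression is only sensible on a stationary series like this one, not on random-walk levels.

```r
set.seed(123)
x.ts <- ts(rnorm(200), frequency = 4)  # white noise: no true seasonality

# stl() still returns a non-zero seasonal component
x.stl <- stl(x.ts, s.window = "periodic")
seasonal <- as.numeric(x.stl$time.series[, "seasonal"])
seasonal_share <- var(seasonal) / var(as.numeric(x.ts))

# Joint F-test of quarter dummies: is the "seasonal" pattern more than noise?
quarter <- factor(cycle(x.ts))
fit <- lm(as.numeric(x.ts) ~ quarter)
p_value <- anova(fit)[["Pr(>F)"]][1]

cat(sprintf("seasonal share of variance: %.3f, dummy F-test p-value: %.3f\n",
            seasonal_share, p_value))
```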


Solved – lag length

The lag length is how many terms back along the AR process you want to test for serial correlation: is checking only the previous term enough, or do you need to check groups of 3, 4, or more lags? This page summarizes the trade-offs between using more and fewer lags.
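To illustrate why the number of lags checked matters, here is a base-R sketch using the Ljung-Box test (`stats::Box.test`) on a simulated series whose only direct dependence is at lag 4 (an assumed quarterly-style pattern). A test that looks only at lag 1 can miss it, while testing lags up to 4 or 8 picks it up; the simulated process and the lag choices are hypothetical.

```r
set.seed(5)
# Stationary series with dependence only at lag 4: e_t = 0.5 * e_{t-4} + u_t
e <- as.numeric(arima.sim(list(ar = c(0, 0, 0, 0.5)), n = 500))

# Ljung-Box p-values when testing autocorrelations at lags 1..m jointly
pv <- sapply(c(1, 4, 8), function(m) Box.test(e, lag = m, type = "Ljung-Box")$p.value)
pv
```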
