I'm still a novice with time series, so I'm sorry if this is a little basic. When performing an Augmented Dickey-Fuller test on my data, I found that lag values from 1 to 4 all lead me to reject the null hypothesis. Since this is the case, how should I determine the number of lags to use?
Solved – How many lags to use for ADF test if several values reject null hypothesis
augmented-dickey-fuller, mathematical-statistics, time-series
Related Solutions
You need to examine the rest of the output to see the source of the test statistic.
The Augmented Dickey-Fuller test runs a regression of the first difference of the time series against a lag of the level values of the time series plus lagged first differences. The test statistic is based on the significance of the lagged level values, not the significance of the overall regression via the F-statistic.
The test statistic is the t-value of the lag of the level values of the time series. Its associated p-value in the regression output is only a rough guide to significance, because under the unit-root null this statistic does not follow the usual t distribution. The critical values reported at the end of the output are more appropriate. These are the adjusted critical values as calculated by MacKinnon for the ADF.
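To make the regression described above concrete, here is a minimal sketch that builds it by hand in base R. The simulated series, the seed, and the choice of `k = 2` lagged differences are illustrative assumptions, not part of the original answer, and the code is only meant to mirror the shape of the test regression, not to replace a packaged test.

```r
# Hand-rolled ADF regression (no constant, no trend), base R only.
# Assumptions: simulated random walk, k = 2 lagged differences.
set.seed(1)
y <- cumsum(rnorm(500))            # random walk, so a unit root is present
k <- 2                             # number of lagged first differences

dy <- diff(y)                      # first differences
E  <- embed(dy, k + 1)             # col 1: dy_t; cols 2..k+1: dy_{t-1}, ..., dy_{t-k}
z.diff     <- E[, 1]
z.diff.lag <- E[, -1, drop = FALSE]
z.lag.1    <- y[(k + 1):(length(y) - 1)]   # level y_{t-1}, aligned with dy_t

fit <- lm(z.diff ~ z.lag.1 - 1 + z.diff.lag)

# The ADF test statistic is the t-value on the lagged level, not the F-statistic.
adf_stat <- coef(summary(fit))["z.lag.1", "t value"]
adf_stat
```

The value of `adf_stat` would then be compared with the Dickey-Fuller critical values rather than with the p-value printed by `summary(fit)`.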
It looks like you are using the urca package of R. Here is the extended output on a simulated monthly time series with annual autocorrelation and a unit root. I set the (max) lags to twice the frequency and let ur.df (correctly) select the number of lags using the AIC.
Note the test-statistic corresponds to the t-value.
> arima.sim(list(ar=c(rep(0,11),0.8),order=c(12,1,0)),1000)->x
> summary(ur.df(x,type="none",lags=24,selectlags="AIC"))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-3.1882 -0.6450 0.0481 0.7121 3.4243
Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.0004062 0.0006635 -0.612 0.5405 <-- test statistic is this t-value
z.diff.lag1 -0.0147038 0.0202857 -0.725 0.4687
z.diff.lag2 -0.0057840 0.0202965 -0.285 0.7757
z.diff.lag3 -0.0284008 0.0203133 -1.398 0.1624
z.diff.lag4 -0.0468366 0.0203374 -2.303 0.0215 *
z.diff.lag5 0.0085345 0.0203880 0.419 0.6756
z.diff.lag6 0.0153530 0.0203456 0.755 0.4507
z.diff.lag7 -0.0340281 0.0203676 -1.671 0.0951 .
z.diff.lag8 -0.0190015 0.0203950 -0.932 0.3517
z.diff.lag9 -0.0012032 0.0203491 -0.059 0.9529
z.diff.lag10 -0.0128488 0.0203783 -0.631 0.5285
z.diff.lag11 -0.0080163 0.0203859 -0.393 0.6942
z.diff.lag12 0.7818341 0.0203577 38.405 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.025 on 963 degrees of freedom
Multiple R-squared: 0.6271, Adjusted R-squared: 0.622
F-statistic: 124.6 on 13 and 963 DF, p-value: < 2.2e-16
Test-statistic is from the first t-value above. Critical values are from MacKinnon.
Value of test-statistic is: -0.6123
Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
The overall regression is highly significant as it picks up the AR(12) nature of the time series. However, the lagged level variable is not significant. Therefore, we would not reject the null hypothesis of a unit root under this test.
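For readers who want to see what choosing the lag order by AIC might look like, here is a rough base R sketch. It is written in the spirit of `selectlags="AIC"` but is not urca's actual code; the helper `adf_fit`, the simulated series, and `max_k = 8` are all hypothetical choices made for illustration. Every candidate model is fitted on the same sample so that the AIC values are comparable.

```r
# Sketch: pick the number of lagged differences by AIC, then read off the
# ADF t-value for that lag order. Base R only; adf_fit is a hypothetical helper.
set.seed(42)
dy_sim <- as.numeric(arima.sim(list(ar = 0.5), n = 500))  # AR(1) differences
y <- cumsum(dy_sim)                                       # integrate: unit root in levels

adf_fit <- function(y, k, max_k) {
  dy <- diff(y)
  E  <- embed(dy, max_k + 1)                 # common sample for every k <= max_k
  z.diff  <- E[, 1]
  z.lag.1 <- y[(max_k + 1):(length(y) - 1)]  # lagged level aligned with z.diff
  z.diff.lag <- E[, 2:(k + 1), drop = FALSE] # first k lagged differences
  lm(z.diff ~ z.lag.1 - 1 + z.diff.lag)
}

max_k  <- 8
fits   <- lapply(1:max_k, function(k) adf_fit(y, k, max_k))
aics   <- sapply(fits, AIC)
best_k <- which.min(aics)                    # lag order with the smallest AIC

tau <- coef(summary(fits[[best_k]]))["z.lag.1", "t value"]
c(best_k = best_k, tau = tau)
```

The point of the design is that the lag order is chosen on model fit, and only then is the t-value on the lagged level read off; the lag is not chosen by whichever value happens to reject the null.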
The difference is due to different DF critical values. More precisely, for adf.test the critical value is based on the model with a drift (intercept) term, while the default ur.df statistic is based on the model without a drift (intercept) term.
You will likely see the same result if you do summary(ur.df(resid(fit1), lags=2, type='drift'))
Best Answer
You typically use the longest lag that is statistically significant. You can find it easily by looking at an ACF graph and noting every bar that crosses the confidence interval lines, which mark statistically significant autocorrelation at that lag. Of course, don't test more lags than the frequency of your data calls for. If you have quarterly data, test up to 4 lags. If you have monthly data, test up to 12 lags.
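As a rough illustration of reading the longest significant lag off the ACF, here is a base R sketch. The simulated monthly series with a seasonal AR(12) term, the seed, and the variable names are assumptions for the example; the bound used is the usual approximate 95% white-noise band that `plot.acf` draws by default.

```r
# Sketch: find the longest lag (up to the data frequency) whose sample
# autocorrelation falls outside the approximate 95% confidence band.
set.seed(7)
x <- as.numeric(arima.sim(list(ar = c(rep(0, 11), 0.8)), n = 600))  # seasonal AR(12)
freq <- 12                                   # monthly data: check up to 12 lags

a     <- acf(x, lag.max = freq, plot = FALSE)
r     <- a$acf[-1]                           # drop lag 0, keep lags 1..freq
bound <- qnorm(0.975) / sqrt(a$n.used)       # approximate 95% band

sig_lags <- which(abs(r) > bound)            # lags with significant autocorrelation
longest  <- if (length(sig_lags) > 0) max(sig_lags) else 0
longest
```

Calling `acf(x, lag.max = freq)` without `plot = FALSE` draws the graph itself, which is the visual check described above.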
If the ADF test comes up with a strongly negative tau value (one below the critical value) and a resulting low p-value, you can reject the null hypothesis that the variable is non-stationary. In plain English, in such a situation your variable is deemed stationary (because you reject the null hypothesis that the variable is non-stationary).
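The decision rule comes down to comparing the test statistic with the critical values. As a small worked example, the sketch below uses the figures from the ur.df output quoted elsewhere on this page (test statistic -0.6123, tau1 critical values -2.58, -1.95, -1.62); the variable names are just for illustration.

```r
# Reject the unit-root null only if the statistic lies below the critical value.
tau  <- -0.6123
crit <- c("1pct" = -2.58, "5pct" = -1.95, "10pct" = -1.62)

reject <- tau < crit
reject
```

Here every comparison is FALSE, so the unit-root null is not rejected at any of the three levels, which matches the conclusion drawn from that output.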
In your specific situation, if you have quarterly data and you tested up to 4 lags, you are good. The ADF test has demonstrated that your variable is stationary. If you have monthly data, you may have to use more lags if the longest lag that has a statistically significant autocorrelation is longer than 4. Very often lag 12 has statistically significant autocorrelation because of seasonality. In that case you should use lag 12 within your ADF test.