Solved – Augmented Dickey-Fuller tests: how to choose lags

stata, unit root

I’m trying to model a time series (log_consommation) as an ARIMA(p,d,q) process using Stata.

I start by determining d, that is, the order of differencing needed to make the series stationary.

My question: when running an augmented Dickey-Fuller test for stationarity, I have to choose the number of lags. Is this number related to the p of the ARMA(p,q) model I will estimate later? How can it be determined without using dfgls?

I have tried several different lag lengths, but I’m not sure how to choose among them.
The commands I ran and a table of the results are below.

dfuller log_consommation, lags(0) regress
regress D.log_consommation l1.log_consommation
estat ic

dfuller log_consommation, lags(1) regress
regress D.log_consommation l1.log_consommation l1.(D.log_consommation)
estat ic

dfuller log_consommation, lags(2) regress
regress D.log_consommation l1.log_consommation l1.(D.log_consommation) l2.(D.log_consommation)
estat ic

dfuller log_consommation, lags(3) regress
regress D.log_consommation l1.log_consommation l1.(D.log_consommation) l2.(D.log_consommation) l3.(D.log_consommation)
estat ic

dfuller log_consommation, lags(4) regress
regress D.log_consommation l1.log_consommation l1.(D.log_consommation) l2.(D.log_consommation) l3.(D.log_consommation) l4.(D.log_consommation)
estat ic

I obtain:

Augmented Dickey-Fuller test regressions and test statistic
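------------------------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)             (6)   
                   No lag           1 lag          2 lags          3 lags          4 lags          5 lags   
------------------------------------------------------------------------------------------------------------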
L.log_cons~n      -0.0137***      -0.0105***     -0.00741***     -0.00678**      -0.00662**      -0.00574*  
                  (-8.06)         (-5.12)         (-3.40)         (-2.96)         (-2.76)         (-2.31)   

LD.log_con~n                        0.173*          0.131           0.118           0.113           0.108   
                                   (2.00)          (1.50)          (1.28)          (1.21)          (1.16)   

L2D.log_co~n                                        0.283**         0.249**         0.245*          0.228*  
                                                   (3.34)          (2.82)          (2.62)          (2.43)   

L3D.log_co~n                                                       0.0608          0.0527          0.0217   
                                                                 (0.68)          (0.57)          (0.22)   

L4D.log_co~n                                                                       0.0195        -0.00939   
                                                                                   (0.22)         (-0.10)   

L5D.log_co~n                                                                                        0.119   
                                                                                                   (1.32)   

_cons               0.176***        0.135***       0.0952***       0.0874**        0.0855**        0.0743*  
                   (8.57)          (5.34)          (3.52)          (3.07)          (2.85)          (2.39)   


aic                -888.5          -889.0          -890.1          -881.7          -871.4          -863.4   
bic                -882.8          -880.6          -878.9          -867.7          -854.7          -844.0   
t_ADF              -8.062          -5.118          -3.399          -2.965          -2.759          -2.310   


t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Best Answer

Use ac and pac in Stata to inspect the autocorrelation and partial autocorrelation functions and shortlist plausible lag orders. For the ARMA model itself, the usual approach is to estimate each candidate specification, from p=0, q=1 up to p=3, q=3, and record the AIC and BIC of each. The model with the lowest AIC or BIC is chosen. The two criteria may select different orders, so whichever model you keep, make sure its residuals are white noise.
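
A minimal Stata sketch of this procedure, under assumptions that are not in the original post: the data are already tsset, d = 1 (so the ARMA part is fit to the first difference), and the residual variable names res_p_q are hypothetical. It draws the correlograms, then loops over p and q from 0 to 3 (including the p=0, q=0 baseline), printing the information criteria and a portmanteau white-noise test on the residuals of each fit.

```stata
* Sketch only. Assumes: data are tsset, and d = 1 is appropriate.
* Correlograms of the differenced series, to shortlist plausible orders.
ac  D.log_consommation
pac D.log_consommation

* Fit every ARIMA(p,1,q) with p and q between 0 and 3.
forvalues p = 0/3 {
    forvalues q = 0/3 {
        display as text _newline "ARIMA(`p',1,`q')"

        * Some specifications may fail to converge; skip those.
        capture noisily arima log_consommation, arima(`p',1,`q')
        if _rc continue

        * AIC and BIC for this specification.
        estat ic

        * Residuals should look like white noise (portmanteau Q test).
        * res_`p'_`q' is a hypothetical, temporary variable name.
        predict double res_`p'_`q', residuals
        wntestq res_`p'_`q'
        drop res_`p'_`q'
    }
}
```

Compare the AIC and BIC values across the fits, and among the models with low values, prefer one whose wntestq p-value does not reject white noise. If the unit-root tests point to a different d, change the middle argument of arima() accordingly.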