Granger Causality Test – Determining Lag Order

Suppose I'm considering several independent variables for possible inclusion in an ARIMAX model I'm developing. Before fitting different variables, I'd like to screen out variables that exhibit reverse causality by using a Granger test. (I'm using the granger.test function from the MSBVAR package in R, although I believe other implementations work similarly.) How do I determine how many lags should be tested?

The R function is: granger.test(y, p), where y is a data frame or matrix, and p is the number of lags.

The null hypothesis is that the past $p$ values of $X$ do not help in
predicting the value of $Y$.

Is there any reason not to select a very high lag here (other than the loss of observations)?

Note that I have already differenced every time series in my data frame, based on the order of integration of my dependent time series. (E.g., differencing my dependent time series once made it stationary. Therefore, I also differenced all "independent" time series once.)

Best Answer

The trade-off is between bias and power. With too few lags, the test is biased because of residual autocorrelation. With too many, you lose power and allow for potentially spurious rejections of the null: some random correlation might make it look like $X$ helps predict $Y$. Whether that is a practical concern depends on your data. My guess would be to lean higher, but lag length can always be determined as follows:

Granger causality always has to be tested in the context of some model. In the specific case of the granger.test function in R, the model has p past values of each of the two variables in the bivariate test. So the model it uses is:

$$ y_{i,t}=\alpha+\sum_{l=1}^p \left(\beta_l y_{i,t-l} + \gamma_l x_{i,t-l}\right)+\epsilon_{i,t} $$
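
To make the test itself explicit, here is the standard way the hypothesis is usually written for a regression of this form. The symbols $T$, $RSS_r$ and $RSS_u$ are introduced here for illustration and are not from the answer: $T$ is the number of observations actually used after losing the first $p$ to lags, $RSS_u$ is the residual sum of squares of the regression above, and $RSS_r$ is that of the same regression with the lags of $x$ dropped.

$$ H_0:\ \gamma_1=\gamma_2=\dots=\gamma_p=0 $$

$$ F=\frac{(RSS_r-RSS_u)/p}{RSS_u/(T-2p-1)} $$

Under the null and the usual regression assumptions, $F$ is compared against an $F(p,\,T-2p-1)$ distribution. This also shows what a large $p$ costs: every extra lag adds two parameters and removes one usable observation.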

A conventional way to choose $p$ for this model is to fit this regression for various values of $p$ and keep track of the AIC or BIC for each lag length. Then run the test using the value of $p$ that had the lowest information criterion in your regressions.
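
The procedure just described can be sketched in a few lines. This is a minimal illustration in plain Python rather than R, and it does not use MSBVAR. The helper names (`ols_rss`, `lag_rows`, `ic_table`) and the simulated series are assumptions made up for the example. The criteria are the Gaussian-regression forms, $n\log(RSS/n)+2k$ for AIC and $n\log(RSS/n)+k\log n$ for BIC, and every candidate $p$ is fitted on the same sample so the values are comparable.

```python
import math
import random

def ols_rss(rows, target):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    coef = [a[i][k] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(coef, r))) ** 2
               for r, t in zip(rows, target))

def lag_rows(y, x, p, start):
    """Rows [1, y_{t-1..t-p}, x_{t-1..t-p}] for t = start, ..., len(y) - 1."""
    return [[1.0] + [y[t - l] for l in range(1, p + 1)]
            + [x[t - l] for l in range(1, p + 1)] for t in range(start, len(y))]

def ic_table(y, x, max_p):
    """Map each p in 1..max_p to (AIC, BIC), all fitted on the same sample."""
    target = y[max_p:]  # drop max_p observations for every p
    n = len(target)
    table = {}
    for p in range(1, max_p + 1):
        rss = ols_rss(lag_rows(y, x, p, max_p), target)
        k = 2 * p + 1  # intercept + p lags of y + p lags of x
        fit = n * math.log(rss / n)
        table[p] = (fit + 2 * k, fit + k * math.log(n))
    return table

# Simulated (made-up) data: x helps predict y with a two-period delay.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0, 0.0]
for t in range(2, 500):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 2] + random.gauss(0, 1))

table = ic_table(y, x, max_p=6)
best_aic = min(table, key=lambda p: table[p][0])
best_bic = min(table, key=lambda p: table[p][1])
print("lag chosen by AIC:", best_aic, "| by BIC:", best_bic)
```

The chosen `p` is then the value you would pass to the Granger test. BIC penalises extra lags more heavily than AIC, so it tends to pick the shorter model when the two disagree.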

In general, the number of lags in the model can be different for $x$ and $y$, and a Granger test will still be appropriate. It's only in the specific implementation of granger.test that you're constrained to the same number of lags for both. This is a matter of convenience, not a theoretical necessity. With different lag lengths for the two variables, you can still use the AIC or BIC to select your model; you'll just have to compare many combinations of $n$ lags of $x$ and $m$ lags of $y$.
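
A rough sketch of that search over combinations, again in plain Python with simulated data. The names `ols_rss`, `design`, `best_lags` and `granger_f` are hypothetical helpers written for this example, not functions from any package. It picks $(m, n)$ by BIC on a common sample and then computes the usual F statistic for excluding the $n$ lags of $x$. Turning the statistic into a p-value requires the F distribution, which is left out to stay within the standard library.

```python
import math
import random

def ols_rss(rows, target):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    coef = [a[i][k] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(coef, r))) ** 2
               for r, t in zip(rows, target))

def design(y, x, m, n, start):
    """Rows [1, m lags of y, n lags of x]; n = 0 gives the restricted model."""
    return [[1.0] + [y[t - l] for l in range(1, m + 1)]
            + [x[t - l] for l in range(1, n + 1)] for t in range(start, len(y))]

def best_lags(y, x, max_lag):
    """Return (bic, m, n) minimising BIC over all 1 <= m, n <= max_lag."""
    target = y[max_lag:]  # same sample for every (m, n)
    nobs = len(target)
    best = None
    for m in range(1, max_lag + 1):
        for n in range(1, max_lag + 1):
            rss = ols_rss(design(y, x, m, n, max_lag), target)
            bic = nobs * math.log(rss / nobs) + (m + n + 1) * math.log(nobs)
            if best is None or bic < best[0]:
                best = (bic, m, n)
    return best

def granger_f(y, x, m, n, start):
    """F statistic for H0: all n lags of x have zero coefficients."""
    target = y[start:]
    nobs = len(target)
    rss_u = ols_rss(design(y, x, m, n, start), target)  # with lags of x
    rss_r = ols_rss(design(y, x, m, 0, start), target)  # without lags of x
    return ((rss_r - rss_u) / n) / (rss_u / (nobs - m - n - 1))

# Simulated (made-up) data: one lag of y matters, x enters at lag two.
random.seed(2)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0, 0.0]
for t in range(2, 500):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 2] + random.gauss(0, 1))

bic, m_best, n_best = best_lags(y, x, max_lag=4)
f_stat = granger_f(y, x, m_best, n_best, start=4)
print("lags of y:", m_best, "| lags of x:", n_best, "| F statistic:", round(f_stat, 2))
```

The grid grows as the square of the maximum lag, so in practice a modest upper bound keeps the comparison manageable.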

Just an extra word: because the Granger test is model dependent, omitted-variable bias may be a problem for Granger causality. You may want to include all the variables in your model and then use Granger causality to exclude blocks of them, instead of using the granger.test function, which only does pairwise tests.
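
One way to write that multivariate version, as a sketch with notation that is mine rather than the answer's: with $K$ candidate predictors $x_1,\dots,x_K$ in the same regression,

$$ y_t=\alpha+\sum_{l=1}^p \beta_l y_{t-l}+\sum_{k=1}^K\sum_{l=1}^p \gamma_{k,l}\,x_{k,t-l}+\epsilon_t $$

excluding a block $B$ of predictors corresponds to testing

$$ H_0:\ \gamma_{k,l}=0 \quad \text{for all } k\in B,\ l=1,\dots,p $$

by comparing the fit with and without those terms, for example with an F test on $p\,|B|$ restrictions. Because the other predictors stay in the model under both hypotheses, the test is less exposed to the omitted-variable problem than a sequence of pairwise tests.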

Related Solutions

Solved – Lag length selection Granger causality test

The question here is really about the best way to select lag length for a VAR, as I noted in this answer. Granger causality doesn't even enter into it until your model for the time series is selected, which is why you may not see many papers specifically concerned with lag order for Granger causality tests. It's more about lag order selection for vector autoregressive models. I'd take a look at this paper for a relatively recent reference on which criteria (AIC, BIC/SIC, HQC) are most appropriate, though they may largely agree for your application.

Solved – How to run a Granger Causality Test with Stata

I am by no means an expert on time series, but these are my thoughts, for what they are worth. Hopefully someone else can add to this to help you further on your way.

Does it make sense?

To me it doesn't really make a lot of sense. When I do panel data analysis I base the choice of my variables on the results in the literature. There should be a theoretical basis for your model.

I would just use the Granger causality test as a method of analysis. This paper might be of interest to you; it uses a Granger test in a panel data setting.

If the time series are non-stationary, can I run the Granger causality test directly, or do I first have to make the series stationary with some cointegration process?

Yes, you should make the time series stationary, because the VAR model used for the test assumes that the data are stationary. If your time series has a unit root, first differencing will often eliminate it.

How can I do it with Stata?

First differencing can be done with the D. time-series operator (don't forget to declare your data as time series with tsset first).

So if you have your time-series called gdp then you first difference it by:

gen gdpdiff=D.gdp
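
For readers not working in Stata, the transformation itself is simple to reproduce. A minimal Python sketch with made-up numbers follows; the function name `difference` is just an illustrative helper. Note one difference in convention: Stata's D. operator leaves a missing value in the first observation, whereas this version returns a list that is one element shorter per round of differencing.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (each pass drops one value)."""
    out = list(series)
    for _ in range(order):
        # x_t - x_{t-1} for every consecutive pair
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

gdp = [100.0, 102.0, 105.0, 109.0]  # made-up values
gdpdiff = difference(gdp)
print(gdpdiff)
```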

You can set up the VAR model by using the var-command. For help on this simply type

help var

So the command for your VAR-model could be:

var fdi gdpdiff

Use varsoc to compare lag-order selection criteria and choose the number of lags to include. In the command below I compare lag orders up to 20.

varsoc fdi gdpdiff, maxlag(20)

Then run your model with the desired number of lags, for instance

var fdi gdpdiff, lags(1/10)

After fitting the var-model you can do the Granger causality test using:

vargranger

How can I interpret the results?

I found this post quite useful on how to conduct and interpret a Granger causality test (it is done in R). Be aware that the null hypothesis is that of no Granger causality, so a small p-value is evidence that the lagged variable does help predict the other.
