The trade-off is between bias and power. With too few lags the test is biased, because residual autocorrelation remains in the errors. With too many, you allow for potentially spurious rejections of the null: some chance correlation might make it look like $X$ helps predict $Y$. Whether that is a practical concern depends on your data; my inclination would be to lean toward more lags, but lag length can always be determined as follows:
Granger causality always has to be tested in the context of some model. In the specific case of the granger.test function in R, the model has $p$ past values of each of the two variables in the bivariate test. So the model it uses is:
$$
y_{i,t}=\alpha+\sum_{l=1}^p \left(\beta_l y_{i,t-l} + \gamma_l x_{i,t-l}\right)+\epsilon_{i,t}
$$
A conventional way to choose $p$ for this model is to run the regression for various values of $p$ and keep track of the AIC or BIC for each lag length. Then run the test again using the value of $p$ that produced the lowest information criterion.
In general the number of lags in the model can differ between $x$ and $y$, and a Granger test will still be appropriate. It is only in the specific implementation of granger.test that you are constrained to the same number of lags for both; this is a matter of convenience, not a theoretical necessity. With different lag lengths for the two variables you can still use the AIC or BIC to select your model, you will just have to compare the many combinations of $n$ lags of $x$ and $m$ lags of $y$.
One extra word: because the Granger test is model dependent, omitted-variable bias may be a problem for Granger causality. You may want to include all the variables in your model and then use Granger causality to exclude blocks of them, instead of using the granger.test function, which only does pairwise tests.
Follow this procedure (Engle-Granger Test for Cointegration):
1) Test to see if your series are stationary, using an augmented Dickey-Fuller (ADF) test (stock prices and GDP levels usually are not).
2) If they are not, difference them and see if the differenced series are now stationary (they usually are).
3) If they are, your ORIGINAL series are said to each be integrated (I did not say co-integrated) of order 1, concisely noted as I(1).
4) If they are not both I(1), you can safely say that they cannot be co-integrated of order 1.
5) If they are both I(1), run a simple OLS regression of one on the other.
6) Check the residuals of the OLS regression for stationarity. If they are stationary, then your original series are co-integrated of order 1.
Shortcomings of this method: 1) it may matter which variable you regress on the other, and 2) it works only when you have two variables.
For a better test, you can use Johansen's procedure, which is implemented in statsmodels (https://github.com/josef-pkt/statsmodels/commit/29f0aa27d284ac0026e90ff9d877f7920a2c6056). A worked example is available at http://nbviewer.jupyter.org/github/mapsa/seminario-doc-2014/blob/master/cointegration-example.ipynb.
The Toda-Yamamoto procedure for testing Granger causality is described very clearly and explicitly as a 13-step sequence in Dave Giles' blog post "Testing for Granger causality". There is no point in reiterating it here.
Regarding lag order selection, Dave Giles suggests starting with the lag selected by an information criterion such as AIC or BIC. He then emphasizes the need to ensure that there is no serial correlation in the residuals ("If need be, increase $p$ until any autocorrelation issues are resolved"). Therefore, your approach seems fine.
Regarding the maximum lag order, I do not have a precise answer. Be careful not to set the maximum lag too small, so that AIC/BIC has enough room to do its job. I would select a fairly large maximum lag and leave the rest to AIC/BIC; the criteria normally strike a good balance, so even if you allow a very high maximum lag it will typically not be selected, and no harm is done.