The trade-off is between bias and power. With too few lags, the test is biased because residual autocorrelation remains; with too many, you lose power and allow potentially spurious rejections of the null, since some random correlation might make it look like $X$ helps predict $Y$. Whether that is a practical concern depends on your data; my inclination would be to lean toward more lags, but lag length can always be chosen as follows:
Granger causality always has to be tested in the context of some model. In the specific case of the granger.test function in R, the model includes $p$ past values of each of the two variables in the bivariate test. So the model it uses is:
$$
y_{i,t}=\alpha+\sum_{l=1}^p \left(\beta_l y_{i,t-l} + \gamma_l x_{i,t-l}\right)+\epsilon_{i,t}
$$
A conventional way to choose $p$ for this model is to run the regression for various values of $p$, keeping track of the AIC or BIC at each lag length, and then run the test again using the value of $p$ with the lowest information criterion.
In general, the number of lags in the model can differ between $x$ and $y$ and a Granger test will still be appropriate; it is only in the specific implementation of granger.test that you're constrained to the same number of lags for both. This is a matter of convenience, not a theoretical necessity. With different lag lengths for the two variables, you can still use the AIC or BIC to select your model; you'll just have to compare many combinations of $n$ lags of $x$ and $m$ lags of $y$.
One extra word: because the Granger test is model-dependent, omitted-variable bias may be a problem for Granger causality. You may want to include all the variables in your model and then use Granger causality to exclude blocks of them, instead of using the granger.test function, which only does pairwise tests.
Granger Causality can be defined as:
"X is said to Granger-cause Y if Y can be better predicted using the histories of both X and Y than it can by using the history of Y alone." 1
or in other words:
"A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y."
Wikipedia
This is essentially what a vector autoregression (VAR) model does. Wikipedia also says: "Multivariate Granger causality analysis is usually performed by fitting a vector autoregressive model (VAR) to the time series."
Granger causality testing is therefore largely a way of structurally summarizing VAR models.
Best Answer
The case of no cointegration
This is easy. If you have no lags, then the model looks like \begin{aligned} \Delta x_{1,t} &= \gamma_{0,1} + u_{1,t}, \\ &\dots \\ \Delta x_{k,t} &= \gamma_{0,k} + u_{k,t}, \\ \end{aligned} where $k$ is the number of series in your model, the $\gamma_0$s are intercepts (they would be set to zero if there are no time trends in the nondifferenced $x$s) and the $u_t$s are error terms. Then clearly the history of series $j$ is not useful in predicting series $i$ beyond the history of series $i$ itself. (Actually, the history of series $j$ is not useful in predicting series $i$, period.) And this holds for any $i,j=1,\dots,k$ with $i\neq j$. Therefore, none of the series Granger-causes any other series. (Also, no group of series Granger-causes another group of series.)
The case with cointegration
Consider a bivariate model for simplicity. Suppose \begin{aligned} \Delta x_{1,t} &= \gamma_{0,1} + \alpha_1 (x_{1,t-1}+\beta x_{2,t-1}) + u_{1,t}, \\ \Delta x_{2,t} &= \gamma_{0,2} + \alpha_2 (x_{1,t-1}+\beta x_{2,t-1}) + u_{2,t}, \\ \end{aligned} where $\beta\neq 0$ and either $\alpha_1\neq 0$ or $\alpha_2\neq 0$ or both. Then \begin{aligned} x_{1,t} &= \gamma_{0,1} + (\alpha_1+1) x_{1,t-1} + \alpha_1\beta x_{2,t-1} + u_{1,t}, \\ x_{2,t} &= \gamma_{0,2} + \alpha_2 x_{1,t-1} + (\alpha_2 \beta + 1) x_{2,t-1} + u_{2,t}. \\ \end{aligned} If $\alpha_1\beta\neq 0$ (i.e. if $\alpha_1\neq 0$, because we already know that $\beta\neq 0$) in the equation for $x_{1,t}$, then $x_2$ Granger-causes $x_1$.
Also, if $\alpha_2\neq 0$ in the equation for $x_{2,t}$, $x_1$ Granger-causes $x_2$.
We also know that under cointegration there will be Granger causality in at least one direction (since $\beta\neq 0$ and either $\alpha_1\neq 0$ or $\alpha_2\neq 0$ or both): either $x_1$ Granger-causes $x_2$, or $x_2$ Granger-causes $x_1$, or both.
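The algebra above can be checked numerically. A minimal sketch that builds the implied VAR(1)-in-levels coefficient matrix from illustrative error-correction parameters and reads off the causality directions:

```python
import numpy as np

# Illustrative ECM parameters: beta != 0, alpha1 != 0, alpha2 = 0,
# so cointegration holds and causality should run from x2 to x1 only.
alpha1, alpha2, beta = -0.5, 0.0, 2.0

# VAR(1)-in-levels coefficient matrix implied by the error-correction form:
#   x_{1,t} = (1 + alpha1) x_{1,t-1} + alpha1*beta x_{2,t-1} + ...
#   x_{2,t} = alpha2 x_{1,t-1} + (1 + alpha2*beta) x_{2,t-1} + ...
A = np.array([
    [1 + alpha1, alpha1 * beta],
    [alpha2, 1 + alpha2 * beta],
])

x2_causes_x1 = bool(A[0, 1] != 0)  # cross term alpha1*beta in the x1 equation
x1_causes_x2 = bool(A[1, 0] != 0)  # cross term alpha2 in the x2 equation
```

With these values $\alpha_1\beta = -1 \neq 0$ while $\alpha_2 = 0$, so $x_2$ Granger-causes $x_1$ but not vice versa, consistent with the derivation.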