Solved – VAR or VECM for a mix of stationary and nonstationary variables

cointegration, stationarity, time series, vector-autoregression, vector-error-correction-model

I have 4 time series. One of them is stationary and the rest are not. I need to find the relationship between them. I will use AIC to decide the lag length.

Should I use VAR or VECM to find the relationship between them?
Will VAR or VECM give me the relationship in the form of equations that can be used for forecasting?
Do I need to perform Johansen's test of cointegration?
What good would it do?

Best Answer

So you have three nonstationary series and one stationary series. Let us call them $x_1$, $x_2$, $x_3$, and $x_4$, respectively. Suppose the nonstationarity of $x_1$, $x_2$, $x_3$ is of a unit-root kind (rather than of some other kind); that is, each of $x_1$, $x_2$, $x_3$ is integrated of order one, I(1). You can determine the order of integration using, for example, the augmented Dickey-Fuller test (ADF test).
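The sketch below, in plain Python, shows roughly what the ADF test computes. It runs the plain Dickey-Fuller regression with a constant, $\Delta y_t = a + \rho\, y_{t-1} + e_t$, and returns the $t$-statistic of $\hat\rho$. It leaves out the lagged differences that make the test "augmented" and has no trend term. It compares the statistic with the asymptotic 5% critical value of about $-2.86$ for the constant-only case. The function name `df_tstat` is hypothetical. In practice you would use a library implementation such as `ur.df` in R's urca package or `adfuller` in Python's statsmodels.

```python
import math
import random


def df_tstat(y):
    """t-statistic of rho in dy_t = a + rho * y_{t-1} + e_t (plain Dickey-Fuller).

    Simplified sketch: no lagged differences (so not the *augmented* test)
    and no trend term.
    """
    x = y[:-1]                                        # lagged levels y_{t-1}
    d = [y[t] - y[t - 1] for t in range(1, len(y))]   # first differences
    m = len(d)
    mx, md = sum(x) / m, sum(d) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    rho = sum((xi - mx) * (di - md) for xi, di in zip(x, d)) / sxx
    a = md - rho * mx
    resid = [di - a - rho * xi for xi, di in zip(x, d)]
    s2 = sum(e * e for e in resid) / (m - 2)          # two estimated parameters
    return rho / math.sqrt(s2 / sxx)


CRIT_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value, constant, no trend

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0, 1) for _ in range(500)]    # stationary, I(0)
    walk = [0.0]
    for e in noise[1:]:
        walk.append(walk[-1] + e)                      # random walk, I(1)
    for name, series in [("white noise", noise), ("random walk", walk)]:
        t = df_tstat(series)
        verdict = "reject unit root" if t < CRIT_5PCT else "cannot reject unit root"
        print(f"{name}: t = {t:.2f} -> {verdict}")
```

A series that is I(1) in levels but stationary after one difference is the I(1) case the answer assumes.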

Test each pair of the nonstationary series ($x_1$ and $x_2$; $x_1$ and $x_3$; $x_2$ and $x_3$) for cointegration using the Johansen or the Engle-Granger test.
Then test all three series ($x_1$, $x_2$, $x_3$) for cointegration using the Johansen test.
Depending on the results of the tests, you may find yourself in one of the following situations:

(A) No cointegration
(B) Two of the variables (say, $x_1$ and $x_2$) are cointegrated while the third variable (say, $x_3$) is not
(C) The three variables ($x_1$, $x_2$, $x_3$) are cointegrated

In general, you want the following:

Models for cointegrated variables should have an error-correction representation; otherwise the model would be misspecified (cointegration goes hand-in-hand with the error correction representation).
Models for stationary dependent variables should not have nonstationary explanatory variables (except perhaps for stationary combinations of cointegrated nonstationary variables); otherwise the linear combination of the regressors would diverge from the regressand.
Models for nonstationary dependent variables should have at least one nonstationary explanatory variable; otherwise the regressand would diverge from the linear combination of the regressors. Mind the nonstandard distributions of the estimators of coefficients on integrated variables.

Based on these principles, you may do the following:

If (A) then first-difference each of the three variables ($x_1$, $x_2$, $x_3$), and use them together with the stationary variable $x_4$ to build a VAR model.

If (B) then build a model where

$\Delta x_1$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$\Delta x_2$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$\Delta x_3$ depends on lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$x_4$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$.

If (C) then build a model where

$\Delta x_1$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$\Delta x_2$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$\Delta x_3$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
$x_4$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$.
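The three cases follow one mechanical rule, sketched below in plain Python. An I(1) variable's equation gets the error-correction term only if that variable is in the cointegrating relation. The stationary variable's equation gets it whenever any cointegration exists. Every equation gets lags of all differenced I(1) variables and of $x_4$.

The function and label names are hypothetical. `ect` stands for the error-correction term $\hat\beta' x_{t-1}$ and `name.lk` for the $k$-th lag of a variable. The sketch only writes down the specification; estimation (e.g. least squares equation by equation) is a separate step.

```python
def build_equations(i1_vars, stationary_vars, coint_vars, p):
    """Return {dependent variable: [regressors]} for cases (A), (B) and (C).

    i1_vars:         names of the I(1) series (enter in first differences)
    stationary_vars: names of the I(0) series (enter in levels)
    coint_vars:      I(1) series in the cointegrating relation (empty -> case A)
    p:               number of lags
    """
    endog = [f"d_{v}" for v in i1_vars] + list(stationary_vars)
    lags = [f"{name}.l{k}" for k in range(1, p + 1) for name in endog]
    coint = set(coint_vars)
    spec = {}
    for v in i1_vars:
        # Delta x_j gets the error-correction term only if x_j is cointegrated.
        spec[f"d_{v}"] = (["ect"] if v in coint else []) + lags
    for v in stationary_vars:
        # The stationary variable gets it whenever a cointegrating relation exists.
        spec[v] = (["ect"] if coint else []) + lags
    return spec
```

For case (B), `build_equations(["x1", "x2", "x3"], ["x4"], ["x1", "x2"], 2)` puts `ect` in the equations for `d_x1`, `d_x2` and `x4` but not `d_x3`. This mirrors the list above.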

These are pretty general models with lots of regressors. You may find it beneficial to exclude some variables from some equations or use penalization to avoid overfitting.
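Ridge regression is one common form of penalization. It adds $\lambda \lVert b \rVert^2$ to the least-squares loss of a single equation, so its estimate is $(X'X + \lambda I)^{-1} X'y$. The sketch below computes that closed form in plain Python by Gaussian elimination.

Choosing ridge rather than, say, the lasso is an assumption, and so is the function name. In practice $\lambda$ has to be tuned, e.g. on a hold-out period at the end of the sample.

```python
def ridge(X, y, lam):
    """Ridge estimate b = (X'X + lam*I)^{-1} X'y for one equation.

    lam = 0 gives ordinary least squares; larger lam shrinks b toward zero.
    No intercept handling: center the data or add a column of ones yourself.
    """
    k = len(X[0])
    # Normal equations A b = c with the penalty added on the diagonal.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b
```

With `X = [[1, 0], [0, 1], [1, 1]]` and `y = [2, -1, 1]`, `lam = 0` recovers the exact fit $(2, -1)$, while `lam = 10` shrinks it to $(36/143, -3/143)$.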

Related Solutions

Solved – Interpreting VECM result

The Johansen procedure that you performed is used to decide how many cointegrating relationships there are. To perform it, you first need to choose an appropriate number of lags, which can be done with the function VARselect from the package vars. You should also decide whether constant or trend terms are estimated in addition to the cointegrating relationship. This decision is usually based on the data and/or the model.

After that, the main result of the Johansen procedure is the table of test statistics:

          test 10pct  5pct  1pct
r <= 3 |  1.48  7.52  9.24 12.97
r <= 2 |  6.89 13.75 15.67 20.20
r <= 1 | 10.19 19.77 22.00 26.81
r = 0  | 19.07 25.56 28.14 33.24

It should be read from the bottom to the top. The last row has the null hypothesis that there are no cointegrating relationships, and in your case this hypothesis is not rejected (19.07 is below even the 10% critical value of 25.56). This means that the test finds no evidence of cointegration in your data, so a VEC model is not appropriate. You should use a VAR on the first differences instead, since all of your time series have a unit root.

If you are really sure that there is a long-term relationship in your data, check that you are using the correct number of lags and appropriate dummy variables (constant, trend, seasonal dummies, etc.), and then rerun the Johansen procedure.

As I said, the table should be read from the bottom to the top. You start at the last row. If its hypothesis is rejected (the test statistic is larger than the critical value), you move one row up, and you continue until you reach a row whose hypothesis is not rejected. The position of that row, counted from the bottom, minus one is the number of cointegrating relationships. For example, stopping at the last row means zero, and stopping at the second-to-last row means one.
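The bottom-up reading rule is a short loop, sketched below in plain Python. The rows are passed starting from the bottom line of the printed table ($r = 0$). Each row holds the test statistic followed by the 10%, 5% and 1% critical values. The function name `coint_rank` is hypothetical.

```python
LEVELS = {"10pct": 1, "5pct": 2, "1pct": 3}  # column of each critical value in a row


def coint_rank(rows, level="5pct"):
    """Estimated cointegration rank from a Johansen table read bottom-up.

    rows: [(stat, crit_10pct, crit_5pct, crit_1pct), ...] ordered r = 0, r <= 1, ...
    Returns the first r whose null hypothesis is not rejected.
    """
    col = LEVELS[level]
    for r, row in enumerate(rows):
        if row[0] <= row[col]:      # statistic not above critical value: stop here
            return r
    return len(rows)                # every null rejected: full rank (stationary levels)
```

With the table above, `coint_rank` returns 0 at every level. The `level` argument matters when a statistic falls between two critical values. For instance, a made-up statistic of 21.0 for $r \le 1$ would be rejected at 10% (19.77) but not at 5% (22.00), so the estimated rank would differ by one between those levels.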

If you only want to use the model for forecasting, then you can convert the resulting VEC model to VAR using function vec2var, where you supply as arguments the output of ca.jo and the number of cointegration relationships. You can then forecast from the resulting model with the function predict, using the argument n.ahead to indicate how many steps you want to forecast.
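The sketch below, in plain Python, illustrates the algebra behind this kind of conversion. It assumes a VECM with no deterministic terms (real software also handles constants and trends), written as $\Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + e_t$. The levels VAR then has $A_1 = I + \alpha\beta' + \Gamma_1$, $A_i = \Gamma_i - \Gamma_{i-1}$ for $1 < i < p$, and $A_p = -\Gamma_{p-1}$. Iterating that VAR forward gives point forecasts. The names `vecm_to_var` and `forecast` are hypothetical, not the vars API.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def vecm_to_var(alpha, beta, gammas):
    """Levels-VAR matrices [A_1, ..., A_p] from alpha (K x r), beta (K x r), Gammas."""
    K = len(alpha)
    Pi = matmul(alpha, [list(c) for c in zip(*beta)])   # alpha beta', K x K
    eye = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    zero = [[0.0] * K for _ in range(K)]
    G = [zero] + list(gammas) + [zero]                  # pad Gamma_0 = Gamma_p = 0
    A = []
    for i in range(1, len(gammas) + 2):
        base = [[eye[r][c] + Pi[r][c] for c in range(K)] for r in range(K)] if i == 1 else zero
        A.append([[base[r][c] + G[i][r][c] - G[i - 1][r][c] for c in range(K)]
                  for r in range(K)])
    return A


def forecast(A, history, h):
    """Iterate y_t = sum_i A_i y_{t-i}; history = last p observations, oldest first."""
    K = len(history[-1])
    path = [list(y) for y in history]
    out = []
    for _ in range(h):
        y_new = [0.0] * K
        for i, Ai in enumerate(A, start=1):
            prev = path[-i]
            for r in range(K):
                y_new[r] += sum(Ai[r][c] * prev[c] for c in range(K))
        path.append(y_new)
        out.append(y_new)
    return out
```

Take $\alpha = (-0.5, 0)'$ and $\beta = (1, -1)'$ with no lagged differences. Starting from $y = (2, 0)$, the first variable is pulled back toward the second: the forecasts are $(1, 0)$ and then $(0.5, 0)$.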

Solved – Interpretation of VAR and causality

Regarding (2), stationarity/unit-root testing: you say "That means I can safely conclude that there is no unit root and both series are stationary." The conclusion should be the opposite. You reject the H0 (stationarity) of the KPSS test, and you cannot reject the H0 (unit root) of the ADF test. Both results indicate the presence of a unit root.
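Because the two tests have opposite null hypotheses, it helps to spell out the four possible combinations. Here is a small helper in plain Python; its name and labels are hypothetical.

```python
def combine_unit_root_tests(adf_rejects, kpss_rejects):
    """Combine ADF (H0: unit root) and KPSS (H0: stationarity) decisions."""
    if not adf_rejects and kpss_rejects:
        return "unit root"      # both tests point to nonstationarity
    if adf_rejects and not kpss_rejects:
        return "stationary"     # both tests point to stationarity
    return "inconclusive"       # conflicting (both reject) or uninformative (neither rejects)
```

The situation described in the question is `combine_unit_root_tests(False, True)`, which returns `"unit root"`.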

Regarding (3), the third and fourth lines of the code snippet indicate that the best model according to AIC has 4 lags. You should choose the model that minimizes AIC, not the one that maximizes it.
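For a $K$-variable VAR($p$) estimated on $T$ observations, a common form of the criterion is $\mathrm{AIC}(p) = \ln\det\hat\Sigma_p + 2pK^2/T$, where $\hat\Sigma_p$ is the residual covariance matrix. The penalty grows with $p$, so the minimum balances fit against the number of coefficients.

The plain-Python sketch below selects the minimizing lag. Its function names and determinants are made up, and it ignores deterministic terms. Software such as VARselect may also count coefficients on constants and trends, so its values need not match this formula exactly.

```python
import math


def var_aic(det_sigma, p, K, T):
    """AIC(p) = ln det(Sigma_p) + 2 p K^2 / T (no deterministic terms)."""
    return math.log(det_sigma) + 2.0 * p * K * K / T


def select_lag(dets, K, T):
    """Lag p with the smallest AIC; dets maps p -> det of residual covariance."""
    return min(dets, key=lambda p: var_aic(dets[p], p, K, T))
```

With illustrative determinants `{1: 1.0, 2: 0.85, 3: 0.80}`, $K = 2$ and $T = 100$, the AIC values are about 0.080, -0.003 and 0.017. `select_lag` therefore returns 2: the extra fit at $p = 3$ does not pay for its four extra coefficients.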

Regarding (4), you find cointegration, although the result is not very certain because the test statistic for r <= 1 falls between the 5% and 10% critical values.

Regarding (5), restricting X2 to depend only on its own lags and not on the lags of X1 does not cause the model likelihood to drop very much, so you get an insignificant $F$ statistic. That is, you do not reject the null hypothesis that X1 does not Granger-cause X2. I cannot comment on whether your approach is legitimate, because the variables are both integrated and cointegrated. However, Dave Giles has some excellent posts about Granger causality on his blog, e.g. here and here. After reading them, you should be able to understand it quite well.
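The $F$ statistic mentioned here compares the residual sums of squares of the X2 equation with and without the $q$ lags of X1: $F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/(T - k_u)}$, where $k_u$ is the number of coefficients in the unrestricted equation. The plain-Python sketch below computes it; the function name is hypothetical. Because the variables are integrated, the usual $F$ reference distribution may not apply, so treat the resulting $p$-value with care.

```python
def granger_f(rss_restricted, rss_unrestricted, q, T, k_unrestricted):
    """F statistic for q zero restrictions (the excluded lags of X1 in the X2 equation).

    T: number of observations used; k_unrestricted: coefficients in the full equation.
    """
    num = (rss_restricted - rss_unrestricted) / q   # fit lost per restriction
    den = rss_unrestricted / (T - k_unrestricted)   # residual variance, full model
    return num / den
```

For example, with made-up values $\mathrm{RSS}_r = 110$, $\mathrm{RSS}_u = 100$, $q = 2$, $T = 100$ and $k_u = 5$, the statistic is $5 / (100/95) = 4.75$.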

Regarding (6), since the series are not stationary, running a VAR in levels is not a good idea. Because they appear to be cointegrated, run a VECM instead.

I hope my notes largely answer your questions 1 through 4. I did not write out the model equations for you, but I may comment after you try putting them together yourself. (It could be beneficial to try by yourself first.)
