Testing for cointegration and building a VEC model

Tags: cointegration, r, time series, vector-autoregression, vector-error-correction-model

I have three variables, all of which become stationary only after second differencing. I want to check for cointegration using the code below. If I run a pairwise cointegration analysis, I get these results:

library(vars)  # VARselect
library(urca)  # ca.jo, cajools, cajorls
VARselect(f1[2:3], lag.max=10)$selection ## optimal number of lags is 7
coint=ca.jo(f1[2:3], ecdet="none", type="trace", K=7, spec="longrun")
summary(coint) ## Johansen trace test for this pair
Values of teststatistic and critical values of test:
          test 10pct  5pct  1pct
r <= 1 | 29.23  6.50  8.18 11.65
r = 0  | 75.18 15.66 17.95 23.52

I read this as meaning that there is no cointegrating relationship between them. If I repeat this for the other pairs, f1[3:4] and f1[c(2,4)], then I get one cointegrating relationship.

VARselect is used to choose the optimal lag. For all the variables together:

VARselect(f1[2:4], lag.max=10)$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     5      5      5      4 

coint=ca.jo(f1[2:4], ecdet="none", type="trace", K=5, spec="longrun")
summary(coint)
Values of test statistic and critical values of test:
          test 10pct  5pct  1pct
r <= 2 |  0.08  6.50  8.18 11.65
r <= 1 | 14.24 15.66 17.95 23.52
r = 0  | 39.67 28.71 31.52 37.22

This would mean that there is cointegration between the variables and that I need to run a VECM. But how do I know how many cointegrating relationships there are? As far as I have seen, $r=2$ is specified when fitting a VECM.

  1. Do I need to include all the variables when running a VECM?
  2. Is VARselect the right way to choose the lag to be specified in ca.jo?
  3. Is $r=2$ the correct way to specify a VECM?

    cajools(coint)
    cajorls(coint, r = 2) # or use this

    Is the procedure I am following a correct way to model this?

Update 1:

  1. Regarding 1: I think it is up to us to decide what kind of relationship we would like to examine and then set up the model accordingly.
  2. Yes, it is an iterative VAR fit used to choose the right lag length.
  3. Not clear: so the highest rank I cannot reject would be 2 in the three-variable case?

Update 2: Regarding 3, I was asking about f1[2:4], for which I produced the statistics above. As I read them, there is only one cointegrating relationship, so $r=1$ when fitting the VECM.

Update 3:

  1. As my variables become stationary only at the second order of differencing, can I apply the Johansen cointegration test, which works with I(1) series? Or do I have to feed in the first differences of my variables in order to perform the Johansen test?
  2. Also, using VARselect the optimal lag turned out to be 4, so I have to take lag=3 when running the cointegration model.

Best Answer

You seem to be doing pairwise analysis when you in fact have three variables. This way you may miss cointegrating relationships that are not pairwise but involve more variables. The standard approach when modelling cointegrated variables is to use all the variables you have, provided they are integrated of the same order.
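A minimal sketch of why pairwise testing can miss a relationship, using simulated data rather than the asker's f1. The series x, y, z, the seed, the sample size, and K = 2 are all assumptions made for illustration: z is built so that only the three-variable combination z - x - y is stationary, while no pair of series is cointegrated by construction.

```r
library(urca)  # provides ca.jo

set.seed(42)                 # arbitrary seed, only for reproducibility
n <- 500                     # assumed sample size
x <- cumsum(rnorm(n))        # independent random walk, I(1)
y <- cumsum(rnorm(n))        # second independent random walk, I(1)
z <- x + y + rnorm(n)        # z - x - y is stationary by construction
sim <- data.frame(x, y, z)   # hypothetical data, not the asker's f1

# Joint trace test on all three series
joint <- ca.jo(sim, ecdet = "none", type = "trace", K = 2, spec = "longrun")
joint_table <- cbind(test = as.numeric(joint@teststat), joint@cval)

# Pairwise trace test on x and z only: z - x = y + noise is still I(1)
pair <- ca.jo(sim[, c("x", "z")], ecdet = "none", type = "trace",
              K = 2, spec = "longrun")
pair_table <- cbind(test = as.numeric(pair@teststat), pair@cval)

# Same layout as summary(): test statistic next to the critical values
print(joint_table)
print(pair_table)
```

In the joint table the statistic for $r=0$ should lie far above its critical values, whereas the pairwise test will typically fail to reject $r=0$, although any single simulated sample can deviate from that.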

Now to answer your questions,

  1. Yes, include all three variables in VAR modelling and cointegration testing.
  2. Yes, it is an acceptable method. You can find it used, e.g., in Pfaff (2008), p. 149, or in the vignette of the "vars" package in R, p. 17.
  3. The Johansen procedure as implemented in the function ca.jo will help you find the number of cointegrating vectors. Take the output of ca.jo, start with $r=0$, and see if you can reject the null hypothesis of $r=0$ using the test statistic and the critical values reported in the output. If you reject, move to $r=1$ and upwards until you cannot reject. The first rank that you cannot reject is the number of cointegrating vectors. If you can reject all of them, all of your series appear to be stationary.
    In general, any modern time series textbook should include a section on cointegration testing using the Johansen procedure; just follow it.
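The sequential rule in point 3 can be sketched in a few lines of base R. The helper select_rank below is hypothetical (it is not part of urca); the numbers are the trace statistics and 5% critical values copied from the question's output, reordered to start at $r=0$.

```r
# Hypothetical helper, not part of urca: walk up from r = 0 and return
# the first rank whose null hypothesis cannot be rejected.
# Both vectors must be ordered from r = 0 upwards.
select_rank <- function(test_stat, crit_val) {
  for (r in seq_along(test_stat) - 1) {
    if (test_stat[r + 1] <= crit_val[r + 1]) {
      return(r)  # cannot reject: this is the cointegration rank
    }
  }
  length(test_stat)  # every null rejected: full rank
}

# Three-variable system f1[2:4], rows r = 0, r <= 1, r <= 2 (5% level)
rank_joint <- select_rank(c(39.67, 14.24, 0.08), c(31.52, 17.95, 8.18))

# Pairwise system f1[2:3], rows r = 0, r <= 1 (5% level)
rank_pair <- select_rank(c(75.18, 29.23), c(17.95, 8.18))

print(rank_joint)  # 1: reject r = 0, cannot reject r <= 1
print(rank_pair)   # 2: both nulls rejected, i.e. full rank
```

Note that summary() on a ca.jo object prints the rows in the opposite order (highest rank first), so the values have to be reversed before being passed to a helper like this.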

Update (for the updated OP)

  1. Neglecting cointegration relationships beyond pairwise ones may lead to omitted variable bias -- because you would be omitting error correction terms associated with the neglected cointegrating vectors.
  2. Cannot understand the question, sorry.
  3. If the variables are truly integrated, you cannot have $r=m$, where $m$ is the number of time series in the system. In a three-variable case, $r=2$ is the highest possible rank; $r=3$ would already imply that the variables are not integrated to begin with.
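The reasoning behind point 3 can be made explicit with the standard VECM representation. This is a sketch in generic textbook notation: the symbols $y_t$, $\Pi$, $\Gamma_i$, $\alpha$, $\beta$ and the lag order $K$ are not taken from the post, and the level term is written at lag 1, whereas spec="longrun" in ca.jo places it at lag $K$; the rank interpretation is the same in both forms.

$$
\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{K-1} \Gamma_i \, \Delta y_{t-i} + \varepsilon_t,
\qquad \Pi = \alpha \beta', \qquad \operatorname{rank}(\Pi) = r
$$

With $m$ series, three cases follow. If $r=0$, then $\Pi=0$ and the model reduces to a VAR in first differences with no cointegration. If $0<r<m$, there are $r$ cointegrating vectors (the columns of $\beta$) and $\beta' y_{t-1}$ supplies the error correction terms. If $r=m$, then $\Pi$ has full rank, which is only compatible with $y_t$ being stationary in levels.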

References

  • Pfaff, Bernhard. Analysis of integrated and cointegrated time series with R. Springer Science & Business Media, 2008.
  • Pfaff, Bernhard. "VAR, SVAR and SVEC models: Implementation within R package vars." Journal of Statistical Software 27.4 (2008): 1-32.