Solved – Why are the critical values in coint_johansen in statsmodels in Python so different from the ones in ca.jo in urca in R

cointegration, python, r, statsmodels, time series

When I run the same Johansen test in Python and in R, I get very different critical values.

If I normalize the two columns of evec in the Python result, I get eigenvectors fairly similar to the ones in R (the first and third eigenvectors, at least). So far so good.

But the test statistics and critical values in the Python results seem strange. As far as I understand, for the first test (r = 0) I can't reject the null hypothesis that there is no cointegration. But for the second test (r ≤ 1) the statistic exceeds the 5% critical value, so there is evidence for rejecting the null hypothesis that there is at most one cointegrating relation. So at the same time the rank r is both zero and greater than one? (In other words, there are both zero and more than one stationary combinations of the input data?)

Or should I simply ignore the second test if the first test isn't statistically significant?
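(For reference, the usual convention is indeed to read the trace tests sequentially from r = 0 upward and to stop at the first null hypothesis that cannot be rejected; the later tests are then ignored. A minimal sketch of that decision rule, using the statistics and critical values from the Python output below:)

```python
import numpy as np

# Trace statistics and 90%/95%/99% critical values copied from the
# statsmodels output below (first row: r = 0, second row: r <= 1).
lr1 = np.array([12.17008046, 4.88924609])
cvt = np.array([[13.4294, 15.4943, 19.9349],
                [2.7055, 3.8415, 6.6349]])

def johansen_rank(trace_stats, crit_vals, signif_col=1):
    """Sequential trace-test rule: stop at the first null that is
    NOT rejected; that r is the estimated cointegration rank.
    signif_col=1 picks the 5% column of [90%, 95%, 99%]."""
    for r, (stat, cv) in enumerate(zip(trace_stats, crit_vals[:, signif_col])):
        if stat <= cv:  # cannot reject H0: rank <= r
            return r
    return len(trace_stats)

print(johansen_rank(lr1, cvt))  # 12.17 < 15.49, so stop at r = 0
```

Under this rule the second test is never reached, so the apparent contradiction disappears.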

In R the results look more straightforward: both the first and the second test are statistically insignificant.

So to summarize:

  • Are the critical values in Python incorrect?
  • Or am I simply misinterpreting the Python results?

coint_johansen in statsmodels.tsa.vector_ar.vecm in Python

import statsmodels.tsa.vector_ar.vecm as stvv

# det_order=0: include a constant; k_ar_diff=2: two lagged differences
model = stvv.coint_johansen(data, det_order=0, k_ar_diff=2)

print("\nnormalized eigenvector 0\n", model.evec[:, 0] / model.evec[:, 0][0])
print("\nnormalized eigenvector 1\n", model.evec[:, 1] / model.evec[:, 1][0])
print("\ntest statistics\n", model.lr1[0], model.lr1[1])
print("\ncritical values\n", model.cvt[0], model.cvt[1])
print("\neig\n", model.eig)    # eigenvalues
print("\nevec\n", model.evec)  # eigenvectors
print("\nlr1\n", model.lr1)    # trace statistics
print("\nlr2\n", model.lr2)    # maximum-eigenvalue statistics
print("\ncvt\n", model.cvt)    # trace critical values (90%, 95%, 99%)
print("\ncvm\n", model.cvm)    # max-eigenvalue critical values (90%, 95%, 99%)
print("\nind\n", model.ind)    # ordering of the eigenvalues

normalized eigenvector 0
 [ 1.         -0.18956975]

normalized eigenvector 1
 [ 1.         -1.56557504]

test statistics
 12.170080461590794 4.8892460854155075

critical values
 [13.4294 15.4943 19.9349] [2.7055 3.8415 6.6349]

eig
 [0.0095468  0.00642099]

evec
 [[ 0.04683733 -0.00754275]
 [-0.00887894  0.01180875]]

lr1
 [12.17008046  4.88924609]

lr2
 [7.28083438 4.88924609]

cvt
 [[13.4294 15.4943 19.9349]
 [ 2.7055  3.8415  6.6349]]

cvm
 [[12.2971 14.2639 18.52  ]
 [ 2.7055  3.8415  6.6349]]

ind
 [0 1]

ca.jo in urca library in R

> model = ca.jo(data.frame(data$a, data$b), type="trace", ecdet="const",K=2,spec="longrun")
> summary(model)

###################### 
# Johansen-Procedure # 
###################### 

Test type: trace statistic , without linear trend and constant in cointegration 

Eigenvalues (lambda):
[1] 1.184071e-02 7.329430e-03 5.204170e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  5.59  7.52  9.24 12.97
r = 0  | 14.64 17.85 19.96 24.60

Eigenvectors, normalised to first column:
(These are the cointegration relations)

              data.a.l2     data.b.l2    constant
data.a.l2     1.0000000      1.000000    1.000000
data.b.l2    -0.2446287      1.633518   -1.478432
constant  -1441.2146598 -15496.470449 7340.786442

Weights W:
(This is the loading matrix)

           data.a.l2    data.b.l2      constant
data.a.d -0.02386740 -0.001330821  2.330213e-16
data.b.d  0.01599476 -0.005125451 -6.495089e-16

Best Answer

The difference may come from the VECM specification. You have used spec = "longrun" in your R call, whereas, if I am not mistaken, the Python function estimates what would be spec = "transitory" in ca.jo.
