Building a VECM for stock price prediction and interpreting the output

cointegration, multivariate analysis, r, time series, vector-error-correction-model

I am using a VECM in R for stock price prediction. As inputs I use each day's opening, closing and high prices, and I try to predict the closing price. First I checked whether the data are cointegrated.

###################### 
# Johansen-Procedure # 
###################### 

Test type: maximal eigenvalue statistic (lambda max) , with linear trend 

Eigenvalues (lambda):
[1] 0.189351689 0.087487739 0.002125514

Values of teststatistic and critical values of test:

           test 10pct  5pct  1pct
r <= 2 |   2.64  6.50  8.18 11.65
r <= 1 | 113.71 12.91 14.90 19.19
r = 0  | 260.72 18.90 21.07 25.75

So there are two cointegrating relationships: r = 0 and r <= 1 are both rejected (260.72 and 113.71 are far above the 1% critical values), while r <= 2 is not (2.64 < 6.50). I then used the "tsDyn" package to fit a VECM. The predictions made by this model were quite accurate, but I am not sure whether my approach is correct. So my questions are:

  1. Do I need to check whether the variables are non-stationary before building the model?
  2. Is this the right way to build the model, or do I need to fit a VAR model first and then convert it into a VECM?
  3. I also don't understand which equation is the right one in this VECM output, because I get a table like this:

Full sample size: 1248  End sample size: 1241
Number of variables: 3  Number of estimated slope parameters 63
AIC -43935.64   BIC -43602.6    SSR 0.1105005
Cointegrating vector (estimated by ML):
             X1 X2        X3
r1 1.000000e+00  0 -1.001445
r2 1.457168e-16  1 -1.001706


            ECT1                ECT2               Intercept          
Equation X1 -0.3465(0.5966)     0.0524(0.6139)     -0.0039(0.0026)    
Equation X2 1.0826(0.0837)***   -1.1161(0.0861)*** -0.0026(0.0004)*** 
Equation X3 0.9952(0.3554)**    -0.7579(0.3657)*   0.0026(0.0015).    
            X1 -1               X2 -1              X3 -1             
Equation X1 0.3506(0.5908)      0.0096(0.5576)     -0.3381(0.1381)*  
Equation X2 -0.0801(0.0829)     0.1134(0.0782)     -0.0521(0.0194)** 
Equation X3 -0.1466(0.3520)     0.5766(0.3322).    -0.6551(0.0822)***
            X1 -2               X2 -2              X3 -2             
Equation X1 0.2849(0.5322)      0.1284(0.4969)     -0.3320(0.1386)*  
Equation X2 -0.0671(0.0747)     0.1095(0.0697)     -0.0613(0.0195)** 
Equation X3 -0.0721(0.3170)     0.5760(0.2960).    -0.5673(0.0826)***
            X1 -3               X2 -3               X3 -3              
Equation X1 0.1235(0.4707)      -0.0572(0.4328)     -0.2138(0.1353)    
Equation X2 -0.0716(0.0661)     0.1058(0.0607).     -0.0343(0.0190).   
Equation X3 -0.1478(0.2804)     0.3648(0.2579)      -0.4097(0.0806)*** 
            X1 -4               X2 -4               X3 -4              
Equation X1 0.1533(0.4067)      -0.1529(0.3538)     -0.1084(0.1272)    
Equation X2 -0.0708(0.0571)     0.0330(0.0496)      -0.0332(0.0179).   
Equation X3 -0.0847(0.2423)     0.1134(0.2107)      -0.2806(0.0758)*** 
            X1 -5               X2 -5               X3 -5              
Equation X1 0.2247(0.3295)      -0.0238(0.2539)     -0.1056(0.1135)    
Equation X2 -0.0023(0.0462)     0.0055(0.0356)      -0.0333(0.0159)*   
Equation X3 0.0981(0.1963)      0.0854(0.1513)      -0.2209(0.0676)**  
            X1 -6               X2 -6               X3 -6              
Equation X1 0.0332(0.2312)      -0.0272(0.0515)     0.0937(0.0881)     
Equation X2 0.0118(0.0324)      0.0007(0.0072)      -0.0086(0.0124)    
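
The calls behind the output above are not shown in the post. As a rough sketch only, output of this shape is what `ca.jo()` from the urca package and `VECM()` from tsDyn print. The data below are simulated stand-ins (not the poster's prices), and `ecdet = "none"` and `K = 7` are assumptions chosen to be consistent with the 6 lagged differences and the ML-estimated cointegrating vectors in the table.

```r
library(urca)   # ca.jo(): Johansen procedure
library(tsDyn)  # VECM(): vector error correction model

set.seed(1)
n <- 500

# Simulated stand-in for three log-price series (an assumption):
# one shared random walk plus small stationary noise, so the three
# series have a single common stochastic trend (cointegration rank 2).
trend <- cumsum(rnorm(n, sd = 0.01))
prices <- cbind(open  = trend + rnorm(n, sd = 0.002),
                high  = trend + 0.01 + rnorm(n, sd = 0.002),
                close = trend + rnorm(n, sd = 0.002))

# Maximal-eigenvalue version of the Johansen test.
# K is the lag order of the VAR in levels.
jo <- ca.jo(prices, type = "eigen", ecdet = "none", K = 7)
summary(jo)

# VECM with 6 lagged differences (K - 1), 2 cointegrating relations,
# an intercept in each equation, and ML (Johansen) estimation.
fit <- VECM(prices, lag = 6, r = 2, include = "const", estim = "ML")
summary(fit)
```

With real data, `prices` would be replaced by a matrix or data frame holding the three observed series, one column per variable.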

Best Answer

  1. Yes, because many models assume the variables are stationary. When the assumption is violated, the fitted model might not make sense (e.g. the left hand side would diverge from the right hand side asymptotically) and the $p$-values of individual coefficients could be wrong (because the coefficient estimators could have nonstandard distributions with nonstandard critical values).
  2. Building a VAR model before the VECM is typically used for lag order selection. If you have another alternative for that (e.g. lag order implied by theory or your gut feeling), then you can skip the VAR and directly estimate a VECM.
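  3. In the VECM you have three dependent variables, and each block of three lines in the table holds one row per dependent variable: the first line of each block belongs to the first dependent variable, the second line to the second, and the third line to the third. So to get the equation for the first dependent variable, collect the first lines of all the blocks, and so on. For example, the equation for X2 is built from the "Equation X2" rows: ECT coefficients 1.0826 and -1.1161, intercept -0.0026, first-lag coefficients -0.0801, 0.1134 and -0.0521, and so on through the remaining lags.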
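
Points 1 and 2 can be sketched in R as follows. This is a minimal illustration on simulated data, not the poster's series; it assumes `ur.df()` from urca for the unit-root tests and `VARselect()` from vars for the lag order, and the choices `lags = 8` and `lag.max = 10` are arbitrary.

```r
library(urca)  # ur.df(): augmented Dickey-Fuller test
library(vars)  # VARselect(): information criteria for the VAR lag order

set.seed(2)
n <- 500

# Simulated stand-in data (an assumption): a shared random walk plus noise.
trend <- cumsum(rnorm(n, sd = 0.01))
prices <- cbind(open  = trend + rnorm(n, sd = 0.002),
                high  = trend + 0.01 + rnorm(n, sd = 0.002),
                close = trend + rnorm(n, sd = 0.002))

# Point 1: ADF test on each series in levels and in first differences.
# I(1) behaviour means failing to reject a unit root in levels
# while rejecting it in differences.
for (nm in colnames(prices)) {
  lev <- ur.df(prices[, nm], type = "drift", lags = 8, selectlags = "AIC")
  dif <- ur.df(diff(prices[, nm]), type = "drift", lags = 8, selectlags = "AIC")
  cat(nm,
      ": ADF levels =", round(lev@teststat[1], 2),
      ", ADF differences =", round(dif@teststat[1], 2),
      ", 5% critical value =", lev@cval[1, "5pct"], "\n")
}

# Point 2: choose the lag order p of the VAR in levels by AIC.
# A VAR(p) in levels corresponds to a VECM with p - 1 lagged differences.
sel <- VARselect(prices, lag.max = 10, type = "const")
p <- sel$selection[["AIC(n)"]]
vecm_lag <- max(p - 1, 0)
cat("VAR lag order p =", p, "-> VECM lagged differences =", vecm_lag, "\n")
```

The resulting `vecm_lag` is what would be passed as the `lag` argument when estimating the VECM directly.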

Regarding the suitability of a VECM for opening, high and closing price:

  • The cointegrating vector for the opening and closing prices should be (1, -1), possibly after adjusting for drift (e.g. if the nominal price rises over time with inflation or for other reasons, the closing price will on average be slightly higher than the opening price, but the difference would be tiny).
  • The fact that the opening price precedes the closing price on any given day has some implications. The opening price should fully error-correct towards the previous day's closing price; otherwise there would be a systematic difference between the opening and closing prices, which is unlikely, unless due to market microstructure, and even then not that plausible. Meanwhile, the closing price should not error-correct towards the opening price, as the closing price is later in time and there is no reason for it to go back to some previous level. E.g. if there were some news during the day that pushed the closing price away from the opening price, there is no reason for the next day's closing price to correct towards the previous day's opening price.
  • Lagged price changes are unlikely to be relevant, as stock returns are known to be essentially not autocorrelated. Thus you could likely do better by dropping all or most of the lags.
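
The bullets above can be illustrated with a small base-R simulation. The data-generating process here is hypothetical (each day's open equals the previous close plus overnight news, and the close equals the open plus intraday news); it is a sketch of the argument, not an analysis of real prices.

```r
set.seed(3)
n <- 1000
overnight <- rnorm(n, sd = 0.005)  # news between yesterday's close and today's open
intraday  <- rnorm(n, sd = 0.010)  # news between today's open and today's close

# Hypothetical process (an assumption):
#   open_t  = close_{t-1} + overnight_t
#   close_t = open_t + intraday_t
close_p <- cumsum(overnight + intraday)
open_p  <- c(0, close_p[-n]) + overnight

# Error-correction term with cointegrating vector (1, -1).
ect     <- open_p - close_p
ect_lag <- ect[-n]          # ECT at t - 1, aligned with the differences below
d_open  <- diff(open_p)     # change in the opening price
d_close <- diff(close_p)    # change in the closing price

# Adjustment coefficients: the open should correct fully (about -1),
# the close should not correct at all (about 0).
a_open  <- coef(lm(d_open  ~ ect_lag))[["ect_lag"]]
a_close <- coef(lm(d_close ~ ect_lag))[["ect_lag"]]
cat("adjustment of open:", round(a_open, 3),
    " adjustment of close:", round(a_close, 3), "\n")

# Ljung-Box test for autocorrelation in closing-price changes.
lb <- Box.test(d_close, lag = 10, type = "Ljung-Box")
cat("Ljung-Box p-value for close changes:", round(lb$p.value, 3), "\n")
```

In this setup the change in the opening price equals minus the lagged error-correction term plus overnight news, so full error correction holds by construction, while changes in the closing price are independent of past values.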