First, consider two time series, $x_{1t}$ and $x_{2t}$, which are both $I\left(1\right)$, i.e. both series contain a unit root. If these two series cointegrate, then there exist coefficients $\mu$ and $\beta_{2}$ such that:
$\\$
$x_{1t}=\mu+\beta_{2}x_{2t}+u_{t}\quad\left(1\right)
$
$\\$
defines an equilibrium, with $u_{t}$ stationary. To test for cointegration using the Engle-Granger two-step approach, we would:
$\\$
1) Test the series $x_{1t}$ and $x_{2t}$ for unit roots. If both are $I\left(1\right)$, proceed to step 2).
$\\$
2) Estimate the regression in equation $\left(1\right)$ by OLS and save the residuals. I define a new “error correction” term, $\hat{ecm}_{t}=\hat{u}_{t}$.
$\\$
3) Test the residuals ($\hat{ecm}_{t}$) for a unit root. This is a test of no cointegration: under the null hypothesis the residuals are non-stationary, whereas if the series cointegrate the residuals should be stationary. Remember that the distribution of the residual-based ADF test is not the same as the usual DF distributions. It depends on the number of estimated parameters in the static regression, since additional variables in the static regression shift the DF distributions to the left. With one estimated slope parameter in the static regression, the 5% critical value is -3.34 when the regression includes a constant and -3.78 when it includes a constant and a trend.
$\\$
4) If you reject the null of a unit root in the residuals (the null of no cointegration), this is evidence that the two variables cointegrate.
$\\$
5) If you want to set up an error-correction model and investigate the long-run relationship between the two series, I would recommend setting up an ADL or ECM model instead. There is a small-sample bias attached to the Engle-Granger static regression, and we cannot say anything about the significance of the estimated parameters in the static regression, since their distribution depends on unknown parameters.
$\\$
To answer your questions:
$\\$
(1) As seen above, your method is correct. I just wanted to point out that the critical values of the residual-based test are not the same as the usual ADF critical values.
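The Engle-Granger steps above can be sketched in Python. This is a minimal illustration on simulated data, not a full implementation: step 1 (unit-root tests on the individual series) is taken as given, the residual test is a plain Dickey-Fuller regression without lagged differences, and the helper names `ols` and `df_tstat` as well as the simulated parameter values are assumptions made for the example. The statistic is compared with the residual-based 5% critical value of -3.34 quoted above.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for y = X b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def df_tstat(u):
    """Dickey-Fuller t-statistic from regressing diff(u)_t on u_{t-1}
    (no constant, no lagged differences)."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = (resid @ resid) / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

# Simulated data (assumed for illustration): x2 is a random walk and
# x1 = mu + beta2 * x2 + stationary noise, so the pair cointegrates.
rng = np.random.default_rng(0)
T = 500
x2 = np.cumsum(rng.normal(size=T))
x1 = 1.0 + 2.0 * x2 + rng.normal(size=T)

# Step 2: static regression (1); the residuals are the ecm term.
X = np.column_stack([np.ones(T), x2])
(mu_hat, beta2_hat), ecm_hat = ols(x1, X)

# Step 3: unit-root test on the residuals, using the residual-based
# critical value for a static regression with a constant.
t_stat = df_tstat(ecm_hat)
reject_no_cointegration = t_stat < -3.34
print(f"beta2_hat={beta2_hat:.3f}, DF t-stat={t_stat:.2f}, "
      f"reject no-cointegration: {reject_no_cointegration}")
```

With real data the residuals are usually autocorrelated, so lagged differences would have to be added to the test regression (the "A" in ADF).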
$\\$
$\\$
(2) If one of the series is stationary, i.e. $I\left(0\right)$, and the other one is $I\left(1\right)$, they cannot be cointegrated. Cointegration implies that the series share a common stochastic trend and that a linear combination of them is stationary, because the stochastic trends cancel. To see this, consider the two equations:
$\\$
$x_{1t}=\mu+\beta_{2}x_{2t}+\varepsilon_{1t}\quad\left(2\right)$
$\\$
$\Delta x_{2t}=\varepsilon_{2t}\quad\left(3\right)$
$\\$
Note that $\varepsilon_{1t}\sim i.i.d.$, $\varepsilon_{2t}\sim i.i.d.$, $x_{1t}\sim I\left(1\right)$, $x_{2t}\sim I\left(1\right)$ and $u_{t}=\beta\prime x_{t}\sim I\left(0\right)$.
$\\$
First we solve equation $\left(3\right)$ recursively for $x_{2t}$ and get
$\\$
$x_{2t}=x_{0}+\sum_{i=0}^{t}\varepsilon_{2i}
$
$\\$
Plug this solution into equation $\left(2\right)
$ to get:
$\\$
$x_{1t}=\mu+\beta_{2}\left\{ x_{0}+\sum_{i=0}^{t}\varepsilon_{2i}\right\} +\varepsilon_{1t}$
$\\$
$x_{1t}=\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=0}^{t}\varepsilon_{2i}+\varepsilon_{1t}$
$\\$
We see that the two series share a common stochastic trend, $\sum_{i=0}^{t}\varepsilon_{2i}$. We can then define a cointegrating vector $\beta=\left(1\;-\beta_{2}\right)\prime$ such that:
$\\$
$u_{t}=\beta\prime x_{t}=\left(1\;-\beta_{2}\right)\left(\begin{array}{c}
\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=0}^{t}\varepsilon_{2i}+\varepsilon_{1t}\\
x_{0}+\sum_{i=0}^{t}\varepsilon_{2i}
\end{array}\right)
$
$\\$
$u_{t}=\beta\prime x_{t}=\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=0}^{t}\varepsilon_{2i}+\varepsilon_{1t}-\beta_{2} x_{0}-\beta_{2}\sum_{i=0}^{t}\varepsilon_{2i}
$
$\\$
$u_{t}=\beta\prime x_{t}=\mu+\varepsilon_{1t}
$
We see that with the correct cointegrating vector the two stochastic trends cancel and the relationship between the series is stationary ($u_{t}=\beta\prime x_{t}\sim I\left(0\right)$). If $x_{1t}$ were $I\left(0\right)$, it would contain no stochastic trend, so no linear combination with a non-zero weight on $x_{2t}$ could remove the stochastic trend in $x_{2t}$. So yes, you need both of your series to be $I\left(1\right)$!
$\\$
$\\$
(3) The last question. Yes, OLS is valid for the two stochastic series: it can be shown that the OLS estimator in the static regression (Eq. $\left(1\right)$) is super consistent (its variance converges to zero at rate $T^{-2}$ rather than the usual $T^{-1}$) when both series are $I\left(1\right)$ and they cointegrate. So if you find cointegration and your series are $I\left(1\right)$, your estimates will be super consistent. If you do not find cointegration, the static regression is spurious and its estimates are not consistent. For further reading, see the seminal paper by Engle and Granger (1987), "Co-Integration and Error Correction: Representation, Estimation, and Testing".
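Super consistency can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under assumed settings (Gaussian errors, $\beta_{2}=2$, 500 replications, and the hypothetical helper `beta_rmse`): it compares the root mean squared error of the OLS slope for $T=100$ and $T=1600$. If the error shrinks at rate $T^{-1}$, a 16-fold increase in $T$ should cut the RMSE by roughly a factor of 16, whereas the usual $T^{-1/2}$ rate would give only a factor of 4.

```python
import numpy as np

def beta_rmse(T, reps, rng, beta2=2.0):
    """RMSE of the OLS slope in the static regression (1)
    over `reps` simulated cointegrated samples of length T."""
    errs = np.empty(reps)
    for r in range(reps):
        x2 = np.cumsum(rng.normal(size=T))            # I(1) regressor
        x1 = 1.0 + beta2 * x2 + rng.normal(size=T)    # cointegrated with x2
        X = np.column_stack([np.ones(T), x2])
        b, *_ = np.linalg.lstsq(X, x1, rcond=None)
        errs[r] = b[1] - beta2
    return np.sqrt(np.mean(errs ** 2))

rng = np.random.default_rng(2)
rmse_small = beta_rmse(T=100, reps=500, rng=rng)
rmse_large = beta_rmse(T=1600, reps=500, rng=rng)
ratio = rmse_small / rmse_large
print(f"RMSE(T=100)={rmse_small:.4f}, RMSE(T=1600)={rmse_large:.5f}, ratio={ratio:.1f}")
```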
Here is an example where three positive loadings and one negative loading on the error correction term make intuitive sense.
Consider a four-variable cointegrated system $(x_t, y_t, z_t, w_t)$ with $(x_t, y_t, z_t)$ being the three underlying stochastic trends and $w_t := x_t + y_t + z_t + \varepsilon_t$ where $\varepsilon_t$ is a stationary process.
Define the error correction term as $ect_t := w_t - x_t - y_t - z_t (=\varepsilon_t)$. This is obviously stationary as $\varepsilon_t$ is stationary.
Then it is natural to expect that the error correction term will have positive loadings in the equations for $\Delta x_t, \Delta y_t, \Delta z_t$ and a negative one in the equation for $\Delta w_t$, because:
- If $x_t$ deviates from the long run equilibrium by getting "too high", $ect_t$ will become negative, and then the positive loading on $ect_t$ will drag $x_{t+1}$ down, so back to equilibrium. The same holds for $y_t$ and $z_t$.
- If $w_t$ deviates from the long run equilibrium by getting "too high", $ect_t$ will become positive, and then the negative loading on $ect_t$ will drag $w_{t+1}$ down, so back to equilibrium.
- And the reverse for the cases of variables getting "too low".
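The sign argument above can be checked with a small simulation. The sketch assumes a stripped-down error correction system with no short-run dynamics, a common loading magnitude of 0.1 and standard normal shocks; these choices and the helper `simulate_ect` are assumptions for illustration only. Under them, $ect_t = (1 + a_w - 3a_{xyz})\,ect_{t-1} + \text{noise}$, so loadings of $+0.1$ on $x, y, z$ and $-0.1$ on $w$ give an autoregressive coefficient of $0.6$ (mean reverting), while flipping the signs gives $1.4$ (explosive).

```python
import numpy as np

def simulate_ect(a_xyz, a_w, T=2000, seed=3):
    """Simulate (x, y, z, w) with the given loadings on ect_{t-1}
    and return the path of ect_t = w_t - x_t - y_t - z_t."""
    rng = np.random.default_rng(seed)
    levels = np.zeros(4)                              # order: x, y, z, w
    loadings = np.array([a_xyz, a_xyz, a_xyz, a_w])
    ect = np.empty(T)
    for t in range(T):
        ect_prev = levels[3] - levels[:3].sum()
        levels = levels + loadings * ect_prev + rng.normal(size=4)
        ect[t] = levels[3] - levels[:3].sum()
    return ect

# Signs as in the text: positive on x, y, z and negative on w.
ect = simulate_ect(a_xyz=0.1, a_w=-0.1)
phi_hat = (ect[:-1] @ ect[1:]) / (ect[:-1] @ ect[:-1])   # AR(1) coefficient of ect

# Flipped signs push the system away from equilibrium (short sample to avoid overflow).
ect_flipped = simulate_ect(a_xyz=-0.1, a_w=0.1, T=50)

print(f"estimated AR(1) coefficient of ect: {phi_hat:.2f}")
print(f"last |ect| with flipped signs: {abs(ect_flipped[-1]):.3g}")
```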
Best Answer
I am not an expert of this topic, but here is what seems reasonable to me.
Case 1: all endogenous variables are stationary, one exogenous variable is integrated (has a unit root).
I would take first differences of the integrated exogenous variable and include that in the VAR model. You cannot include levels of an integrated variable because you would end up with a stationary variable on the left hand side and a non-stationary combination (made up of some stationary variables and one non-stationary variable) on the right hand side, which is a contradiction.
Case 2: all endogenous variables are stationary, some exogenous variables are integrated.
A: If the exogenous variables are not cointegrated, include their first differences just as in Case 1.
B: If the exogenous variables are cointegrated, include both their first differences and the stationary combinations (error correction terms).
In both A and B cases the left-hand-side variables will be the regular endogenous variables. My answer is about what to include on the right hand side.
Case 3: one endogenous variable is integrated, one exogenous variable is integrated.
A: If the two variables are not cointegrated, include their first differences just as in Case 2A.
B: If the two variables are cointegrated, include both their first differences and their stationary combination (the error correction term).
In both A and B cases the left-hand-side variables will be the regular endogenous variables except for the integrated one which is replaced by its own first differences. My answer is about what to include on the right hand side.
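As a rough illustration of Case 1, the sketch below builds the right-hand side for a single stationary endogenous variable and one integrated exogenous variable, entering the exogenous variable in first differences. The data-generating process, the coefficient values and the restriction to one equation with one lag are assumptions chosen to keep the example short; a full VAR would repeat the same regressors in each equation.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
x = np.cumsum(rng.normal(size=T))    # integrated exogenous variable (random walk)
dx = np.diff(x)                      # dx[k] = x[k+1] - x[k], stationary

# Stationary endogenous variable driven by its own lag and by the
# contemporaneous change in x (illustrative DGP, values assumed).
c, phi, gamma = 0.2, 0.5, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + gamma * dx[t - 1] + rng.normal()

# Right-hand side: constant, y_{t-1}, and the differenced exogenous variable.
X = np.column_stack([np.ones(T - 1), y[:-1], dx])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print("estimates (c, phi, gamma):", np.round(coef, 3))
```

For Cases 2B and 3B the same design matrix would get one more column holding the lagged error correction term.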