Solved – Relationship between cointegrating relationship found via PCA and that found via a Johansen test

cointegration, pca, stationarity, time series

Thanks to the explanation given here:

PCA on prices or returns

I understand how PCA can be used to derive cointegrating relationships.

However, there is of course another well-known method for identifying cointegrating relationships: the Johansen test. The following MATLAB toolbox provides this functionality:

http://www.spatial-econometrics.com/

However, when I apply it to the same rates data used in the example shown here (the example shows how to derive cointegrating relationships using PCA):
http://www.mathworks.com/matlabcentral/fileexchange/24120-review-of-statistical-arbitrage–cointegration–and-multivariate-ornstein-uhlenbeck/content/MultivariateOUnCointegration/Empirical/S_StatArbSwaps.m

The resulting eigenvectors and eigenvalues from johansen() are quite different, both in scale and in the patterns produced. The PCA-based eigenvectors produce the well-known level, slope, and curvature patterns one would expect. The Johansen eigenvectors don't seem to have such clear patterns at all.

However, as both results represent the cointegrating relationships present in the data, there must be a link between the two results?

1.) I'd like to understand the link between the PCA-based eigenvectors and those produced by the Johansen test.

Thanks

Best Answer

The Johansen and PCA estimators are quite different, yet related. For simplicity, let us assume that there are no short-term lags. (If there are lags, as is the case in practice, the same reasoning applies after replacing the variables with their Frisch-Waugh residuals.) We want to estimate $\beta$ in:

$ \Delta Y_t = \alpha \beta^{'} Y_{t-1} + \varepsilon_t$

Let us define the second moments of these two variables: $S_{00} = 1/T \sum \Delta Y_t \Delta Y_t^{'}, \quad S_{11} = 1/T \sum Y_{t-1} Y_{t-1}^{'}$, and $S_{01} = 1/T \sum \Delta Y_t Y_{t-1}^{'}$, with $S_{10} = S_{01}^{'}$. Note also that $S_{11}$ is essentially the same if we use $Y_t$ instead of $Y_{t-1}$, since the two sums differ only in their first and last terms.

The estimators are:

  • Johansen: do a canonical correlation analysis (CCA) between $\Delta Y_t$ and $Y_{t-1}$: use the eigenvectors associated with the r largest eigenvalues of the (generalised) eigenvalue problem $|\lambda S_{11} -S_{10}S_{00}^{-1}S_{01}| = 0$
  • PCA: do a PCA on $Y_t$, but use the eigenvectors associated with the r smallest eigenvalues of $|\lambda I -S_{11}| = 0$
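
As a rough illustration of the two estimators listed above, here is a minimal NumPy sketch. Everything in it is an assumption made for the example rather than part of the toolbox or of the rates data: the simulated three-variable VECM, the values chosen for $\alpha$ and $\beta$, and the function names `simulate_vecm`, `moments`, `johansen_beta`, `pca_beta` and `subspace_gap`. The PCA is taken on the uncentered second moment of the levels, and no deterministic terms or lags are included.

```python
import numpy as np

def simulate_vecm(T=2000, seed=0):
    """Simulate Delta Y_t = alpha beta' Y_{t-1} + eps_t with made-up parameters."""
    rng = np.random.default_rng(seed)
    beta = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])  # n = 3, r = 2
    alpha = -0.3 * beta                                      # hypothetical loadings
    Pi = alpha @ beta.T
    Y = np.zeros((T + 1, 3))
    for t in range(1, T + 1):
        Y[t] = Y[t - 1] + Pi @ Y[t - 1] + rng.standard_normal(3)
    return Y, beta

def moments(Y):
    """Return S00, S11, S01 built from Delta Y_t and Y_{t-1}."""
    dY, Ylag = np.diff(Y, axis=0), Y[:-1]
    T = len(dY)
    return dY.T @ dY / T, Ylag.T @ Ylag / T, dY.T @ Ylag / T

def johansen_beta(Y, r):
    """Eigenvectors of |lambda S11 - S10 S00^{-1} S01| = 0 for the r largest eigenvalues."""
    S00, S11, S01 = moments(Y)
    L = np.linalg.cholesky(S11)            # S11 = L L'
    A = np.linalg.solve(L, S01.T)          # L^{-1} S10
    M = A @ np.linalg.solve(S00, A.T)      # symmetric form of the same problem
    lam, V = np.linalg.eigh(M)             # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:r]        # keep the r largest
    return np.linalg.solve(L.T, V[:, idx]), lam[idx]   # beta = L^{-T} v

def pca_beta(Y, r):
    """Eigenvectors of the levels' second moment for the r smallest eigenvalues."""
    _, S11, _ = moments(Y)
    lam, V = np.linalg.eigh(S11)           # ascending, so the first r are the smallest
    return V[:, :r], lam[:r]

def subspace_gap(A, B):
    """Spectral norm of the difference of orthogonal projectors (0 means same space)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T, 2)

if __name__ == "__main__":
    Y, beta_true = simulate_vecm()
    b_joh, lam_joh = johansen_beta(Y, r=2)
    b_pca, lam_pca = pca_beta(Y, r=2)
    print("Johansen vectors:\n", b_joh)
    print("PCA vectors:\n", b_pca)
    print("gap Johansen vs true space:", subspace_gap(b_joh, beta_true))
    print("gap PCA vs true space:", subspace_gap(b_pca, beta_true))
    print("gap Johansen vs PCA:", subspace_gap(b_joh, b_pca))
```

On simulated data of this kind the individual columns returned by the two functions look unrelated, while the gaps between the spanned spaces are small. The Cholesky step is only a convenient way to turn the generalised problem into a symmetric one.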

Intuition and link between the methods

The intuition behind the PCA estimator is that we are looking for the linear combinations with the smallest variance (as OLS does in the Engle-Granger first-step estimator); the first components (corresponding to the largest eigenvalues) are actually the most non-stationary.

For Johansen, we are looking for the linear combinations of $Y_{t-1}$ that are the most correlated with linear combinations of $\Delta Y_t$.

Differences between the methods

Note that the problems are quite similar, since both are special cases of the generalised eigenvalue problem. However, the Johansen estimator looks at the eigenvalues of $S_{11}^{-1}S_{10}S_{00}^{-1}S_{01}$, while PCA looks at the eigenvalues of $S_{11}$. As there is in general no link between the eigenvalues of AB and those of A and B, there is to my knowledge no link between the largest eigenvalues of CCA and the smallest of PCA (you could check this, as CCA and PCA are used in many other contexts). Note finally that there is also no direct link between the eigenvectors, since they are moreover normalised differently:

  • For Johansen: $\beta' S_{11} \beta = I$, while
  • For PCA, Harris (1997) suggests setting $\beta' \beta =I$.
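
The remark above that the eigenvalues of a product AB are not pinned down by those of A and B can be checked on a toy example. The matrices below are purely illustrative and have nothing to do with the rates data: `B1` and `B2` have the same eigenvalues, yet their products with `A` do not.

```python
import numpy as np

A = np.diag([1.0, 2.0])
B1 = np.diag([1.0, 2.0])
B2 = np.diag([2.0, 1.0])  # same eigenvalues as B1, attached to different eigenvectors

# Sorted eigenvalues of each factor and of each product
eig_B1 = np.sort(np.linalg.eigvals(B1).real)
eig_B2 = np.sort(np.linalg.eigvals(B2).real)
eig_AB1 = np.sort(np.linalg.eigvals(A @ B1).real)
eig_AB2 = np.sort(np.linalg.eigvals(A @ B2).real)

print("eig(B1) :", eig_B1, " eig(B2) :", eig_B2)
print("eig(A B1):", eig_AB1)
print("eig(A B2):", eig_AB2)
```

The eigenvalues of the product depend on how the eigenvectors of the two factors line up, which is why knowing the spectrum of each moment matrix separately says little about the spectrum of the CCA matrix.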

So how come these estimators are different, yet both consistent? Tough question, but note that we are not estimating $\beta$ itself but the space spanned by $\beta$, and the two estimated spaces are expressed in different bases.
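
To spell out why only the space is identified, here is a short sketch in which $Q$ is an arbitrary invertible $r \times r$ matrix, introduced purely for illustration. Rotating the basis of $\beta$ can always be offset in $\alpha$, so the product is unchanged:

$ \alpha \beta^{'} = \left(\alpha (Q^{'})^{-1}\right) \left(\beta Q\right)^{'} $

One way to compare two estimates without depending on the basis is the orthogonal projector onto the spanned space, which is invariant to $Q$:

$ P_{\beta Q} = \beta Q \left(Q^{'} \beta^{'} \beta Q\right)^{-1} Q^{'} \beta^{'} = \beta \left(\beta^{'} \beta\right)^{-1} \beta^{'} = P_{\beta} $

Under this reading, the Johansen and PCA vectors can look very different column by column while their projectors are close to each other.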

References:

  • Harris, D. (1997) Principal components analysis of cointegrated time series, Econometric Theory, 13, 529–57.
  • Snell, A. (1999) Testing for r versus r-1 cointegrating vectors, Journal of Econometrics, 88, 151–91.