I am looking for some clarification regarding multivariable cointegration and what steps I should take to avoid spurious regressions.
I am analysing a time series $y$ as a function of the explanatory variables $\{a,b,c,d,e,f\}$.
Suppose, for example, that I test all the series for stationarity and find that
- $y$ and $d$, $e$, $f$ are I(1) (nonstationary),
- $a$, $b$, $c$ are I(0) (stationary).
I know that I should therefore guard against spurious regressions by testing for cointegration between the variables. Given this setup, my questions are:
- If I find that $d$, $e$, and $f$ are cointegrated with each other via a linear combination, should I combine them into a new stationary variable? Can I use this variable in my regression analysis? Would I lose any long-run information by doing this? What is the best way to handle this scenario?
- What if $y$ (the dependent variable) is cointegrated with any of $d$, $e$, $f$? What does this mean, and how should I proceed with the regression analysis?
- What if there are multiple cointegrating relationships among the variables, e.g. $(d,f)$ is a cointegrated pair and so is $(e,f)$? What is the best way to deal with this?
Basically, I want to make sure that the estimated relationships between $y$ and $\{a,b,c,d,e,f\}$ are not spurious. I am a bit confused about the implications of cointegration when there are many variables.
Best Answer
Essentially, you want to avoid including integrated variables in your models except in stationary combinations, i.e. cointegrating relations. The principle to follow is: if the original variables are I(1), use their first differences as both the regressand and the regressors; and, in the presence of cointegration, additionally include the cointegrating relations (error-correction terms) as regressors.