First, you should decide whether to use a univariate or a multivariate model. It seems reasonable to assume that oil prices and unemployment drive air travel demand, and not the other way around. Thus, in line with one of the other answers to this post, you may approach your study in a univariate setting. If that assumption is not appropriate, you may take a multivariate approach, for example a VAR model, as mentioned by @Miha Trošt.
In the univariate setting, you can consider the following models:
- ARIMAX models: ARIMA models, like the one you selected, extended with exogenous regressors.
- Distributed lag models: these models are based on a regression equation that includes lagged versions of the explanatory variables.
- Autoregressive distributed lag (ADL) models: like the previous model, but also including lags of the dependent variable as regressors.
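As a rough illustration of the ADL idea, here is a minimal NumPy sketch that builds the lagged design matrix and estimates an ADL(p, q) model by OLS on simulated data. The function names, lag orders and the coefficients used to simulate the data are all made up for illustration; setting p = 0 gives a plain distributed lag model.

```python
import numpy as np


def adl_design(y, x, p, q):
    """Regressors for an ADL(p, q) model:
    y_t = c + sum_{i=1..p} a_i * y_{t-i} + sum_{j=0..q} b_j * x_{t-j} + e_t
    """
    m = max(p, q)  # observations lost to lagging
    n = len(y)
    cols = [np.ones(n - m)]
    cols += [y[m - i:n - i] for i in range(1, p + 1)]  # lags of the dependent variable
    cols += [x[m - j:n - j] for j in range(0, q + 1)]  # current and lagged regressor
    return np.column_stack(cols), y[m:]


def fit_adl(y, x, p, q):
    """OLS estimates ordered as [c, a_1..a_p, b_0..b_q]."""
    X, target = adl_design(np.asarray(y, float), np.asarray(x, float), p, q)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


# Usage on simulated data (made-up coefficients)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)  # hypothetical stationary exogenous regressor
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 + 0.6 * y[t - 1] + 1.0 * x[t] - 0.4 * x[t - 1] + rng.normal(scale=0.5)

coef = fit_adl(y, x, p=1, q=1)
print(coef)  # should be close to [0.5, 0.6, 1.0, -0.4]
```

With real data you would also have to choose p and q, for instance so that no autocorrelation is left in the residuals.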
Did you check whether the regular and seasonal differencing filters applied by the airline model are necessary? You mention that the series is measured in rates; this may already render the series stationary. This need not be the case; I haven't seen the data, so this is just a guess.
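For monthly data (an assumption; the seasonal period is set to 12 here), the airline model applies the filter (1 - B)(1 - B^12). A minimal sketch of those two filters, with hypothetical helpers `difference` and `airline_filter`, shows what they remove: a linear trend plus a fixed seasonal pattern is reduced to zeros. Comparing your series after each filter is one way to judge whether both are needed.

```python
import numpy as np


def difference(y, lag=1):
    """Return y_t - y_{t-lag}; the first `lag` observations are lost."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


def airline_filter(y, season=12):
    """Apply (1 - B)(1 - B^season): a seasonal difference, then a regular difference."""
    return difference(difference(y, season), 1)


# A linear trend plus a fixed monthly pattern: both filters together remove it completely
t = np.arange(60)
pattern = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], dtype=float)
y = 2.0 * t + pattern[t % 12]
print(airline_filter(y))  # all zeros (47 values)
```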
You should also be concerned with the correlation between the regressors. Oil prices and unemployment may be correlated. If the correlation is high, the parameter estimates may be imprecise, and you may want to include only one of the regressors. There are techniques for dealing with multicollinearity, but with only two regressors it is probably not worth complicating the analysis, and it will probably be safe to keep both variables unless they are highly correlated.
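With only two regressors, a quick check is their sample correlation r and the implied variance inflation factor, VIF = 1/(1 - r^2), which is the same for both variables. A minimal sketch; the helper name and the toy numbers are hypothetical, and a VIF above roughly 10 is only a common rule of thumb for serious multicollinearity, not a hard threshold.

```python
import numpy as np


def correlation_and_vif(x1, x2):
    """Sample correlation of two regressors and the implied VIF, 1 / (1 - r^2).
    With exactly two regressors the VIF is the same for both."""
    r = np.corrcoef(x1, x2)[0, 1]
    return r, 1.0 / (1.0 - r ** 2)


# Toy numbers (hypothetical) standing in for oil price and unemployment observations
oil = [1.0, 2.0, 3.0, 4.0, 5.0]
unemployment = [2.0, 1.0, 4.0, 3.0, 5.0]
r, vif = correlation_and_vif(oil, unemployment)
print(round(r, 3), round(vif, 3))  # 0.8 2.778
```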
First of all, consider two time series, $x_{1t}$ and $x_{2t}$, which are both $I(1)$, i.e. both series contain a unit root. If these two series cointegrate, there exist coefficients $\mu$ and $\beta_{2}$ such that

$x_{1t}=\mu+\beta_{2}x_{2t}+u_{t}\quad(1)$

with $u_{t}\sim I(0)$, defines an equilibrium. To test for cointegration using the Engle-Granger 2-step approach, we would:
1) Test the series $x_{1t}$ and $x_{2t}$ for unit roots. If both are $I(1)$, proceed to step 2).
2) Estimate regression equation (1) above by OLS and save the residuals. I define these as a new “error correction” term, $\hat{u}_{t}=\widehat{ecm}_{t}$.
3) Test the residuals ($\widehat{ecm}_{t}$) for a unit root. Note that this is a test of the null of no cointegration, since under the null hypothesis the residuals are non-stationary. If, however, there is cointegration, the residuals should be stationary. Remember that the distribution of the residual-based ADF test is not the same as the usual DF distribution: it depends on the number of estimated parameters in the static regression above, since each additional variable in the static regression shifts the distribution further to the left. The 5% critical values for one estimated parameter in the static regression with a constant, and with a constant and trend, are -3.34 and -3.78, respectively.
4) If you reject the null of a unit root in the residuals (the null of no cointegration), you conclude that the two variables cointegrate.
5) If you want to set up an error-correction model and investigate the long-run relationship between the two series, I would recommend setting up an ADL or ECM model instead, since the Engle-Granger static regression suffers from a small-sample bias, and we cannot say anything about the significance of the parameters estimated in the static regression because their distribution depends on unknown parameters.

To answer your questions:

(1) As seen above, your method is correct. I just wanted to point out that the critical values of the residual-based test are not the same as the usual ADF critical values.
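Steps 2) to 4) of the Engle-Granger procedure can be sketched in NumPy as follows. This is a minimal illustration, not a full implementation: it assumes step 1) has already been done, uses a constant in the static regression and one lagged difference in the residual ADF regression, and runs on simulated data that cointegrate by construction. The function names are hypothetical, and -3.34 is the 5% constant-only value quoted in step 3). A statistics package (for example `statsmodels.tsa.stattools.coint` in Python) will also give you p-values.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def residual_adf_tstat(u, lags=1):
    """ADF t-statistic on rho in
    du_t = rho * u_{t-1} + sum_{i=1..lags} g_i * du_{t-i} + e_t.
    No constant: residuals from a regression with a constant have mean zero."""
    du = np.diff(u)
    target = du[lags:]
    cols = [u[lags:-1]]  # lagged level u_{t-1}
    for i in range(1, lags + 1):
        cols.append(du[lags - i:len(du) - i])  # lagged differences
    X = np.column_stack(cols)
    coef, resid = ols(X, target)
    s2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0] / se


def engle_granger(x1, x2, lags=1):
    """Step 2: static regression x1_t = mu + beta2 * x2_t + u_t.
    Step 3: ADF test on the saved residuals (the ecm term)."""
    X = np.column_stack([np.ones(len(x2)), x2])
    coef, ecm = ols(X, x1)
    return coef, residual_adf_tstat(ecm, lags)


EG_CRIT_5PCT = -3.34  # 5% value, one regressor, constant only

# Simulated data that cointegrate by construction (made-up mu = 1, beta2 = 2)
rng = np.random.default_rng(42)
x2 = np.cumsum(rng.normal(size=500))        # I(1) random walk
x1 = 1.0 + 2.0 * x2 + rng.normal(size=500)  # I(1), cointegrated with x2
coef, tstat = engle_granger(x1, x2)
print(coef, tstat, tstat < EG_CRIT_5PCT)    # True means: reject the null of no cointegration
```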
(2) If one of the series is stationary, i.e. $I(0)$, and the other one is $I(1)$, they cannot be cointegrated. Cointegration implies that the series share a common stochastic trend and that a linear combination of them is stationary, because the stochastic trends cancel. To see this, consider the two equations:
$x_{1t}=\mu+\beta_{2}x_{2t}+\varepsilon_{1t}\quad(2)$

$\Delta x_{2t}=\varepsilon_{2t}\quad(3)$

Note that $\varepsilon_{2t}\sim i.i.d.$, $x_{1t}\sim I(1)$, $x_{2t}\sim I(1)$, $u_{t}=\beta' x_{t}\sim I(0)$ and $\varepsilon_{1t}\sim i.i.d.$
First we solve equation (3), which gives

$x_{2t}=x_{0}+\sum_{i=1}^{t}\varepsilon_{2i}$

where $x_{0}$ is the initial value of $x_{2t}$. Plugging this solution into equation (2) gives:

$x_{1t}=\mu+\beta_{2}\left\{ x_{0}+\sum_{i=1}^{t}\varepsilon_{2i}\right\} +\varepsilon_{1t}=\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=1}^{t}\varepsilon_{2i}+\varepsilon_{1t}$

We see that the two series share a common stochastic trend, $\sum_{i=1}^{t}\varepsilon_{2i}$. With $x_{t}=\left(x_{1t},\;x_{2t}\right)'$, we can then define a cointegrating vector $\beta=\left(1,\;-\beta_{2}\right)'$ such that:

$u_{t}=\beta' x_{t}=\left(1\;\;-\beta_{2}\right)\left(\begin{array}{c}\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=1}^{t}\varepsilon_{2i}+\varepsilon_{1t}\\ x_{0}+\sum_{i=1}^{t}\varepsilon_{2i}\end{array}\right)$

$u_{t}=\beta' x_{t}=\mu+\beta_{2}x_{0}+\beta_{2}\sum_{i=1}^{t}\varepsilon_{2i}+\varepsilon_{1t}-\beta_{2}x_{0}-\beta_{2}\sum_{i=1}^{t}\varepsilon_{2i}$

$u_{t}=\beta' x_{t}=\mu+\varepsilon_{1t}$
We see that with the correct cointegrating vector the two stochastic trends cancel, and the relationship between the series is stationary ($u_{t}=\beta' x_{t}\sim I(0)$). If $x_{1t}$ were $I(0)$, no cointegrating relationship could remove the stochastic trend in $x_{2t}$, since any linear combination with a non-zero weight on $x_{2t}$ inherits that trend. So yes, you need both your series to be $I(1)$!
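A quick numerical check of this derivation: simulate equations (2) and (3) with made-up values for $\mu$, $\beta_{2}$ and $x_{0}$, apply the cointegrating vector $(1,\;-\beta_{2})'$, and confirm that what is left is exactly $\mu+\varepsilon_{1t}$, while $x_{1t}$ itself wanders. The parameter values and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, mu, beta2, x0 = 1000, 1.0, 2.0, 5.0  # made-up parameter values

eps1 = rng.normal(size=T + 1)            # epsilon_{1t}, t = 0..T
eps2 = rng.normal(size=T + 1)            # epsilon_{2t}, t = 0..T
eps2[0] = 0.0                            # so that x_{2,0} = x0 and the sum starts at i = 1

x2 = x0 + np.cumsum(eps2)                # solution of (3): x0 + sum_{i=1}^{t} eps_{2i}
x1 = mu + beta2 * x2 + eps1              # equation (2)

beta = np.array([1.0, -beta2])           # cointegrating vector (1, -beta2)'
u = beta @ np.vstack([x1, x2])           # u_t = beta' x_t

print(np.allclose(u, mu + eps1))         # True: the stochastic trends cancel
print(np.std(x1), np.std(u))             # x1 wanders; u stays around mu
```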
(3) The last question: yes, OLS is valid for the two stochastic series, since it can be shown that the OLS estimator of the static regression (Eq. (1)) is super consistent (its variance converges to zero at rate $T^{-2}$ rather than the usual $T^{-1}$) when both series are $I(1)$ and cointegrate. So if your series are $I(1)$ and you find cointegration, your estimates will be super consistent. If you do not find cointegration, the static regression is spurious and the estimates are not consistent. For further reading, see the seminal paper by Engle and Granger (1987), “Co-Integration and Error Correction: Representation, Estimation, and Testing”, Econometrica.
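A small Monte Carlo sketch illustrates super consistency. The setup is hypothetical ($\beta_{2}=2$, i.i.d. N(0,1) errors, 300 replications): quadrupling T from 100 to 400 should cut the mean absolute error of the slope estimate by a factor closer to 4 than to the factor of 2 you would expect at the usual sqrt(T) rate.

```python
import numpy as np


def static_slope(T, rng, beta2=2.0):
    """OLS slope from the static regression x1_t = mu + beta2 * x2_t + u_t
    on simulated cointegrated I(1) data."""
    x2 = np.cumsum(rng.normal(size=T))
    x1 = 1.0 + beta2 * x2 + rng.normal(size=T)
    X = np.column_stack([np.ones(T), x2])
    coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
    return coef[1]


def mean_abs_error(T, reps=300, seed=0, beta2=2.0):
    """Average |beta2_hat - beta2| over `reps` simulated samples of length T."""
    rng = np.random.default_rng(seed)
    return np.mean([abs(static_slope(T, rng, beta2) - beta2) for _ in range(reps)])


err_100 = mean_abs_error(100)
err_400 = mean_abs_error(400)
print(err_100, err_400, err_100 / err_400)  # ratio should be closer to 4 than to 2
```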
It sounds like you want to fit an ARIMAX model to your time series. I would try to fit an ADL (autoregressive distributed lag) model or an ECM (error correction model), or apply the Engle-Granger 2-step procedure, to see whether your series cointegrate and, if they do, to estimate the long-run relationship between them. If they do not cointegrate, continue with the ARIMAX model or estimate stationary ADL or ECM models. Note that an ADL model and an ARIMAX model are very similar. Although cointegration analysis with several variables is quite an endeavour and fills entire textbooks (see e.g. Katarina Juselius' “The Cointegrated VAR Model: Methodology and Applications”), cointegration analysis with only two variables is quick and easy, depending on the approach you choose. Note that part of this answer repeats what I wrote in answer to a similar question. I will outline the steps you should follow in order to model the time series appropriately.
First, remember that there are different kinds of non-stationarity and different ways to deal with them. Four common ones are:
1) Deterministic trends or trend stationarity. If your series is of this kind, detrend it or include a time trend in the regression/model. You might want to check out the Frisch–Waugh–Lovell theorem on this one.
2) Level shifts and structural breaks. If this is the case, include a dummy variable for each break or, if your sample is long enough, model each regime separately.
3) Changing variance. Either model the subsamples separately or model the changing variance with the ARCH/GARCH class of models.
4) Unit roots. In general, you should then check for cointegrating relationships between the variables, but since you are concerned with univariate forecasting, you should difference the series once or twice, depending on its order of integration.
The steps to model the series:
1) Look at the ACF and PACF together with a time series plot to get an indication of whether the series is stationary or non-stationary. If the ACF decays very slowly and the plot shows no mean reversion, this is a good indication that the series contains a unit root.
2) Test the series for a unit root. This can be done with a wide range of tests; some of the most common are the ADF test, the Phillips-Perron (PP) test, the KPSS test (which has the null of stationarity) and the DF-GLS test, which is the most powerful of the aforementioned tests. NOTE! If your series contains a structural break, these tests are biased towards not rejecting the null of a unit root. If you want to check the robustness of these tests and suspect one or more structural breaks, you should use unit root tests that allow for endogenous structural breaks. Two common ones are the Zivot-Andrews test, which allows for one endogenous structural break, and the Clemente-Montañés-Reyes test, which allows for two. The latter comes in two versions: an additive outlier model, which accounts for sudden changes in the mean of the series, and an innovational outlier model, which allows for gradual changes. Look these tests up on Wikipedia or in an econometrics textbook. Some statistical packages have these tests built in, which makes running a battery of unit root tests on your series very easy.
If your series contains a unit root, test the first differences of the series to see whether they contain a second unit root.
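Steps 1) and 2) can be sketched with NumPy alone: a sample ACF, and a basic ADF t-statistic with a constant and one lagged difference, applied to the levels, first differences and second differences of a simulated I(2) series. This is a sketch under simplifying assumptions rather than a full test: the lag length is fixed, the helper names are hypothetical, and -2.86 is the approximate large-sample 5% critical value for the constant-only case. A statistics package will give you proper p-values and lag selection.

```python
import numpy as np


def sample_acf(y, nlags):
    """Sample autocorrelations r_0, ..., r_nlags (r_0 = 1)."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([d[k:] @ d[:len(d) - k] / denom for k in range(nlags + 1)])


def adf_tstat(y, lags=1):
    """t-statistic on rho in the ADF regression with a constant:
    dy_t = a + rho * y_{t-1} + sum_{i=1..lags} g_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]
    cols = [np.ones(len(target)), y[lags:-1]]  # constant and lagged level
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:len(dy) - i])   # lagged differences
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se


ADF_CRIT_5PCT = -2.86  # approximate large-sample 5% value, constant, no trend

rng = np.random.default_rng(7)
y = np.cumsum(np.cumsum(rng.normal(size=500)))  # simulated I(2) series

print("ACF of levels:", np.round(sample_acf(y, 5), 2))  # decays very slowly
for d in range(3):
    stat = adf_tstat(np.diff(y, n=d))  # n=0 returns the levels unchanged
    print(f"differenced {d} times: t = {stat:.2f}, reject unit root: {stat < ADF_CRIT_5PCT}")
```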
3) In case your series are non-stationary then you should:
Note that you could also use the Johansen cointegration test or other tests, but for simplicity these are left out; in your case, with only two time series, any one of A), B) and C) will suffice. Note also that although the Engle-Granger procedure is easier to apply (at least I think so), the ADL/ECM estimators are preferable, as can be seen by conducting a Monte Carlo simulation.
I will not explain all these approaches and how to derive the long-run solution, as that would take a considerable amount of time and space, but here is an excellent introduction to these methods:
http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/Cointegration.pdf
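For the ECM route, one common form is the unrestricted single-equation ECM, $\Delta y_{t}=c+\alpha y_{t-1}+\theta x_{t-1}+\gamma\Delta x_{t}+e_{t}$, estimated by OLS, with long-run coefficient $-\theta/\alpha$ and adjustment speed $\alpha$. The sketch below uses simulated data with a made-up long-run coefficient of 2 and adjustment speed of -0.3; extra lags of $\Delta y$ and $\Delta x$ are left out for brevity, and `fit_ecm` is a hypothetical helper.

```python
import numpy as np


def fit_ecm(y, x):
    """Unrestricted single-equation ECM by OLS:
    dy_t = c + alpha * y_{t-1} + theta * x_{t-1} + gamma * dx_t + e_t.
    Long-run coefficient: -theta / alpha; alpha < 0 is the speed of adjustment."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    dy, dx = np.diff(y), np.diff(x)
    X = np.column_stack([np.ones(len(dy)), y[:-1], x[:-1], dx])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    c, alpha, theta, gamma = coef
    return {"alpha": alpha, "gamma": gamma, "long_run": -theta / alpha}


# Simulated data: long-run relation y = 2x, adjustment speed -0.3 (made-up values)
rng = np.random.default_rng(3)
n = 2000
x = np.cumsum(rng.normal(size=n))
y = np.zeros(n)
y[0] = 2.0 * x[0]
for t in range(1, n):
    y[t] = (y[t - 1] - 0.3 * (y[t - 1] - 2.0 * x[t - 1])
            + 0.5 * (x[t] - x[t - 1]) + rng.normal())

est = fit_ecm(y, x)
print(est)  # alpha near -0.3, gamma near 0.5, long_run near 2
```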
4) When choosing the lag length of your ADL model, include enough lags to eliminate all residual autocorrelation.
5) After your cointegration analysis you are more or less done. Please note that if you want to extend your model to several variables, you should use the CVAR model, and the analysis becomes a lot more complicated, as mentioned above.
6) If your variables do not cointegrate but contain a unit root, continue with your ARIMAX modelling.
7) If your variables are stationary, estimate a stationary ADL/ECM model for your data or proceed with your ARIMAX model, following the same steps as in 6). An excellent introductory note on the stationary models can be found here: http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/dynamicmodels.pdf
If your series contains a unit root with a drift, or no unit root but a deterministic trend, you can add a time trend to your specification. Further, check the first differences of the series and the time series plots to see whether your series contain a structural break and/or outliers, and include dummy variables for these. Note that you should formally test for structural breaks, see point 2) above; another alternative is the Chow test. Thirdly, it could be a good idea to take natural logs of your variables, as this will stabilize the variance of the series. Since the log is a monotonic transformation, no information is lost.
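The Frisch–Waugh–Lovell theorem mentioned in point 1) of the list of non-stationarity types is why detrending and including a time trend give the same slope. A minimal sketch with simulated data (all coefficients made up, helper names hypothetical): regressing the detrended y on the detrended x reproduces the coefficient on x from the regression that includes the constant and trend.

```python
import numpy as np


def ols_coef(X, y):
    """OLS coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def detrend(v, Z):
    """Residuals from regressing v on the deterministic terms in Z."""
    return v - Z @ ols_coef(Z, v)


rng = np.random.default_rng(1)
n = 200
t = np.arange(n, dtype=float)
Z = np.column_stack([np.ones(n), t])                # constant and linear trend
x = 0.05 * t + rng.normal(size=n)                   # regressor with its own trend
y = 1.0 + 0.02 * t + 0.7 * x + rng.normal(size=n)   # made-up coefficients

# (a) include the trend in the regression
b_trend = ols_coef(np.column_stack([Z, x]), y)[-1]
# (b) detrend y and x first, then regress the detrended y on the detrended x
b_detrended = ols_coef(detrend(x, Z)[:, None], detrend(y, Z))[0]

print(b_trend, b_detrended)  # identical up to rounding (Frisch-Waugh-Lovell)
```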
Hopefully this made some sense. Please note that this was a very short introduction and that the topic could easily fill several chapters of a textbook. I strongly recommend reading the two lecture notes I linked to or getting hold of a textbook on time series analysis/econometrics. If you need help understanding some of the concepts better, please feel free to ask! Model specifications and examples are all included in the lecture notes I linked to.