So you have three nonstationary series and one stationary series. Let us call them $x_1$, $x_2$, $x_3$, and $x_4$, respectively. Suppose the nonstationarity of $x_1$, $x_2$, $x_3$ is of a unit-root kind (rather than of some other kind); that is, each of $x_1$, $x_2$, $x_3$ is integrated of order one, I(1). You can determine the order of integration using, for example, the augmented Dickey-Fuller (ADF) test.
Test each pair of the nonstationary series ($x_1$ and $x_2$; $x_1$ and $x_3$; $x_2$ and $x_3$) for cointegration using the Johansen or the Engle-Granger test.
Then test all three series ($x_1$, $x_2$, $x_3$) for cointegration using the Johansen test.
Depending on the results of the tests, you may find yourself in one of the following situations:
(A) No cointegration
(B) Two of the variables (say, $x_1$ and $x_2$) are cointegrated while the third variable (say, $x_3$) is not
(C) The three variables ($x_1$, $x_2$, $x_3$) are cointegrated
In general, you want the following:
- Models for cointegrated variables should have an error-correction representation; otherwise the model would be misspecified (cointegration goes hand-in-hand with the error correction representation).
- Models for stationary dependent variables should not have nonstationary explanatory variables (except perhaps for stationary combinations of cointegrated nonstationary variables); otherwise the linear combination of the regressors would diverge from the regressand.
- Models for nonstationary dependent variables should have at least one nonstationary explanatory variable; otherwise the regressand would diverge from the linear combination of the regressors. Mind nonstandard distributions of estimators for the integrated variables.
Based on these principles, you may do the following:
If (A) then first-difference each of the three variables ($x_1$, $x_2$, $x_3$), and use them together with the stationary variable $x_4$ to build a VAR model.
If (B) then build a model where
- $\Delta x_1$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $\Delta x_2$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $\Delta x_3$ depends on lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $x_4$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$.
If (C) then build a model where
- $\Delta x_1$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $\Delta x_2$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $\Delta x_3$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$;
- $x_4$ depends on the error correction term and lags of $\Delta x_1$, $\Delta x_2$, $\Delta x_3$, $x_4$.
These are pretty general models with lots of regressors. You may find it beneficial to exclude some variables from some equations or use penalization to avoid overfitting.
Question 1:
No, it is not strictly necessary to use AIC or BIC, but you need an objective method to assess how good your model is. People often think of AIC and BIC as pre-estimation statistics, but when you run a VAR lag-selection function, your software is actually estimating many VAR models and evaluating the likelihood function of each to compute the criteria. So you may think of AIC and BIC as two ways of assessing how good the candidate models are; they simply tell you that the best model is, e.g., a VAR(2).
When you say that the 12-lag model produces great results, on what criteria or statistics is that based?
If you are saying that because your residuals are not correlated, then you are on the right track, but maybe there is a simpler model that also produces white-noise residuals. AIC and BIC may help you find that model.
If you are saying that because this is the model that yields the best predictions on a particular test set, then you are also on the right track, but then you are presumably using a prediction evaluation criterion such as MSE or MAE.
If you are saying that because this is the model that makes your theoretical hypothesis valid, then you are doing bad science; that is not an objective method to evaluate your model. For causal purposes you usually need to run a lot of robustness checks, such as varying the lag order to see whether the significance of the coefficients changes.
Question 2:
This may not always be a valid approach. Again, you need to evaluate your model using an objective criterion. With this approach, if your residuals are uncorrelated and your model is parsimonious compared with the alternatives, then fine, but in many cases this will not be true.
Best Answer
That recommendation (1 lag for annual data, 4 for quarterly, etc.) is simply a rule of thumb obtained by other people who probably used some information criterion, so that now you don't have to. That is, if your time series is much longer or shorter than the typical case, you should determine the optimal lag yourself; otherwise, relying on the rule of thumb is fine.