Solved – Selecting an appropriate VAR model

forecasting, model-selection, time-series, vector-autoregression

I would like to receive critical comments on an idea explained below.

Suppose I have variables $x_1$ through $x_K$, and this is a time series setting.

My aim is forecasting.

I know that all the variables are closely related, so there is strong endogeneity in the system. My idea is to use a vector autoregression (VAR). I have no clue what the lag order should be (theory does not say much about it). Hence, I plan to estimate a number of candidate models with different lag orders and then choose the most appropriate one.

There are $2^{K^2 \cdot P}$ sub-models for a nesting model of lag order $P$, which amounts to estimating $K \cdot 2^{K^2 \cdot P}$ equations in the first step of the feasible generalized least squares (FGLS) procedure and $2^{K^2 \cdot P}$ runs of the second step of FGLS.

This can get quite large, e.g. $\#\text{models} = 2^{100} \approx 1.3 \cdot 10^{30}$ for $K=5$ and $P=4$, corresponding to $N \approx 6.3 \cdot 10^{30}$ OLS regressions just for the first step of FGLS. Thus it would be nice to have some other approach that does not require estimating all possible sub-models but still discovers the best model, or one close to it. Fortunately, there are the LASSO and other techniques that can do this.
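The combinatorics above can be checked in exact integer arithmetic (plain Python, using the figures from the text):

```python
# Number of restricted sub-models nested in a full VAR(P) with K variables:
# each of the K*K*P coefficients can be either zero or free.
K, P = 5, 4
n_submodels = 2 ** (K**2 * P)   # candidate restricted VARs; 2**100 ~ 1.27e+30
n_ols_fits = K * n_submodels    # OLS regressions in the first FGLS step
print(f"sub-models: {n_submodels:.2e}")
print(f"first-stage OLS fits: {n_ols_fits:.2e}")
```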

I have an alternative idea (I am not claiming the authorship as other people may have thought of it before).

My idea:

  1. Take the first equation of the nesting VAR($P$) model and consider it as a standalone model for the moment.
  2. Estimate by OLS all models nested in it; there will be $2^{K \cdot P}$ of them.
  3. Do the same for the remaining $K-1$ equations.
  4. Form all possible combinations of equation $1$ through equation $K$; these combinations constitute restricted VAR models, estimated (inefficiently) by OLS; there will be $(2^{K \cdot P})^K = 2^{K^2 \cdot P}$ of them.
  5. For each combination, i.e. each restricted VAR model, obtain the AIC value.
  6. Pick the ultimate restricted VAR model by AIC.
  7. Estimate the ultimate model efficiently by FGLS, and then use it for forecasting.
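Steps 1 through 6 can be sketched in Python on toy data, with $K$ and $P$ kept tiny so that the $2^{K^2 \cdot P}$ combinations remain enumerable. The system AIC here is taken as $n \log\det\hat\Sigma + 2k$ with $\hat\Sigma$ the joint residual covariance; the simulated coefficients are made up for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small VAR(1) with K=2 variables (toy data, hypothetical A)
K, P, T = 2, 1, 200
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
y = np.zeros((T, K))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=K)

# Common design: dependent variables and stacked lagged regressors
Y = y[P:]                                            # (T-P, K)
X = np.hstack([y[P - l:T - l] for l in range(1, P + 1)])  # (T-P, K*P)
n, m = Y.shape[0], X.shape[1]

def subsets(m):
    """All 2**m subsets of regressor indices, including the empty set."""
    for r in range(m + 1):
        yield from itertools.combinations(range(m), r)

# Steps 1-3: equation-by-equation OLS over every regressor subset,
# storing the residual vector of each sub-equation
eq_residuals = []
for k in range(K):
    res = {}
    for S in subsets(m):
        if S:
            Xs = X[:, S]
            beta, *_ = np.linalg.lstsq(Xs, Y[:, k], rcond=None)
            res[S] = Y[:, k] - Xs @ beta
        else:
            res[S] = Y[:, k].copy()
    eq_residuals.append(res)

# Steps 4-6: combine equations, compute the system AIC from the joint
# residual covariance, and keep the best combination
best = (np.inf, None)
for combo in itertools.product(*(r.keys() for r in eq_residuals)):
    E = np.column_stack([eq_residuals[k][combo[k]] for k in range(K)])
    sigma = E.T @ E / n
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        continue
    k_params = sum(len(S) for S in combo)
    aic = n * logdet + 2 * k_params
    if aic < best[0]:
        best = (aic, combo)

print("best AIC:", best[0], "selected regressors per equation:", best[1])
```

Step 7 (FGLS on the winning restriction pattern) is omitted; the point of the sketch is only that the residuals from the equation-wise OLS fits are enough to score every combination.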

The strength of the idea: the number of equations to be estimated is $K \cdot 2^{K \cdot P}$ (excluding the estimation in step 7) instead of $K \cdot 2^{K^2 \cdot P}$, i.e. $2^{K \cdot P \cdot (K-1)}$ times smaller (e.g. $2^{80}$ times smaller for $K=5$ and $P=4$). This only counts the first stage of FGLS, but it gives an idea of the time saved.

The weakness of the idea: equation-by-equation OLS is not efficient compared with FGLS, which could be used instead.

Questions:

  1. Does the idea seem reasonable?
  2. Is there a similar but better method?
  3. Any critical comments?

Best Answer

  1. Not really, because overfitting will likely be a major problem even with $K$ and $P$ being rather small.
    Take $K=P=3$. There will be $\binom{KP}{\operatorname{round}(KP/2)} = \binom{3 \cdot 3}{\operatorname{round}(3 \cdot 3/2)} = \binom{9}{5} = 126$ different models with 5 nonzero coefficients in each equation. For models with the same number of nonzero coefficients, AIC-based selection is nothing more than likelihood-based selection, which is prone to selecting an overfitted model. Increase $K$ and/or $P$ just a little, and you will get exorbitant numbers in place of 126.
    For more about the winner's curse when selecting from a large pool of models using information criteria, see Hansen (2010), "A winner's curse for econometric models: on the joint distribution of in-sample fit and out-of-sample fit and its implications for model selection".
  2. Regularization (as you mention), (frequentist) model averaging and/or forecast averaging could be more effective approaches to forecasting. Bayesian alternatives include BVAR modelling and Bayesian model averaging.
  3. The (negative) effect of overfitting due to the proposed model selection scheme may considerably dominate the loss/gain in efficiency due to the use of OLS versus FGLS.
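The regularization route in point 2 can be sketched as an equation-wise lasso on the stacked lagged regressors, a common recipe for sparse VARs. The penalty below is chosen by plain cross-validation for brevity; with time series data a blocked or rolling scheme would be more defensible. The simulated coefficients are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Toy K=3 VAR(2) data: each variable depends only on its own lags here
K, P, T = 3, 2, 300
y = rng.normal(size=(T, K))
for t in range(P, T):
    y[t] = 0.4 * y[t - 1] + 0.1 * y[t - 2] + rng.normal(scale=0.5, size=K)

Y = y[P:]
X = np.hstack([y[P - l:T - l] for l in range(1, P + 1)])  # (T-P, K*P)

# Equation-wise lasso: shrink each equation's K*P lag coefficients toward
# zero, with the penalty picked separately per equation by cross-validation
coefs = np.vstack([LassoCV(cv=5).fit(X, Y[:, k]).coef_ for k in range(K)])
print("nonzero coefficients per equation:", (np.abs(coefs) > 1e-8).sum(axis=1))
```

Unlike the all-subsets search, this replaces $2^{K \cdot P}$ fits per equation with a single regularization path, at the cost of the usual lasso shrinkage bias.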