I would like to receive critical comments on an idea explained below.
Suppose I have variables $x_1$ through $x_K$ in a time-series setting.
My aim is forecasting.
I know that all the variables are closely related, so there is strong endogeneity in the system. My idea is to use a vector autoregression (VAR). I have no clue what the lag order should be (theory does not say much about it). Hence, I plan to estimate a number of candidate models with different lag orders and then choose the most appropriate one.
There are $2^{K^2 \cdot P}$ sub-models for a nesting model of lag order $P$, which amounts to estimating $K \cdot 2^{K^2 \cdot P}$ equations in the first step of the feasible generalized least squares (FGLS) procedure and $2^{K^2 \cdot P}$ runs of the second step of FGLS.
This can get quite large, e.g. $\#\text{models} = 2^{100} \approx 1.3 \cdot 10^{30}$ for $K=5$ and $P=4$, corresponding to $N = 5 \cdot 2^{100} \approx 6.3 \cdot 10^{30}$ OLS regressions just for the first step of FGLS. Thus it would be nice to have some other approach that does not require estimating all possible sub-models but still discovers the best model or one close to it. Fortunately, there is LASSO and other techniques to do that.
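For concreteness, a quick back-of-the-envelope check of these counts in Python (plain arithmetic, no estimation):

```python
# Arithmetic check of the model counts for the exhaustive approach
K, P = 5, 4

n_models = 2 ** (K ** 2 * P)   # all restricted sub-models of the VAR(P)
n_ols = K * n_models           # equations fitted in FGLS step 1

print(f"sub-models: {n_models:.2e}")   # about 1.27e+30
print(f"OLS runs:   {n_ols:.2e}")      # about 6.34e+30
```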
I have an alternative idea (I am not claiming the authorship as other people may have thought of it before).
My idea:
1. Take the first equation of the nesting VAR($P$) model and consider it as a separate model for the moment.
2. Estimate by OLS all models nested in it; there will be $2^{K \cdot P}$ such models.
3. Do the same for the remaining $K-1$ equations.
4. Obtain all possible combinations of equation 1 through equation $K$; these will form restricted VAR models estimated inefficiently by OLS; there will be $(2^{K \cdot P})^K = 2^{K^2 \cdot P}$ of them.
5. For each combination, i.e. each restricted VAR model, obtain the AIC value.
6. Pick the ultimate restricted VAR model by AIC.
7. Estimate the ultimate model efficiently by FGLS, and then use it for forecasting.
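The steps above can be sketched in NumPy on toy simulated data, with $K$ and $P$ kept tiny so the full enumeration is cheap (the DGP and all variable names are my illustration, not from the question). The key point is that the residual vector of each per-equation OLS fit is cached, so the system AIC $n \log\det\hat{\Sigma} + 2m$ of every combination needs no further regressions:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy data: simulate a small VAR(1) with K = 2 (illustrative DGP,
# chosen only so the full enumeration below stays tiny)
K, P, T = 2, 1, 200
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
x = np.zeros((T, K))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(size=K)

# Regressand and the K*P lagged regressors (intercepts omitted for brevity)
Y = x[P:]                                                  # (T-P, K)
X = np.hstack([x[P - i:T - i] for i in range(1, P + 1)])   # (T-P, K*P)
n = Y.shape[0]

# Steps 1-3: per equation, OLS on every subset of the K*P regressors,
# caching the residual vector of each fit
resids = [dict() for _ in range(K)]
for k in range(K):
    for r in range(K * P + 1):
        for subset in itertools.combinations(range(K * P), r):
            if subset:
                Xs = X[:, list(subset)]
                beta, *_ = np.linalg.lstsq(Xs, Y[:, k], rcond=None)
                e = Y[:, k] - Xs @ beta
            else:
                e = Y[:, k].copy()          # empty model: no regressors
            resids[k][subset] = e

# Steps 4-6: every combination of per-equation subsets is a restricted VAR;
# its system AIC, n*logdet(Sigma_hat) + 2*m, comes from the cached residuals
best_aic, best_combo = np.inf, None
for combo in itertools.product(*(list(r) for r in resids)):
    E = np.column_stack([resids[k][combo[k]] for k in range(K)])
    _, logdet = np.linalg.slogdet(E.T @ E / n)   # log det of residual covariance
    m = sum(len(s) for s in combo)               # number of nonzero coefficients
    aic = n * logdet + 2 * m
    if aic < best_aic:
        best_aic, best_combo = aic, combo

print("selected regressor subsets per equation:", best_combo)
# Step 7 (efficient FGLS re-estimation of the selected model) would follow here.
```

Note where the saving comes from: only $K \cdot 2^{K \cdot P}$ regressions are run, while the $2^{K^2 \cdot P}$ AIC evaluations merely stack stored residual vectors and take a determinant.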
The strength of the idea: the number of equations to be estimated is $K \cdot 2^{K \cdot P}$ (not counting the final FGLS estimation in the last step) instead of $K \cdot 2^{K^2 \cdot P}$, i.e. $2^{K \cdot P \cdot (K-1)}$ times fewer (e.g. $2^{80}$ times fewer for $K=5$ and $P=4$). This only counts the first stage of FGLS, but it gives an idea of the time to be saved.
The weakness of the idea: equation-by-equation OLS is not efficient compared with FGLS, which could be used instead.
Questions:
- Does the idea seem reasonable?
- Is there a similar but better method?
- Any other critical comments?
Best Answer
Take $K=P=3$. There will be $\binom{KP}{\operatorname{round}(KP/2)} = \binom{3 \cdot 3}{\operatorname{round}(9/2)} = \binom{9}{5} = 126$ different models with 5 nonzero coefficients in each equation. For models with the same number of nonzero coefficients, AIC-based selection will be nothing more than likelihood-based selection, which is prone to selecting an overfitted model. Increase $K$ and/or $P$ just a little, and you will get exorbitant numbers in place of 126.
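A quick check of that count, plus the same per-equation count for a slightly larger system (the $K=P=5$ line is my own illustration, not from the answer), using Python's `math.comb`:

```python
from math import comb

# Same-size model counts per equation: choose about half of the KP regressors
print(comb(9, 5))    # K = P = 3, as in the answer: 126
print(comb(25, 12))  # K = P = 5, so 25 candidate regressors: 5200300
```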
For more on the winner's curse that arises when selecting from a large pool of models using information criteria, see Hansen, "A winner's curse for econometric models: on the joint distribution of in-sample fit and out-of-sample fit and its implications for model selection" (2010).