My understanding is that AIC, DIC, and WAIC are all estimating the same thing: the expected out-of-sample deviance associated with a model. This is also the same thing that cross-validation estimates. In Gelman et al. (2013), they say this explicitly:
A natural way to estimate out-of-sample prediction error is cross-validation (see Vehtari and Lampinen, 2002, for a Bayesian perspective), but researchers have always sought alternative measures, as cross-validation requires repeated model fits and can run into trouble with sparse data. For practical reasons alone, there remains a place for simple bias corrections such as AIC (Akaike, 1973), DIC (Spiegelhalter, Best, Carlin, and van der Linde, 2002, van der Linde, 2005), and, more recently, WAIC (Watanabe, 2010), and all these can be viewed as approximations to different versions of cross-validation (Stone, 1977).
BIC estimates something different, which is related to minimum description length. Gelman et al. say:
BIC and its variants differ from the other information criteria considered here in being motivated not by an estimation of predictive fit but by the goal of approximating the marginal probability density of the data, p(y), under the model, which can be used to estimate relative posterior probabilities in a setting of discrete model comparison.
I don't know anything about the other information criteria you listed, unfortunately.
Can you use the AIC-like information criteria interchangeably? Opinions may differ, but given that AIC, DIC, WAIC, and cross-validation all estimate the same thing, yes, they're more or less interchangeable. BIC is different, as noted above. I don't know about the others.
Why have more than one?
AIC works well when you have a maximum likelihood estimate and flat priors, but doesn't really have anything to say about other scenarios. The penalty is also too small when the number of parameters approaches the number of data points. AICc over-corrects for this, which can be good or bad depending on your perspective.
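For reference, with $k$ parameters, $n$ data points, and $\hat{\theta}_{mle}$ the maximum likelihood estimate, the standard definitions are $$ AIC = -2\log p(y \mid \hat{\theta}_{mle}) + 2k, \qquad AICc = AIC + \frac{2k(k+1)}{n-k-1}, $$ which makes clear why the small-sample correction in AICc grows rapidly as $k$ approaches $n$.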
DIC uses a smaller penalty if parts of the model are heavily constrained by priors (e.g. in some multi-level models where variance components are estimated). This is good, since heavily constrained parameters don't really constitute a full degree of freedom. Unfortunately, the formulas usually used for DIC assume that the posterior is essentially Gaussian (i.e. that it is well-described by its mean), and so one can get strange results (e.g. negative penalties) in some situations.
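In the notation of Gelman et al. (2013), with $\hat{\theta}_{Bayes}$ the posterior mean, $$ p_{DIC} = 2\left(\log p(y \mid \hat{\theta}_{Bayes}) - \mathrm{E}_{post}\left[\log p(y \mid \theta)\right]\right), \qquad DIC = -2\log p(y \mid \hat{\theta}_{Bayes}) + 2\,p_{DIC}, $$ and it is this reliance on the single point $\hat{\theta}_{Bayes}$ that can produce the strange results (such as negative penalties) mentioned above.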
WAIC uses the whole posterior density more effectively than DIC does, so Gelman et al. prefer it, although it can be a pain to calculate in some cases.
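Concretely, using $S$ posterior draws $\theta^{s}$, $$ \mathrm{lppd} = \sum_{i=1}^{n} \log\left(\frac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^{s})\right), \qquad p_{WAIC} = \sum_{i=1}^{n} \mathrm{Var}_{s}\left[\log p(y_i \mid \theta^{s})\right], \qquad WAIC = -2\left(\mathrm{lppd} - p_{WAIC}\right), $$ so every posterior draw contributes to both the fit term and the penalty; the price is that you need the pointwise likelihood $p(y_i \mid \theta^{s})$ for every observation and draw, which is not always convenient to obtain.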
Cross-validation does not rely on any particular formula, but it can be computationally prohibitive for many models.
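For comparison, the leave-one-out version estimates $\sum_{i=1}^{n}\log p(y_i \mid y_{-i})$, which in general means refitting the model $n$ times, although importance-sampling approximations based on a single set of posterior draws can reduce that cost.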
In my view the decision about which one of the AIC-like criteria to use depends entirely on these sorts of practical issues, rather than a mathematical proof that one will do better than the other.
References:
Gelman, A., Hwang, J., and Vehtari, A. (2013). Understanding predictive information criteria for Bayesian models. Available from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.295.3501&rep=rep1&type=pdf
Firstly, 50 lags is too many. What kind of data are you modeling?
Secondly, there is a problem with your code: the lag range for D.X must start at 0, not 1, and you wrote L(1).X twice.
You can use this instead:
forval i = 1/50 {
    forval j = 1/50 {
        regress D.Y L(1/`i').D.Y L(0/`j').D.X L(1).Y L(1).X
        estimates store est_`i'_`j'
    }
}
estimates stats est_*, n(251)
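The final estimates stats line then lists AIC and BIC for all 2,500 stored models; the n(251) option fixes the sample size used in the BIC calculation (presumably the number of usable observations here), and the lag pair whose model has the smallest information criterion would be the one to keep.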
$AIC$ for model $i$ of an a priori model set can be rescaled as $\Delta_i = AIC_i - AIC_{\min}$, where the best model of the model set has $\Delta = 0$. We can use the $\Delta_i$ values to estimate the strength of evidence ($w_i$) for all models in the model set, where: $$ w_i = \frac{e^{-0.5\Delta_i}}{\sum_{r=1}^{R} e^{-0.5\Delta_r}}. $$ This is often referred to as the "weight of evidence" for model $i$ given the a priori model set. As $\Delta_i$ increases, $w_i$ decreases, suggesting model $i$ is less plausible. These $w_i$ values can be interpreted as the probability that model $i$ is the best model given the a priori model set. We could also calculate the relative likelihood of model $i$ versus model $j$ as $w_i/w_j$. For example, if $w_i = 0.8$ and $w_j = 0.1$, then we could say model $i$ is 8 times more likely than model $j$.
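As a small worked example with three hypothetical models having $\Delta_1 = 0$, $\Delta_2 = 2$, and $\Delta_3 = 6$: the numerators are $e^{0} = 1$, $e^{-1} \approx 0.37$, and $e^{-3} \approx 0.05$, which sum to roughly 1.42, giving weights of about $w_1 = 0.71$, $w_2 = 0.26$, and $w_3 = 0.04$.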
Note that $w_1/w_2 = e^{0.5\Delta_2}$ when model 1 is the best model (smallest $AIC$). Burnham and Anderson (2002) term this the evidence ratio, and it grows rapidly as $\Delta$ increases relative to the best model, as the values below illustrate.
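Computed directly from $e^{0.5\Delta}$, the evidence ratio in favour of the best model is roughly 2.7 at $\Delta = 2$, 7.4 at $\Delta = 4$, 20 at $\Delta = 6$, 55 at $\Delta = 8$, and 148 at $\Delta = 10$, so a model more than about 10 AIC units from the best model carries essentially no support.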
References:
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Second edition. Springer, New York, USA.
Anderson, D. R. 2008. Model based inference in the life sciences: a primer on evidence. Springer, New York, USA.