My understanding is that AIC, DIC, and WAIC are all estimating the same thing: the expected out-of-sample deviance associated with a model. This is also the same thing that cross-validation estimates. In Gelman et al. (2013), they say this explicitly:

> A natural way to estimate out-of-sample prediction error is cross-validation (see Vehtari and Lampinen, 2002, for a Bayesian perspective), but researchers have always sought alternative measures, as cross-validation requires repeated model fits and can run into trouble with sparse data. For practical reasons alone, there remains a place for simple bias corrections such as AIC (Akaike, 1973), DIC (Spiegelhalter, Best, Carlin, and van der Linde, 2002, van der Linde, 2005), and, more recently, WAIC (Watanabe, 2010), and all these can be viewed as approximations to different versions of cross-validation (Stone, 1977).

BIC estimates something different, which is related to minimum description length. Gelman et al. say:

> BIC and its variants differ from the other information criteria considered here in being motivated not by an estimation of predictive fit but by the goal of approximating the marginal probability density of the data, p(y), under the model, which can be used to estimate relative posterior probabilities in a setting of discrete model comparison.

I don't know anything about the other information criteria you listed, unfortunately.

**Can you use the AIC-like information criteria interchangeably?** Opinions may differ, but given that AIC, DIC, WAIC, and cross-validation all estimate the same thing, then yes, they're more-or-less interchangeable. BIC is different, as noted above. I don't know about the others.

**Why have more than one?**

**AIC** works well when you have a maximum likelihood estimate and flat priors, but doesn't really have anything to say about other scenarios. The penalty is also too small when the number of parameters approaches the number of data points. **AICc** over-corrects for this, which can be good or bad depending on your perspective.
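To make the penalties concrete, here is a minimal sketch (Python for illustration; the function names are mine, not from any package) of AIC and its small-sample correction AICc, computed from a maximized log-likelihood:

```python
def aic(max_log_lik, k):
    """AIC = 2k - 2 * maximized log-likelihood, with k free parameters."""
    return 2.0 * k - 2.0 * max_log_lik

def aicc(max_log_lik, k, n):
    """AICc adds a correction term that grows as k approaches the sample size n."""
    return aic(max_log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Toy numbers: k = 3 parameters, n = 20 observations, log-likelihood -42 at the MLE.
print(aic(-42.0, 3))       # 90.0
print(aicc(-42.0, 3, 20))  # 91.5
```

The correction term $2k(k+1)/(n-k-1)$ blows up as $k$ approaches $n$, which is exactly the regime where plain AIC's penalty is too small.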

**DIC** uses a smaller penalty if parts of the model are heavily constrained by priors (e.g. in some multi-level models where variance components are estimated). This is good, since heavily constrained parameters don't really constitute a full degree of freedom. Unfortunately, the formulas usually used for DIC assume that the posterior is essentially Gaussian (i.e. that it is well-described by its mean), and so one can get strange results (e.g. negative penalties) in some situations.
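The plug-in formulas usually used for DIC can be sketched in a few lines (a toy illustration in Python, assuming posterior draws are available; nothing here comes from a particular package). The effective number of parameters $p_D$ is the mean deviance minus the deviance at the posterior mean, and this difference is the quantity that can come out negative when the posterior is far from Gaussian:

```python
import numpy as np

def dic(log_lik_draws, log_lik_at_post_mean):
    """Plug-in DIC from posterior draws.
    log_lik_draws: log p(y | theta_s) evaluated at each posterior draw s.
    log_lik_at_post_mean: log p(y | posterior mean of theta)."""
    mean_deviance = -2.0 * np.mean(log_lik_draws)
    deviance_at_mean = -2.0 * log_lik_at_post_mean
    p_d = mean_deviance - deviance_at_mean  # effective number of parameters
    return deviance_at_mean + 2.0 * p_d, p_d

# Toy normal-mean model: y_i ~ N(mu, 1) with an essentially flat prior on mu,
# so the posterior for mu is N(ybar, 1/n) and p_D should come out close to 1.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=50)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=2000)

def log_lik(mu):
    return np.sum(-0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2)

ll_draws = np.array([log_lik(m) for m in mu_draws])
dic_value, p_d = dic(ll_draws, log_lik(mu_draws.mean()))
```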

**WAIC** uses the whole posterior density more effectively than DIC does, so Gelman et al. prefer it, although it can be a pain to calculate in some cases.
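What "uses the whole posterior" means can be seen in a sketch like this (Python, illustrative names, assuming a matrix of pointwise log-likelihoods over posterior draws): the fit term is a per-observation log of the posterior-mean likelihood, and the penalty is the per-observation posterior variance of the log-likelihood, summed over observations:

```python
import numpy as np

def waic(pointwise_log_lik):
    """WAIC from an (S draws x n observations) matrix of pointwise log-likelihoods.
    Fit term: log of the posterior-mean likelihood per observation, computed
    with a numerically stable log-mean-exp. Penalty: per-observation posterior
    variance of the log-likelihood, summed over observations."""
    m = pointwise_log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(pointwise_log_lik - m), axis=0)))
    p_waic = np.sum(np.var(pointwise_log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

# Self-contained toy check with a normal-mean model: p_waic should be near 1.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=2000)
ll_matrix = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
waic_value, p_waic = waic(ll_matrix)
```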

**Cross-validation** does not rely on any particular formula, but it can be computationally prohibitive for many models.
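As a toy illustration of where the cost comes from, a leave-one-out estimate of out-of-sample deviance refits the model once per data point (sketched here in Python for a trivial normal-mean model; in realistic models each of those refits is expensive):

```python
import numpy as np

def loo_deviance(y):
    """Leave-one-out estimate of out-of-sample deviance for the toy model
    y_i ~ N(mu, 1), where mu is re-estimated from the held-in data each time.
    The loop over n refits is what makes CV prohibitive in general."""
    total_log_lik = 0.0
    for i in range(len(y)):
        held_in = np.delete(y, i)      # drop observation i
        mu_hat = held_in.mean()        # "refit" the model without it
        total_log_lik += -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[i] - mu_hat) ** 2
    return -2.0 * total_log_lik

rng = np.random.default_rng(2)
dev = loo_deviance(rng.normal(0.0, 1.0, size=50))
```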

In my view the decision about which one of the AIC-like criteria to use depends entirely on these sorts of practical issues, rather than a mathematical proof that one will do better than the other.

**References**:

Gelman et al. Understanding predictive information criteria for Bayesian models. Available from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.295.3501&rep=rep1&type=pdf

Firstly, 50 lags is too many. What kind of data are you modeling?
Secondly, there is a problem with your code: the lag range for D.X must start at 0, not 1, and you wrote L(1).X twice.

You can use this:

```stata
forval i = 1/50 {
    forval j = 1/50 {
        regress D.Y L(1/`i').D.Y L(0/`j').D.X L(1).Y L(1).X
        estimates store est_`i'_`j'
    }
}
estimates stats est_*, n(251)
```

## Best Answer

$AIC$ for model $i$ of an *a priori* model set can be rescaled to $\mathsf{\Delta}_i = AIC_i - \min AIC$, where the best model of the model set has $\mathsf{\Delta} = 0$. We can use the $\mathsf{\Delta}_i$ values to estimate the strength of evidence ($w_i$) for all models in the model set, where: $$ w_i = \frac{e^{-0.5\mathsf{\Delta}_i}}{\sum_{r=1}^R e^{-0.5\mathsf{\Delta}_r}}. $$ This is often referred to as the "weight of evidence" for model $i$ given the *a priori* model set. As $\mathsf{\Delta}_i$ increases, $w_i$ decreases, suggesting model $i$ is less plausible. These $w_i$ values can be interpreted as the probability that model $i$ is the best model given the *a priori* model set. We could also calculate the relative likelihood of model $i$ versus model $j$ as $w_i/w_j$. For example, if $w_i = 0.8$ and $w_j = 0.1$, then we could say model $i$ is 8 times more likely than model $j$. Note that $w_1/w_2 = e^{0.5\mathsf{\Delta}_2}$ when model 1 is the best model (smallest $AIC$). Burnham and Anderson (2002) term this the evidence ratio. This table shows how the evidence ratio changes with respect to the best model.
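Under the definitions above, the weights and evidence ratios are easy to compute; here is an illustrative sketch in Python (the function name and AIC values are mine, chosen for the example):

```python
import numpy as np

def akaike_weights(aic_values):
    """Akaike weights w_i for an a priori model set, from raw AIC values."""
    aic_values = np.asarray(aic_values, dtype=float)
    delta = aic_values - aic_values.min()   # Delta_i = AIC_i - min AIC
    rel_lik = np.exp(-0.5 * delta)          # relative likelihood of each model
    return rel_lik / rel_lik.sum()

# Hypothetical AIC values for three candidate models:
w = akaike_weights([100.0, 104.0, 110.0])
# The weights sum to 1, and the evidence ratio of model 1 versus model 2
# is w[0] / w[1] = exp(0.5 * 4), about 7.4.
```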

**References**:

Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Second edition. Springer, New York, USA.

Anderson, D. R. 2008. Model based inference in the life sciences: a primer on evidence. Springer, New York, USA.