My understanding is that AIC, DIC, and WAIC are all estimating the same thing: the expected out-of-sample deviance associated with a model. This is also the same thing that cross-validation estimates. In Gelman et al. (2013), they say this explicitly:
A natural way to estimate out-of-sample prediction error is cross-validation (see Vehtari and Lampinen, 2002, for a Bayesian perspective), but researchers have always sought alternative measures, as cross-validation requires repeated model fits and can run into trouble with sparse data. For practical reasons alone, there remains a place for simple bias corrections such as AIC (Akaike, 1973), DIC (Spiegelhalter, Best, Carlin, and van der Linde, 2002, van der Linde, 2005), and, more recently, WAIC (Watanabe, 2010), and all these can be viewed as approximations to different versions of cross-validation (Stone, 1977).
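To make "expected out-of-sample deviance" concrete, here is a minimal sketch assuming i.i.d. normal data fit by maximum likelihood (the toy model, sample sizes, seed and helper names are illustrative assumptions, not taken from Gelman et al.). Deviance is -2 times the log-likelihood; on average the in-sample deviance is optimistic, and the information criteria and cross-validation are different ways of estimating the held-out number without actually having fresh data.

```python
import math
import random

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def fit_normal_mle(ys):
    """Maximum likelihood estimates (mean, sd) for i.i.d. normal data."""
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)  # the MLE divides by n, not n - 1
    return mu, sigma

def deviance(ys, mu, sigma):
    """Deviance: -2 times the log-likelihood of ys under fixed parameters."""
    return -2 * sum(normal_logpdf(y, mu, sigma) for y in ys)

random.seed(1)
train_data = [random.gauss(0.0, 1.0) for _ in range(50)]  # data used for fitting
new_data = [random.gauss(0.0, 1.0) for _ in range(50)]    # fresh draws from the same process

mu_hat, sigma_hat = fit_normal_mle(train_data)
in_sample = deviance(train_data, mu_hat, sigma_hat)     # optimistic on average: the fit saw this data
out_of_sample = deviance(new_data, mu_hat, sigma_hat)   # the quantity the criteria try to estimate
print(f"in-sample deviance:     {in_sample:.1f}")
print(f"out-of-sample deviance: {out_of_sample:.1f}")
```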
BIC estimates something different, which is related to minimum description length. Gelman et al. say:
BIC and its variants differ from the other information criteria considered here in being motivated not by an estimation of predictive fit but by the goal of approximating the marginal probability density of the data, p(y), under the model, which can be used to estimate relative posterior probabilities in a setting of discrete model comparison.
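As a rough illustration of the "marginal probability density" reading, BIC is commonly written as -2 times the maximized log-likelihood plus k log(n), and exp(-BIC/2) is then used as an approximation to p(y) up to a common constant. The sketch below assumes equal prior probabilities over the candidate models; the log-likelihood values are hypothetical numbers chosen only to exercise the formula.

```python
import math

def bic(max_loglik, k, n):
    """BIC = -2 * (maximized log-likelihood) + k * log(n)."""
    return -2 * max_loglik + k * math.log(n)

def approx_model_probs(bic_values):
    """Approximate posterior model probabilities, assuming equal prior
    probabilities over the candidate models: p(M_j | y) is taken to be
    roughly proportional to exp(-BIC_j / 2)."""
    best = min(bic_values)
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    weights = [math.exp(-(b - best) / 2) for b in bic_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical fits to n = 100 observations: a 2-parameter and a 3-parameter model.
n = 100
bics = [bic(-120.0, 2, n), bic(-118.5, 3, n)]
probs = approx_model_probs(bics)
print([round(b, 2) for b in bics], [round(p, 3) for p in probs])
```

In this made-up example the extra parameter improves the log-likelihood by 1.5, less than the log(100)/2 ≈ 2.3 it costs on the log-likelihood scale, so the smaller model receives more weight.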
I don't know anything about the other information criteria you listed, unfortunately.
Can you use the AIC-like information criteria interchangeably? Opinions may differ, but since AIC, DIC, WAIC, and cross-validation all estimate the same thing, yes, they're more or less interchangeable. BIC is different, as noted above. I don't know about the others.
Why have more than one?
AIC works well when you have a maximum likelihood estimate and flat priors, but doesn't really have anything to say about other scenarios. Its penalty (2k, where k is the number of parameters) is also too small when k approaches the number of data points n. AICc over-corrects for this, which can be good or bad depending on your perspective.
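To see how the two penalties behave as the number of parameters k approaches the sample size n, here is a small sketch using the standard forms AIC = -2 log L + 2k and AICc = AIC + 2k(k + 1)/(n - k - 1). The function names and the choice n = 20 are illustrative; the log-likelihood is fixed at 0 so only the penalties are compared.

```python
def aic(max_loglik, k):
    """AIC = -2 * (maximized log-likelihood) + 2k."""
    return -2 * max_loglik + 2 * k

def aicc(max_loglik, k, n):
    """AIC plus the small-sample correction 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(max_loglik, k) + 2 * k * (k + 1) / (n - k - 1)

# Compare the plain AIC penalty with the extra AICc term as k grows toward n.
n = 20
for k in (1, 5, 10, 15, 18):
    extra = aicc(0.0, k, n) - aic(0.0, k)
    print(f"k={k:2d}  AIC penalty={2 * k:3d}  extra AICc penalty={extra:7.2f}")
```

The extra term is negligible for small k but blows up as k nears n - 1 (at k = 18 it is 2·18·19/1 = 684), which is the behavior the answer describes as over-correcting.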
DIC uses a smaller penalty if parts of the model are heavily constrained by priors (e.g. in some multi-level models where variance components are estimated). This is good, since heavily constrained parameters don't really constitute a full degree of freedom. Unfortunately, the formulas usually used for DIC assume that the posterior is essentially Gaussian (i.e. that it is well-described by its mean), and so one can get strange results (e.g. negative penalties) in some situations.
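A minimal sketch of how DIC is usually computed from posterior draws, assuming the Spiegelhalter et al. definition p_D = (mean deviance over draws) - (deviance at the posterior mean) and DIC = D(posterior mean) + 2 p_D. The toy model (normal data, known sd of 1, flat prior on the mean, so the posterior can be sampled directly instead of by MCMC) and the helper names are illustrative assumptions. Because this posterior is exactly Gaussian, p_D comes out close to 1, the number of parameters; the strange results mentioned above tend to appear when the posterior mean is a poor summary, for example with a multimodal posterior.

```python
import math
import random

def log_lik(ys, mu, sigma=1.0):
    """Normal log-likelihood of ys with mean mu and known sd sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)
               for y in ys)

def dic(ys, mu_draws):
    """DIC from posterior draws of mu, with
    p_D = mean deviance over draws - deviance at the posterior mean."""
    deviances = [-2 * log_lik(ys, mu) for mu in mu_draws]
    mean_deviance = sum(deviances) / len(deviances)
    mu_bar = sum(mu_draws) / len(mu_draws)       # posterior mean used as the point summary
    deviance_at_mean = -2 * log_lik(ys, mu_bar)
    p_d = mean_deviance - deviance_at_mean        # effective number of parameters
    return deviance_at_mean + 2 * p_d, p_d

random.seed(2)
ys = [random.gauss(0.5, 1.0) for _ in range(40)]
n = len(ys)
ybar = sum(ys) / n
# Flat prior and known sigma = 1: the posterior of mu is Normal(ybar, 1/sqrt(n)).
mu_draws = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(4000)]
dic_value, p_d = dic(ys, mu_draws)
print(f"DIC = {dic_value:.2f}, p_D = {p_d:.3f}")
```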
WAIC uses the whole posterior density more effectively than DIC does, so Gelman et al. prefer it, although it can be a pain to calculate in some cases.
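Here is a sketch of WAIC computed from a matrix of pointwise log-likelihoods, assuming the variance-based penalty (p_WAIC as the sum over observations of the posterior variance of log p(y_i | theta)) and reporting the result on the deviance scale, -2 (lppd - p_WAIC). The toy normal-mean model with known sd and flat prior, and all helper names, are illustrative assumptions. Note that every observation's log-likelihood is evaluated under every draw, which uses the whole posterior rather than just its mean.

```python
import math
import random

def normal_logpdf(y, mu, sigma=1.0):
    """Log density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def log_mean_exp(values):
    """log(mean(exp(values))), computed stably by factoring out the max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def sample_variance(values):
    """Unbiased sample variance (divides by len - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def waic(loglik):
    """WAIC on the deviance scale; loglik[s][i] = log p(y_i | theta_s)."""
    lppd = 0.0
    p_waic = 0.0
    for i in range(len(loglik[0])):
        column = [row[i] for row in loglik]   # log p(y_i | theta_s) across draws s
        lppd += log_mean_exp(column)          # log pointwise predictive density
        p_waic += sample_variance(column)     # posterior variance of the pointwise log-lik
    return -2 * (lppd - p_waic), p_waic

random.seed(3)
ys = [random.gauss(0.5, 1.0) for _ in range(200)]
n = len(ys)
ybar = sum(ys) / n
# Toy conjugate setup: flat prior, known sigma = 1, so mu | y ~ Normal(ybar, 1/sqrt(n)).
mu_draws = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(1000)]
loglik = [[normal_logpdf(y, mu) for y in ys] for mu in mu_draws]
waic_value, p_waic = waic(loglik)
print(f"WAIC = {waic_value:.2f}, p_WAIC = {p_waic:.3f}")
```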
Cross-validation does not rely on any particular formula, but it can be computationally prohibitive for many models.
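A minimal sketch of exact leave-one-out cross-validation on the deviance scale, assuming the same kind of toy i.i.d. normal model fit by maximum likelihood (model, seed and helper names are illustrative). The model is refit n times, each time without one observation, and that observation is scored under the refit; AIC is printed alongside because Stone (1977) relates the two asymptotically.

```python
import math
import random

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def fit_normal_mle(ys):
    """Maximum likelihood estimates (mean, sd) for i.i.d. normal data."""
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)
    return mu, sigma

def loo_deviance(ys):
    """Exact leave-one-out CV: refit without y_i, then score y_i under that fit."""
    total = 0.0
    for i in range(len(ys)):
        rest = ys[:i] + ys[i + 1:]
        mu, sigma = fit_normal_mle(rest)        # one full refit per held-out point
        total += normal_logpdf(ys[i], mu, sigma)
    return -2 * total

random.seed(4)
ys = [random.gauss(0.0, 1.0) for _ in range(100)]
mu_hat, sigma_hat = fit_normal_mle(ys)
in_sample = -2 * sum(normal_logpdf(y, mu_hat, sigma_hat) for y in ys)
aic_value = in_sample + 2 * 2                   # k = 2 parameters (mean and sd)
loo_value = loo_deviance(ys)
print(f"in-sample deviance = {in_sample:.1f}")
print(f"AIC                = {aic_value:.1f}")
print(f"LOO-CV deviance    = {loo_value:.1f}")
```

This costs n fits instead of one. Here each refit is a closed-form mean and sd, but for a model fit by MCMC each refit would be a full sampler run, which is where the computational burden comes from.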
In my view the decision about which one of the AIC-like criteria to use depends entirely on these sorts of practical issues, rather than on a mathematical proof that one will do better than the others.
References:
Gelman et al. (2013). Understanding predictive information criteria for Bayesian models. Available from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.295.3501&rep=rep1&type=pdf
What alternatives do we have in model selection for prediction?
Why are the latter attractive in the time series setting?
*Information criteria have an asymptotic justification, so their use in small samples is not without problems. Nevertheless, a more efficient use of the data is preferable to a less efficient one. By using the entire sample for estimation you are closer to the asymptotic regime than by using, say, 2/3 of the sample.