Are there circumstances in which BIC is useful and AIC is not?

aic, bic, information theory, model selection, prior

In the Wikipedia entry for Akaike information criterion, we read under Comparison with BIC (Bayesian information criterion) that

…AIC/AICc has theoretical advantages over BIC…AIC/AICc is derived from principles of information; BIC is not…BIC has a prior of 1/R (where R is the number of candidate models), which is "not sensible"…AICc tends to have practical/performance advantages over BIC…AIC is asymptotically optimal…BIC is not asymptotically optimal…the rate at which AIC converges to the optimum is…the best possible.

On the article's talk page, there are numerous comments about the biased presentation in the Comparison with BIC section. One frustrated contributor protested that the whole article "reads like a commercial for cigarettes."

In other sources, for example in this thesis appendix, the tenor of the claims for AIC seems more realistic. Thus, as a service to the community, we ask:

Q: Are there circumstances in which BIC is useful and AIC is not?

Best Answer

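According to Wikipedia, the AIC can be written as follows:

$$ \mathrm{AIC} = 2k - 2 \ln(\mathcal L) $$

where $k$ is the number of estimated parameters and $\mathcal L$ is the maximized likelihood. The BIC can be written as follows:

$$ \mathrm{BIC} = k \ln(n) - 2 \ln(\mathcal L) $$

where $n$ is the sample size. So the only difference is the penalty per parameter: 2 for the AIC versus $\ln(n)$ for the BIC, which is the larger of the two once $n \ge 8$ (since $e^2 \approx 7.39$) and keeps growing with the sample. Because the BIC penalizes complex models more heavily, there are situations in which the AIC will hint that you should select a model that is too complex, while the BIC is still useful. If you do not want to penalize for the sample there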
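To make the two formulas concrete, here is a minimal Python sketch. The function names and the log-likelihood values are hypothetical and chosen only for illustration; they do not come from any real fit. It shows a case where the AIC prefers the larger model while the BIC prefers the smaller one.

```python
import math


def aic(log_lik, k):
    """AIC = 2k - 2 ln(L), with log_lik = ln(L) at the maximum."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """BIC = k ln(n) - 2 ln(L), with n the sample size."""
    return k * math.log(n) - 2 * log_lik


def penalty_gap(k, n):
    """BIC minus AIC for the same fitted model: k * (ln(n) - 2)."""
    return k * (math.log(n) - 2)


if __name__ == "__main__":
    n = 200  # hypothetical sample size
    # Hypothetical maximized log-likelihoods for two nested models.
    small = {"log_lik": -520.0, "k": 3}
    large = {"log_lik": -514.0, "k": 8}

    for name, m in (("small", small), ("large", large)):
        print(name,
              "AIC =", round(aic(m["log_lik"], m["k"]), 2),
              "BIC =", round(bic(m["log_lik"], m["k"], n), 2))
```

With these made-up numbers the likelihood gain of the larger model is enough to pay the AIC penalty of 2 per extra parameter, but not the BIC penalty of ln(200) per extra parameter.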

A quick explanation by Rob Hyndman can be found here: "Is there any reason to prefer the AIC or BIC over the other?" He writes:

  • AIC is best for prediction as it is asymptotically equivalent to cross-validation.
  • BIC is best for explanation as it allows consistent estimation of the underlying data generating process.
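The consistency point can be illustrated with a toy simulation. The sketch below is only an illustration under strong assumptions that are not in the quoted answer: the data are i.i.d. N(0, 1), the variance is known, and the two candidates are "mean fixed at 0" (the true model) and "mean estimated" (one extra parameter). In that setting twice the log-likelihood gain of the larger model is n times the squared sample mean, so the AIC picks the larger model when that quantity exceeds 2 and the BIC when it exceeds ln(n). The function name `overfit_rates` is hypothetical.

```python
import math
import random


def overfit_rates(n, reps=1000, seed=0):
    """Share of simulated samples in which AIC / BIC pick the
    unnecessary extra parameter (true mean is 0, variance known)."""
    rng = random.Random(seed)
    aic_over = 0
    bic_over = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        # 2 * (lnL_large - lnL_small) for a Gaussian with known variance 1
        lr = n * xbar ** 2
        if lr > 2.0:            # AIC penalty for one extra parameter
            aic_over += 1
        if lr > math.log(n):    # BIC penalty for one extra parameter
            bic_over += 1
    return aic_over / reps, bic_over / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        a, b = overfit_rates(n)
        print(f"n={n:5d}  AIC overfits: {a:.3f}  BIC overfits: {b:.3f}")
```

In this toy case the AIC threshold does not move with n, so its chance of choosing the too-large model stays roughly constant, while the BIC threshold ln(n) grows and its chance shrinks. This says nothing about predictive accuracy, which is the separate point made in the first bullet.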

Edit: One example can be found in time series analysis. In VAR models, the AIC (as well as its corrected version, the AICc) often selects too many lags. Therefore one should primarily look at the BIC when choosing the number of lags of a VAR model. For further information, you can read chapter 9.2 of Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos.
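As a rough sketch of what comparing the two criteria across lag lengths can look like, the following NumPy code simulates a bivariate VAR(1) and evaluates both criteria for several lag orders. Everything here is an assumption made for illustration: the coefficient matrix, the helper names, and the scaled form of the criteria, ln|Sigma| + c * p * K^2 / T with c = 2 for the AIC and c = ln(T) for the BIC, which is a commonly used variant for VARs rather than something taken from the cited book.

```python
import numpy as np


def simulate_var1(T, seed=0):
    """Simulate a hypothetical stable bivariate VAR(1)."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.5, 0.1],
                  [0.0, 0.3]])
    y = np.zeros((T, 2))
    for t in range(1, T):
        y[t] = A @ y[t - 1] + rng.standard_normal(2)
    return y


def var_criteria(y, p, max_lag):
    """Fit VAR(p) by OLS and return (aic, bic) in scaled form."""
    T_full, K = y.shape
    Y = y[max_lag:]                      # same sample for every p
    T = Y.shape[0]
    lags = [y[max_lag - j:T_full - j] for j in range(1, p + 1)]
    X = np.hstack([np.ones((T, 1))] + lags)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    sigma = resid.T @ resid / T          # ML residual covariance
    logdet = np.linalg.slogdet(sigma)[1]
    n_params = p * K * K                 # lag coefficients only
    return (logdet + 2 * n_params / T,
            logdet + np.log(T) * n_params / T)


def select_lags(y, max_lag):
    """Return the lag orders minimizing AIC and BIC, respectively."""
    crit = {p: var_criteria(y, p, max_lag) for p in range(1, max_lag + 1)}
    p_aic = min(crit, key=lambda p: crit[p][0])
    p_bic = min(crit, key=lambda p: crit[p][1])
    return p_aic, p_bic


if __name__ == "__main__":
    y = simulate_var1(400)
    p_aic, p_bic = select_lags(y, max_lag=6)
    print("lags chosen by AIC:", p_aic, " by BIC:", p_bic)
```

Because ln(T) exceeds 2 for any realistic sample, the BIC choice is never larger than the AIC choice on the same data; how often the two actually differ depends on the series and on `max_lag`.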
