Solved – Correct number of parameters of AR models for AIC / BIC


I have a time series and want to use AIC / BIC to decide which of the following models is most appropriate:

  • A) AR(1), no constant with Gaussian innovation term
  • B) AR(2), no constant with Gaussian innovation term
  • C) AR(1), no constant with Student t innovation term
  • D) AR(2), no constant with Student t innovation term

What is the correct number of parameters to use in AIC / BIC for the models above?

I found the following explanation in the MATLAB documentation for an ARMA(p,q) model with Gaussian innovations: "Calculate the BIC for each fitted model. The number of parameters in a model is p + q + 1 (for the AR and MA coefficients, and constant term)."

What I do not understand is why no parameter is added for the variance of the Gaussian distribution, which is also estimated. In particular, if the innovations are Student-t distributed, I assume that the additional degrees-of-freedom parameter of the Student t distribution also needs to be counted in AIC / BIC?

Intuitively I would have chosen 2 parameters for A, 3 for B, 3 for C and 4 for D; but it could also be 1, 2, 2 and 3 respectively if the variance is not counted as a parameter (as in the MATLAB example).
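To make the two conventions concrete, here is a minimal pure-Python sketch (toy simulated data; conditional MLE for model A, which has a closed form) that computes the conditional log-likelihood and the AIC under both parameter counts. The simulated series and all names are illustrative, not taken from any package:

```python
import math
import random

random.seed(0)

# Simulate a toy zero-mean AR(1) series with phi = 0.6 and unit-variance
# Gaussian innovations (purely illustrative data).
y = [0.0]
for _ in range(499):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))

# Conditional MLE for a zero-mean Gaussian AR(1) (model A): phi has a
# closed form, and sigma^2 is the mean squared residual.
n = len(y) - 1
phi = sum(y[t] * y[t - 1] for t in range(1, len(y))) / \
      sum(y[t - 1] ** 2 for t in range(1, len(y)))
sigma2 = sum((y[t] - phi * y[t - 1]) ** 2 for t in range(1, len(y))) / n
loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)

# The two counting conventions: k = 1 (phi only) vs k = 2 (phi and sigma^2).
aic_k1 = -2 * loglik + 2 * 1
aic_k2 = -2 * loglik + 2 * 2
print(round(phi, 3), aic_k2 - aic_k1)  # the two AICs differ by exactly 2
```

Since every model fitted on the same data receives the same +2 offset under the second convention, the choice of convention cannot change which model minimizes AIC.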

Best Answer

For AIC/BIC selection it doesn't really matter whether you count the variance parameter, as long as you are consistent across models, because inference based on information-theoretic criteria depends only on the differences between the values for different models. Thus, adding 2 (for one additional parameter) to every model's criterion doesn't change any of the delta-*IC values, which are all you use to choose models (or to do model averaging, compute model weights, etc.).
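To illustrate, take hypothetical maximized log-likelihoods for the four models (the numbers below are invented for the sketch); the delta-AIC values come out identical whether or not the variance parameter is counted:

```python
# Invented maximized log-likelihoods for models A-D.
logL = {"A": -512.3, "B": -508.9, "C": -505.1, "D": -504.8}
k_without = {"A": 1, "B": 2, "C": 2, "D": 3}       # variance not counted
k_with = {m: k + 1 for m, k in k_without.items()}  # variance counted

def deltas(logL, k):
    """Delta-AIC of each model relative to the best (lowest-AIC) model."""
    aic = {m: -2 * logL[m] + 2 * k[m] for m in logL}
    best = min(aic.values())
    return {m: round(aic[m] - best, 6) for m in aic}

print(deltas(logL, k_without))
print(deltas(logL, k_with))  # identical: the uniform +2 shift cancels
```

Both prints show the same ranking and the same delta-AIC values, because the +2 added to every model cancels in the pairwise differences.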

(However, you do have to be careful if you're going to compare models fitted with different procedures or different software packages, because they may count parameters in different ways.)

It does matter if you are going to use AICc or some other finite-sample-corrected criterion, because the correction term depends nonlinearly on the parameter count (its denominator is $n-k-1$). Then the question you have to ask is whether a nuisance parameter such as the residual variance, which can be computed from the residuals without modifying the estimation procedure, should be included. I wrote in this r-sig-mixed-models post that I'm not sure about the right procedure here. However, looking quickly at Hurvich and Tsai's original paper (Hurvich, Clifford M., and Chih-Ling Tsai. 1989. "Regression and Time Series Model Selection in Small Samples." Biometrika 76 (2): 297–307, doi:10.1093/biomet/76.2.297, http://biomet.oxfordjournals.org/content/76/2/297.abstract), it does appear that they include the variance parameter, i.e. they use $k = m + 1$ for a linear model with $m$ linear coefficients.
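To see why the convention matters for AICc: the correction term $2k(k+1)/(n-k-1)$ is nonlinear in $k$, so shifting every model's count by one no longer cancels in the differences. A sketch with invented log-likelihoods and a deliberately short series:

```python
def aicc(loglik, k, n):
    # AICc = AIC + 2k(k+1)/(n-k-1), the small-sample correction
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

n = 30                           # short series: the correction matters
logL = {"A": -51.2, "B": -49.9}  # invented maximized log-likelihoods
k_without = {"A": 1, "B": 2}     # variance not counted
k_with = {"A": 2, "B": 3}        # variance counted

d_without = aicc(logL["B"], k_without["B"], n) - aicc(logL["A"], k_without["A"], n)
d_with = aicc(logL["B"], k_with["B"], n) - aicc(logL["A"], k_with["A"], n)
print(round(d_without, 4), round(d_with, 4))  # the two deltas are NOT equal
```

Unlike with plain AIC, the two conventions give different delta-AICc values, so the choice of convention can (at the margin) change which model is selected.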

I would further quote Press et al. (Numerical Recipes in C):

We might also comment that if the difference between $N$ and $N-1$ ever matters to you, then you are probably up to no good anyway - e.g. trying to substantiate a questionable hypothesis with marginal data.

(They are discussing the bias correction term in the sample variance calculation, but the principle applies here as well.)
