Solved – Autocorrelation of VAR residuals

autocorrelation, vector-autoregression

I am fitting a VAR model to 50+ time series, each of which has two variables, x and y. I am trying to determine whether my bivariate VAR model has enough lags. AIC and SBIC both suggest using 2 lags. However, when I run the Box-Pierce test on the VAR residuals, the null hypothesis of white noise is rejected in over 50% of the cases (at the 5% significance level). What is even more interesting to me, however, is that when I estimate the model with more lags, the Box-Pierce Q test rejects in even more cases.

I have never run across a similar problem, and thus I am at a loss as to how I should proceed. As further background, the series are over 10,000 periods long. All time series are stationary according to the ADF test, but they contain large outliers.

Best Answer

AIC and BIC do not target minimizing the amount of autocorrelation in model residuals. Therefore, it is generally not surprising that a model selected by AIC or BIC has some autocorrelation.
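
To make this concrete, here is the usual textbook form of the two criteria for a VAR($p$) with $K$ variables fitted on $T$ observations. The notation is mine, not from the question: $\hat\Sigma_u(p)$ denotes the estimated residual covariance matrix, and software packages may scale or offset these expressions differently.

$$
\mathrm{AIC}(p) = \ln\left|\hat\Sigma_u(p)\right| + \frac{2}{T}\,pK^2,
\qquad
\mathrm{BIC}(p) = \ln\left|\hat\Sigma_u(p)\right| + \frac{\ln T}{T}\,pK^2
$$

Both trade off the size of the residual covariance against the number of estimated lag coefficients ($pK^2$, so $4p$ in the bivariate case). Neither contains a term that measures how autocorrelated the residuals are, so nothing forces the selected lag order to pass a white-noise test.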

The idea behind AIC and BIC is to select a model that describes the data quite well but not "too well", given the limited sample size. Trying to remove all autocorrelations may seem desirable, but if that increases model complexity to a large extent, then we may be overfitting. That is, in-sample results produced by a complicated model may look nice but once we collect more data, we will see that the model as estimated in the initial sample does not fit the new data well. In other words, the model does not generalize well. And we want it to generalize well because we are generally interested in the properties of the population and/or the yet unobserved samples from it. What AIC and BIC do is prevent overfitting and select a "golden middle" (in a certain sense, which differs for AIC and BIC).

Regarding the series being very long (over 10,000 observations): with a sample that large, you will get statistically significant results even when the effect size is very small. This raises the question: are the autocorrelations economically significant? Or are they tiny from an economic perspective but still statistically significant because of the large sample?
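
A rough way to see the sample-size effect is to look at the Box-Pierce statistic itself, Q = n * (sum of squared sample autocorrelations). The sketch below is only an illustration under simplifying assumptions: it uses a single lag, a univariate series rather than VAR residuals, and the standard 5% chi-square critical value with 1 degree of freedom (3.841). The function names and the simulated AR(1) coefficient of 0.05 are made up for the example.

```python
import math
import random

# 5% critical value of the chi-square distribution with 1 degree of freedom
CHI2_CRIT_1DF_5PCT = 3.841


def box_pierce_q(x, h=1):
    """Box-Pierce statistic Q = n * sum_{k=1..h} r_k^2 for a 1-D series."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    denom = sum(v * v for v in xc)
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k
        r_k = sum(xc[t] * xc[t - k] for t in range(k, n)) / denom
        q += r_k * r_k
    return n * q


def smallest_rejected_lag1_autocorr(n, crit=CHI2_CRIT_1DF_5PCT):
    """Smallest |r_1| for which a one-lag Box-Pierce test rejects at 5%."""
    # With h = 1, Q = n * r_1^2, so rejection needs |r_1| > sqrt(crit / n)
    return math.sqrt(crit / n)


# The rejection threshold shrinks like 1 / sqrt(n):
# roughly 0.196 for n = 100, 0.062 for n = 1,000, 0.0196 for n = 10,000
for n in (100, 1_000, 10_000):
    print(n, round(smallest_rejected_lag1_autocorr(n), 4))

# Illustration: an AR(1) series with a tiny (assumed) coefficient of 0.05
rng = random.Random(0)
n, phi = 10_000, 0.05
x = [rng.gauss(0.0, 1.0)]
for _ in range(1, n):
    x.append(phi * x[-1] + rng.gauss(0.0, 1.0))

q = box_pierce_q(x, h=1)
print("Q =", round(q, 2), "| reject at 5%:", q > CHI2_CRIT_1DF_5PCT)
```

With 10,000 observations, a lag-1 autocorrelation of only about 0.02 is already enough for a rejection, so a rejection by itself says little about whether the remaining autocorrelation matters in practice. Looking at the magnitudes of the residual autocorrelations, not just the p-values, is probably more informative here.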

Regarding outliers, something should be done about them; otherwise they will distort the modelling results. How to handle outliers is a whole separate topic, so I will not attempt to elaborate on it here, but I suggest taking some measures to deal with the outliers instead of ignoring them.
