Solved – GARCH diagnostics: autocorrelation in standardized residuals but not in their squares

diagnostic, garch, residuals

While fitting an ARMA-GARCH model, I ran the Weighted Ljung-Box test on the standardized residuals and on the squared standardized residuals to check whether the model adequately describes the linear dependence in the return and volatility series.

Whatever orders I choose for the ARCH and GARCH parts, for example GARCH(1,1), GARCH(2,1) or GARCH(2,2), the $p$-value of the test is always below 0.05 for the standardized residuals and above 0.05 for the squared standardized residuals. These results seem contradictory to me, and I don't know what to conclude. Judging by the test on the squared standardized residuals, I would say that the model fits the data well, but the test on the standardized residuals suggests the opposite.

What should I do? Can I privilege the test result on the squared standardized residuals? Should I try a higher-order model?

In all the attempts mentioned above, I changed only the orders of the GARCH model and kept the ARMA model fixed. I have now tried changing the ARMA order, and the results look better. The best choice seems to be a GARCH(2,2) without the ARMA part. All of this assumes that the innovations follow a skew Student-$t$ distribution.

Using just a GARCH model without a mean specification seems better in terms of the Ljung-Box test on the residuals, and a GARCH(1,1) model then fits the data well. At the same time, adding a mean specification improves the AIC and BIC values but requires a higher-order GARCH model. Which of the two specifications should I prefer?

Best Answer

So that seems contrasting to me, and I don't know what type of conclusion I can make. Given the results of the test for the squared standardized residuals, I would say that the model fits the data well, but the test on the standardized residuals suggests the opposite.

You are testing two different hypotheses that are not closely related. The Ljung-Box test on (levels of) standardized residuals checks for dependence in the first moments across time lags, i.e. leftover autocorrelation in the conditional mean. The Ljung-Box test on squares of standardized residuals and the ARCH-LM test (on levels of standardized residuals) check for dependence in the second moments across time lags, i.e. leftover conditional heteroskedasticity. It should not confuse you that one null hypothesis is rejected while the other is not. For example, would you be confused if you could not reject non-autocorrelation but rejected normality? Perhaps not, because these are two different things. The same applies to your case.
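To make the distinction concrete, here is a minimal sketch in plain Python (standard library only) that applies a hand-written Ljung-Box test to the levels and to the squares of two simulated series. Everything in it is an illustrative assumption rather than part of the original question: the helper names `chi2_sf_even`, `ljung_box`, `simulate_ar1` and `simulate_arch1`, the parameter values, and the choice of 10 lags. It uses the plain (unweighted) Ljung-Box statistic on raw simulated data, not the Weighted Ljung-Box test on residuals from a fitted model, so no degrees-of-freedom correction is applied.

```python
import math
import random


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with EVEN df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is exact only for even degrees of freedom.
    """
    if df % 2 != 0:
        raise ValueError("this sketch supports even degrees of freedom only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def ljung_box(x, lags=10):
    """Return (Q, p_value) for the Ljung-Box test with `lags` autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf_even(q, lags)


def simulate_ar1(n, phi, rng):
    """AR(1) with constant variance: dependence in the mean only."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def simulate_arch1(n, omega, alpha, rng):
    """ARCH(1) with zero mean: dependence in the variance only."""
    x, prev = [], 0.0
    for _ in range(n):
        sigma2 = omega + alpha * prev * prev
        prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
mean_dep = simulate_ar1(3000, 0.2, rng)
var_dep = simulate_arch1(3000, 0.6, 0.4, rng)

for name, series in [("AR(1), constant variance", mean_dep),
                     ("ARCH(1), zero mean", var_dep)]:
    q_lvl, p_lvl = ljung_box(series)
    q_sq, p_sq = ljung_box([v * v for v in series])
    print(f"{name}: levels Q={q_lvl:.1f} (p={p_lvl:.3g}), "
          f"squares Q={q_sq:.1f} (p={p_sq:.3g})")
```

In the AR(1) series the test on levels should pick up the dependence, and in the ARCH(1) series the test on squares should. Neither outcome says anything about the other moment, which is why the two tests can disagree without contradicting each other.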

The findings of the Ljung-Box test on squares of standardized residuals and of the ARCH-LM test conflict with each other, which is unpleasant but can happen. Perhaps the dependence is borderline-strong (borderline-weak), so that one test finds it significant while the other does not.

(Also note that these tests may not be valid when applied to standardized residuals from a GARCH model, because the null distributions are nonstandard and the reported $p$-values may therefore be wrong. The ARCH-LM test is certainly not applicable, and the Li-Mak test should be used instead. Both Ljung-Box tests may or may not be applicable; their validity does not seem trivial to me.)

Can I privilege the test result on the squared standardized residuals?

No, you can't, because the two tests are addressing different issues.

Should I try a higher-order model?

You may change the lag order (either ARMA or GARCH or both) or the error distribution, or even the model (try another flavour of GARCH). It is hard to tell which one is causing the trouble as their effects interact.

Using just a GARCH model without a mean specification seems better in terms of the Ljung-Box test on the residuals, and a GARCH(1,1) model then fits the data well. At the same time, adding a mean specification improves the AIC and BIC values but requires a higher-order GARCH model. Which of the two specifications should I prefer?

If your goal is to build a model for forecasting, follow AIC. Specification tests may indicate that the model is not perfect, but perhaps the imperfection is small relative to the gain in forecasting accuracy from using that particular model rather than its competitors.
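For reference, a small sketch of how the two criteria trade fit against model size, using the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k \ln n - 2\ln L$. The log-likelihoods, parameter counts and sample size below are invented purely for illustration and are not results from the question. Note also that some software reports these criteria divided by the sample size, so only compare values computed the same way.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# HYPOTHETICAL inputs, made up for illustration only.
n_obs = 2000
candidates = {
    # omega, alpha, beta plus skew and shape of the skew-t: 5 parameters
    "GARCH(1,1), no mean model": {"loglik": -2810.0, "n_params": 5},
    # mu, ar, ma, omega, 2 alphas, 2 betas, skew, shape: 10 parameters
    "ARMA(1,1)-GARCH(2,2)": {"loglik": -2795.0, "n_params": 10},
}

scores = {
    name: (aic(c["loglik"], c["n_params"]), bic(c["loglik"], c["n_params"], n_obs))
    for name, c in candidates.items()
}
for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.1f}, BIC={b:.1f}")

best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print("AIC prefers:", best_aic)
print("BIC prefers:", best_bic)
```

With these made-up numbers the two criteria disagree, because BIC penalizes each extra parameter by $\ln n$ rather than 2. When both criteria favour the same model, as reported in the question, the choice between them does not matter.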