In short, you should select models using AIC and/or out-of-sample fit criteria and view the rejected hypothesis as a suggestion to consider other types of models.
When using this class of time series models, researchers are usually interested in accurate prediction/forecasting. Since AIC measures how well a model predicts the data in-sample, it operates as a fair means of model selection in this case (you may also want to test how well the models fit out-of-sample; more on that below).
However, just because a particular model has the lowest AIC does not mean that it is correctly specified or that it approximates the true data-generating process well. It could be that all the models you proposed were poor choices, or that the true process the FTSE follows is so complex that practically every reasonable model will be rejected given enough data. AIC provides no information on this point, which is where hypothesis testing can come in.
Under the assumptions of a standard ARMA-GARCH model, the residuals should be homoscedastic and, more generally, i.i.d. normal. Your hypothesis test suggests that your residuals are not homoscedastic and, in turn, that your ARMA-GARCH model may be misspecified. On this note, you may want to consider alternative specifications for the volatility process, including other GARCH variants (e.g. EGARCH, GJR-GARCH, TGARCH, AVGARCH, NGARCH, GARCH-M) and/or stochastic volatility models. It is highly likely that one of these models will offer a lower AIC value and produce residuals for which homoscedasticity cannot be rejected.
One important thing to note, though, is that no model will be perfect, especially for something like the FTSE 100. The true data-generating process driving a large financial index like this is impossibly complex, so practically every model you propose will be false. For this reason, it can be argued that any meaningful hypothesis you fail to reject reflects insufficient data or a lack of statistical power rather than evidence supporting one model over the others.
One way to partially resolve this dilemma is to use out-of-sample fit instead of, or in conjunction with, AIC. A simple example would be to fit the model using only the first 80% or 90% of the data and use the resulting coefficient estimates to obtain a log-likelihood for the remaining 10%–20% of the data. The model with the highest out-of-sample log-likelihood would be preferred. If the ARMA-GARCH model is truly misspecified in a way that impairs its forecasting performance, then an out-of-sample comparison will help expose it.
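As a rough illustration of the idea, here is a minimal numpy sketch. To keep it self-contained it uses a simple AR(p) mean model with constant variance fitted by OLS, rather than a full ARMA-GARCH (which would require a numerical MLE routine); the train/holdout logic is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def holdout_loglik(y, p, split=0.8):
    """Fit an AR(p)-with-constant-variance model by OLS on the first
    `split` fraction of the sample, then return the Gaussian
    log-likelihood of its one-step-ahead errors on the holdout."""
    n = len(y)
    X = np.ones((n - p, p + 1))
    for k in range(1, p + 1):
        X[:, k] = y[p - k : n - k]          # column k holds y[t-k]
    target = y[p:]
    cut = int(split * n) - p                # row index of the split point
    beta, *_ = np.linalg.lstsq(X[:cut], target[:cut], rcond=None)
    resid_in = target[:cut] - X[:cut] @ beta
    s2 = resid_in.var()                     # in-sample error variance
    e = target[cut:] - X[cut:] @ beta       # holdout one-step errors
    return -0.5 * np.sum(np.log(2 * np.pi * s2) + e**2 / s2)

# simulate an AR(1) series and compare two candidate lag orders out of sample
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

print(holdout_loglik(y, 1), holdout_loglik(y, 5))
```

The model with the higher holdout log-likelihood is preferred; with a GARCH model you would replace the constant `s2` by the one-step-ahead conditional variances implied by the fitted coefficients.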
Do you by any chance "predict" raw residuals rather than standardized residuals? (Then you would get exactly the same result from the ARCH-LM test when "pre-testing" and when "post-testing" in the case of no conditional-mean model, $r_t=\epsilon_t$.) The raw residuals will contain ARCH effects; that is precisely why you apply a GARCH model and obtain standardized residuals that do not contain ARCH effects.
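To see the difference numerically, here is a small numpy sketch that simulates a GARCH(1,1) series and filters it with the true parameters (in practice you would plug in your estimated coefficients): the squared raw residuals are autocorrelated, the squared standardized residuals are not.

```python
import numpy as np

rng = np.random.default_rng(1)
n, omega, alpha, beta = 5000, 0.05, 0.1, 0.85

# simulate r_t = sigma_t * eps_t with a GARCH(1,1) variance recursion
sig2 = np.empty(n)
r = np.empty(n)
sig2[0] = omega / (1 - alpha - beta)        # unconditional variance start-up
for t in range(n):
    r[t] = np.sqrt(sig2[t]) * rng.standard_normal()
    if t + 1 < n:
        sig2[t + 1] = omega + alpha * r[t]**2 + beta * sig2[t]

z = r / np.sqrt(sig2)                       # standardized residuals

def acf1_sq(x):
    """Lag-1 autocorrelation of the squared series."""
    s = x**2 - (x**2).mean()
    return (s[1:] * s[:-1]).sum() / (s * s).sum()

print(acf1_sq(r))   # noticeably positive: ARCH effects in the raw residuals
print(acf1_sq(z))   # near zero: the GARCH filter has removed them
```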
On the other hand, if you find ARCH effects in the standardized residuals, the GARCH model you are using may be inappropriate. Try different variants of the GARCH model (EGARCH, APARCH and whatever else) and different lag orders.
Also note that the original ARCH-LM test is inappropriate for testing for remaining ARCH effects in the standardized residuals of a GARCH model; the Li-Mak test should be used instead. (I do not know whether the Li-Mak test is available in Stata. Also, use of the original ARCH-LM test seems to be relatively widespread in the applied literature, even though it is inappropriate.)
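For reference, the original ARCH-LM test is just an auxiliary regression, and can be sketched in a few lines of numpy. Note this is the plain Engle test applied to a raw series; it does not implement the Li-Mak correction for the standardized residuals of a fitted GARCH model.

```python
import numpy as np

def arch_lm(resid, q=4):
    """Engle's ARCH-LM: regress e_t^2 on a constant and q lags of e^2.
    Under H0 (no ARCH effects) T * R^2 is asymptotically chi-squared
    with q degrees of freedom (T = number of usable observations)."""
    e2 = resid**2
    n = len(e2)
    X = np.ones((n - q, q + 1))
    for k in range(1, q + 1):
        X[:, k] = e2[q - k : n - k]         # column k holds e^2_{t-k}
    yv = e2[q:]
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    r2 = 1 - ((yv - X @ beta)**2).sum() / ((yv - yv.mean())**2).sum()
    return (n - q) * r2

rng = np.random.default_rng(2)
iid = rng.standard_normal(3000)             # no ARCH effects

# ARCH(1) series: strong ARCH effects by construction
a = np.empty(3000)
s2 = 1.0
for t in range(3000):
    a[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = 0.2 + 0.8 * a[t]**2

print(arch_lm(iid), arch_lm(a))  # compare with the chi2(4) 5% critical value, about 9.49
```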
References:
- Li, W. K. and Mak, T. K. (1994) On the squared residual autocorrelations in non-linear time series with conditional heteroscedasticity. Journal of Time Series Analysis 15, 627–636.
On the one hand, GARCH is a model for the conditional distribution of the time series $y_t$:
\begin{aligned} y_t &\sim d(\mu_t,\sigma_t^2), \\ \mu_t &= \dots \text{(e.g. some ARMA equation)} \\ \sigma_t^2 &= \omega + \alpha_1 ( y_{t-1} - \mu_{t-1} )^2 + \dotsc + \alpha_s ( y_{t-s} - \mu_{t-s} )^2 + \beta_1 \sigma_{t-1}^2 + \dotsc + \beta_r \sigma_{t-r}^2, \\ \varepsilon_t &:= \frac{y_t-\mu_t}{\sigma_t} \sim i.i.d(0,1). \\ \end{aligned}
GARCH specifically characterizes the conditional variance equation, but this inevitably depends on some equation for the conditional mean; otherwise the conditional variance would be undefined. This hopefully answers your questions 1 and 3.
On the other hand, GARCH happens to characterize the conditional variance of the residuals from the conditional mean model, $u_t$, as well. The exact same model as above can be represented as follows:
\begin{aligned} y_t &= \mu_t + u_t, \\ \mu_t &= \dots \text{(e.g. some ARMA equation)} \\ u_t &= \sigma_t \varepsilon_t, \\ \sigma_t^2 &= \omega + \alpha_1 u_{t-1}^2 + \dotsc + \alpha_s u_{t-s}^2 + \beta_1 \sigma_{t-1}^2 + \dotsc + \beta_r \sigma_{t-r}^2, \\ \varepsilon_t &\sim i.i.d(0,1), \\ \end{aligned}
and $\sigma_t^2$ is the conditional variance of the residual $u_t$ since $\text{Var}(u_t|I_{t-1})=\sigma_t^2$ where $I_{t-1}$ is information available at time $t-1$. This hopefully answers your question 2.
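To make the second representation concrete, here is a minimal numpy sketch (parameter values are purely illustrative) that simulates the model with an AR(1) conditional mean and checks that the standardized residuals $\varepsilon_t = u_t/\sigma_t$ behave like an i.i.d.$(0,1)$ sequence:

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi, omega, alpha, beta = 2000, 0.3, 0.1, 0.1, 0.8

# simulate y_t = mu_t + u_t with AR(1) conditional mean and GARCH(1,1) errors
y = np.zeros(n)
mu = np.zeros(n)
u = np.empty(n)
sig2 = np.empty(n)
sig2[0] = omega / (1 - alpha - beta)        # unconditional variance start-up
for t in range(n):
    mu[t] = phi * y[t - 1] if t > 0 else 0.0
    u[t] = np.sqrt(sig2[t]) * rng.standard_normal()
    y[t] = mu[t] + u[t]
    if t + 1 < n:
        # identical whether written with u_t or with (y_t - mu_t)
        sig2[t + 1] = omega + alpha * u[t]**2 + beta * sig2[t]

eps = u / np.sqrt(sig2)                     # standardized residuals
print(eps.mean(), eps.var())                # close to 0 and 1 respectively
```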