I suggest determining the ARMA and GARCH parts simultaneously. If you determine the ARMA part first while temporarily ignoring GARCH, the ARMA parameter estimates will be inconsistent (unless the MA part is missing), and the selection of autoregressive and moving average lag orders will probably be suboptimal, because the ACF and PACF confidence bounds are invalid when the GARCH-type behaviour of the residuals is neglected. Also, the Ljung-Box test does not have its regular null distribution under GARCH-type residuals, so you cannot rely on it for testing how well the ARMA model captures the patterns in the data.
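To illustrate the point about invalid confidence bounds, here is a minimal simulation sketch. It is not from the original discussion: the GARCH(1,1) parameters (omega = 0.1, alpha = 0.2, beta = 0.7), the sample size and the number of replications are all assumptions chosen for illustration. Both series are serially uncorrelated by construction, so the usual ±1.96/sqrt(n) bound on the lag-1 autocorrelation should be exceeded in roughly 5% of replications if it were valid.

```python
import math
import random

def simulate_garch11(n, omega, alpha, beta, rng, burn=200):
    """Simulate a zero-mean GARCH(1,1) series with Gaussian innovations."""
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    y_prev = 0.0
    out = []
    for t in range(n + burn):
        if t > 0:
            var = omega + alpha * y_prev ** 2 + beta * var
        y_prev = math.sqrt(var) * rng.gauss(0.0, 1.0)
        if t >= burn:  # drop the burn-in draws
            out.append(y_prev)
    return out

def lag1_autocorr(y):
    """Sample autocorrelation at lag 1."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def rejection_rate(simulate, n, reps):
    """Share of replications where |rho_1| exceeds the usual 1.96/sqrt(n) bound."""
    bound = 1.96 / math.sqrt(n)
    hits = sum(abs(lag1_autocorr(simulate(n))) > bound for _ in range(reps))
    return hits / reps

rng = random.Random(12345)  # fixed seed so the run is reproducible
iid_rate = rejection_rate(
    lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)], n=500, reps=400)
garch_rate = rejection_rate(
    lambda n: simulate_garch11(n, 0.1, 0.2, 0.7, rng), n=500, reps=400)
print(f"iid noise:   share outside the bounds = {iid_rate:.3f}")
print(f"GARCH noise: share outside the bounds = {garch_rate:.3f}")
```

If the GARCH share comes out clearly above the iid share, that is the over-rejection described above: the series has no autocorrelation, yet the standard bounds flag it too often.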
These issues have been discussed in earlier posts here, here and to some extent also here.
In short, you should select models using AIC and/or out-of-sample fit criteria and view the rejected hypothesis as a suggestion to consider other types of models.
When using this class of time series models, researchers are usually interested in accurate prediction/forecasting. Since AIC measures how well a model predicts the data in-sample, it operates as a fair means of model selection in this case (you may also want to test how well the models fit out-of-sample; more on that below).
However, just because a particular model has the lowest AIC does not mean that the model is correctly specified or that it approximates the true data generating process well. It could be that all the models you proposed were poor choices, or that the true process the FTSE follows is so complex that practically every reasonable model will be rejected given enough data. AIC provides no information on this point, which is where hypothesis testing can come in.
Under the assumptions of standard ARMA-GARCH, the standardized residuals should be homoscedastic and, more generally, iid normal. Your hypothesis test suggests that your residuals are not homoscedastic and, in turn, that your ARMA-GARCH model may be misspecified. On this note, you may want to consider alternative specifications for the volatility process, including other variants of GARCH models (e.g. EGARCH, GJR-GARCH, TGARCH, AVGARCH, NGARCH, GARCH-M) and/or stochastic volatility models. It is highly likely that one of these models will offer a lower AIC value and produce residuals for which homoscedasticity cannot be rejected.
One important thing to note though is that no model will be perfect, especially for something like the FTSE 100. The true data generating process driving a large financial index like this is impossibly complex, so pretty much every model you propose will be false. For this reason, it can be argued that any meaningful hypothesis you do not reject is a reflection of insufficient data or lack of statistical power rather than evidence supporting one model over others.
One way to partially resolve this dilemma is to use out-of-sample fit instead of, or in conjunction with, AIC. A simple example would be to fit the model using only the first 80% or 90% of the data and then use the resulting coefficient estimates to obtain a log-likelihood for the remaining 20% or 10% of the data. The model with the highest out-of-sample log-likelihood would be preferred. If the ARMA-GARCH model is truly misspecified in a way that impairs its forecasting performance, then an out-of-sample fit will help expose it.
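The out-of-sample comparison can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the discussion above: the data are simulated rather than FTSE returns, the mean is taken to be zero (no ARMA part), the two candidates are a constant-variance Gaussian model and a GARCH(1,1), and the GARCH parameters are found by a crude grid search with variance targeting instead of a proper optimizer.

```python
import math
import random

def simulate_garch11(n, omega, alpha, beta, rng, burn=200):
    """Simulated stand-in for a return series (zero-mean GARCH(1,1))."""
    var, y_prev, out = omega / (1.0 - alpha - beta), 0.0, []
    for t in range(n + burn):
        if t > 0:
            var = omega + alpha * y_prev ** 2 + beta * var
        y_prev = math.sqrt(var) * rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(y_prev)
    return out

def gaussian_loglik(y, variances):
    """Sum of zero-mean Gaussian log-densities with the given variances."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + x * x / v)
               for x, v in zip(y, variances))

def garch11_variances(y, omega, alpha, beta, init_var):
    """sigma2_t = omega + alpha * y_{t-1}^2 + beta * sigma2_{t-1}."""
    variances = [init_var]
    for t in range(1, len(y)):
        variances.append(omega + alpha * y[t - 1] ** 2 + beta * variances[-1])
    return variances

def fit_garch11_grid(train):
    """Crude Gaussian ML: grid over (alpha, beta), omega by variance targeting."""
    s2 = sum(x * x for x in train) / len(train)
    best = None
    for a in range(1, 9):        # alpha = 0.05, ..., 0.40
        for b in range(0, 19):   # beta  = 0.00, ..., 0.90
            alpha, beta = 0.05 * a, 0.05 * b
            if alpha + beta >= 0.999:
                continue
            omega = s2 * (1.0 - alpha - beta)
            ll = gaussian_loglik(
                train, garch11_variances(train, omega, alpha, beta, s2))
            if best is None or ll > best[0]:
                best = (ll, omega, alpha, beta)
    return best[1:], s2

def out_of_sample_logliks(y, train_share=0.8):
    """Fit on the first part of the sample, score on the held-out remainder."""
    split = int(len(y) * train_share)
    train, test = y[:split], y[split:]
    (omega, alpha, beta), s2 = fit_garch11_grid(train)
    # The recursion runs over the full sample, but the parameters
    # come from the training part only.
    variances = garch11_variances(y, omega, alpha, beta, s2)
    return {
        "constant variance": gaussian_loglik(test, [s2] * len(test)),
        "GARCH(1,1)": gaussian_loglik(test, variances[split:]),
    }

y = simulate_garch11(3000, 0.1, 0.2, 0.7, random.Random(2024))
scores = out_of_sample_logliks(y)
for name, ll in scores.items():
    print(f"{name}: out-of-sample log-likelihood = {ll:.1f}")
```

The candidate with the higher held-out log-likelihood would be preferred. With real data you would replace the simulated series and add the ARMA part to each candidate.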
Best Answer
You touch upon two main issues: estimation and model selection.
Estimation
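For a given model specification, you may either write down the likelihood function yourself and maximize it with a general-purpose optimizer (e.g. optim in R), or use a ready-made function that does this for you (e.g. ugarchfit in the "rugarch" package in R). Both ways are fine.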
The likelihood functions of ARMA and GARCH are available in time series textbooks such as Hamilton "Time Series Analysis" (1994) or Tsay "Analysis of Financial Time Series" (2005), among others. Hopefully, the likelihood of ARMA-GARCH is also available somewhere, but I do not have a reference handy. You could try textbooks, lecture notes or maybe software documentation (R, Stata, Matlab). (Please post any references here in the comments; I will appreciate it.)
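For the hand-written route, here is a minimal sketch of what a joint likelihood could look like for the simplest case, ARMA(1,1)-GARCH(1,1) with Gaussian innovations. It is not taken from any of the textbooks mentioned. The function name, the parameter ordering and the start-up conventions (conditioning on the first observation, pre-sample shock set to zero, pre-sample variance set to the unconditional variance) are my assumptions, and other conventions are equally defensible.

```python
import math

def arma11_garch11_loglik(params, y):
    """Conditional Gaussian log-likelihood of an ARMA(1,1)-GARCH(1,1) model.

    Mean:     y_t = mu + phi * y_{t-1} + theta * e_{t-1} + e_t
    Variance: s_t = omega + alpha * e_{t-1}^2 + beta * s_{t-1}
    with e_t | past ~ N(0, s_t).
    """
    mu, phi, theta, omega, alpha, beta = params
    # Reject parameter values outside the usual positivity/stationarity region.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return -math.inf
    uncond_var = omega / (1.0 - alpha - beta)
    e_prev = 0.0            # assumption: pre-sample shock is zero
    e2_prev = uncond_var    # assumption: pre-sample squared shock
    s_prev = uncond_var     # assumption: pre-sample conditional variance
    loglik = 0.0
    for t in range(1, len(y)):  # condition on the first observation
        s_t = omega + alpha * e2_prev + beta * s_prev
        e_t = y[t] - mu - phi * y[t - 1] - theta * e_prev
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(s_t) + e_t ** 2 / s_t)
        e_prev, e2_prev, s_prev = e_t, e_t ** 2, s_t
    return loglik

# Sanity check on a toy input: with no ARMA or GARCH dynamics and omega = 1,
# this reduces to the log-likelihood of iid standard normal observations.
print(arma11_garch11_loglik((0.0, 0.0, 0.0, 1.0, 0.0, 0.0), [0.0, 1.0, -2.0]))
```

All six parameters enter one objective, which is what estimating the mean and variance parts simultaneously means in practice. The negative of this function is what you would hand to a general-purpose minimizer.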
Estimation can also be done in two steps (first the conditional mean model, then the conditional variance model) rather than simultaneously. The problem with two-step estimation is that the first step relies on an assumption that is violated in the second step, which makes the estimators of both steps inefficient and sometimes inconsistent, as discussed in earlier posts (cited e.g. in the OP).
Model selection
Selecting a conditional mean-conditional variance model is not easy because it is a large animal and there are so many choices to make. Selecting sequentially (first the conditional mean model, then the conditional variance model) is suboptimal because the first step depends on assumptions that are violated in the second step, and I am not aware of any theoretical results that guarantee consistent selection or the like, as also discussed in previous posts. Nevertheless, this is sometimes done in practice and even recommended in time series textbooks such as Tsay "Analysis of Financial Time Series" (2005). However, I perceive the recommendation as "a" model selection strategy that is relatively easy, but not necessarily "the best" one.
Among other strategies, probably the simplest, although computationally demanding, would be to fix a pool of candidate models (e.g. all submodels of ARMA(4,4)-GARCH(2,2)), estimate them (preferably simultaneously) and select the one with the lowest AIC (if the goal is forecasting) or BIC (if the goal is recovering the "true" model).
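The bookkeeping for such a pool can be sketched as follows. This is an illustration only. I am assuming that "submodels" means all lower lag orders (not models with individual lags restricted to zero), that every candidate has at least one ARCH term, and that each model has a mean constant. The function names are hypothetical, and the estimation step itself is left to whatever routine you use.

```python
import math
from itertools import product

def candidate_pool(max_p=4, max_q=4, max_arch=2, max_garch=2):
    """All order tuples (p, q, m, s) nested in ARMA(max_p, max_q)-GARCH(max_arch, max_garch).

    p, q: AR and MA orders; m: ARCH order (at least 1); s: GARCH order.
    """
    return list(product(range(max_p + 1), range(max_q + 1),
                        range(1, max_arch + 1), range(max_garch + 1)))

def n_params(order):
    """Parameter count: mean constant, AR, MA, omega, ARCH and GARCH terms."""
    p, q, m, s = order
    return 1 + p + q + 1 + m + s

def aic(loglik, k):
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n_obs):
    return k * math.log(n_obs) - 2.0 * loglik

def select_best(logliks, n_obs, criterion="aic"):
    """Pick the order with the lowest criterion value.

    logliks maps an order tuple to the maximized log-likelihood of that model.
    """
    def score(order):
        k = n_params(order)
        if criterion == "aic":
            return aic(logliks[order], k)
        return bic(logliks[order], k, n_obs)
    return min(logliks, key=score)

pool = candidate_pool()
print(f"{len(pool)} candidate models to estimate")
```

After estimating every model in the pool and storing its maximized log-likelihood, `select_best(logliks, n_obs, "aic")` or `select_best(logliks, n_obs, "bic")` returns the preferred orders. Because BIC penalizes each extra parameter by log(n) rather than 2, the two criteria can disagree, with BIC favouring the smaller model.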
Questions 1, 2 and 3