You touch upon two main issues: estimation and model selection.
Estimation
For a given model specification, you may
- Either write down the likelihood function and feed it into a generic optimizer (such as the function `optim` in R);
- Or use an existing function that takes the model specification (e.g. ARMA(p,q)-GARCH(s,r)), "writes the likelihood" for you and optimizes it (such as the function `ugarchfit` in the "rugarch" package in R).
Both ways are fine:
- The first one allows you to dig deep into model fitting and perhaps alter the model specification beyond what is available in the standard functions; however, it is time consuming, and your implementation might not be optimized for speed (while the standard functions typically are), unless you are a knowledgeable programmer with experience in optimization;
- The second one allows for a convenient and fast estimation without the need to understand how exactly it is done (the latter can also be a drawback depending on what you wish to achieve).
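To make the second route concrete, here is a minimal sketch using the "rugarch" package (assuming it is installed; the ARMA(1,1)-GARCH(1,1) orders and the simulated series are my own illustration, not a recommendation):

```r
# Sketch of the "existing function" route, assuming "rugarch" is installed.
# Orders and data are illustrative only.
library(rugarch)

set.seed(1)
x <- rnorm(500)  # placeholder for your return series

spec <- ugarchspec(
  mean.model         = list(armaOrder = c(1, 1)),
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),
  distribution.model = "norm"
)
fit <- ugarchfit(spec = spec, data = x)
coef(fit)  # estimated ARMA-GARCH parameters
```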
The likelihood functions of ARMA and GARCH are available in time series textbooks such as Hamilton "Time Series Analysis" (1994) or Tsay "Analysis of Financial Time Series" (2005), among others. Presumably, the likelihood of the combined ARMA-GARCH model is also written down somewhere, but I do not have a reference handy. You could try textbooks, lecture notes or software documentation (R, Stata, Matlab). (Please post any references in the comments; I will appreciate it.)
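As an illustration of the first route, here is a hedged base-R sketch: the Gaussian GARCH(1,1) log-likelihood written by hand (conditioning on an initial variance equal to the sample variance, which is one of several common conventions) and maximized with `optim` on simulated data. The parameter names and starting values are my own choices.

```r
# Hand-written Gaussian GARCH(1,1) likelihood maximized with optim() (sketch).
set.seed(1)
n <- 2000
true <- c(omega = 0.1, alpha = 0.1, beta = 0.8)
eps <- numeric(n); sig2 <- numeric(n)
sig2[1] <- true["omega"] / (1 - true["alpha"] - true["beta"])
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- true["omega"] + true["alpha"] * eps[t - 1]^2 + true["beta"] * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}

negloglik <- function(par, x) {
  omega <- par[1]; alpha <- par[2]; beta <- par[3]
  # Penalize parameter values violating positivity/stationarity constraints.
  if (omega <= 0 || alpha < 0 || beta < 0 || alpha + beta >= 1) return(1e10)
  s2 <- numeric(length(x))
  s2[1] <- var(x)  # initialize the variance recursion with the sample variance
  for (t in 2:length(x))
    s2[t] <- omega + alpha * x[t - 1]^2 + beta * s2[t - 1]
  0.5 * sum(log(2 * pi) + log(s2) + x^2 / s2)
}

fit <- optim(c(0.05, 0.05, 0.9), negloglik, x = eps, method = "Nelder-Mead")
round(fit$par, 3)  # typically close to the true (0.1, 0.1, 0.8)
```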
Estimation can also be done
- either simultaneously (maximizing the likelihood function that reflects the dynamics of both the conditional mean and the conditional variance)
- or in two steps (first estimate the conditional mean equation, implicitly assuming constant conditional variance, then estimate the conditional variance equation for a given estimated conditional mean).
The problem with two-step estimation is that the first step uses an assumption that is violated in the second step and as such makes the estimators of both steps inefficient and sometimes inconsistent, as discussed in earlier posts (cited e.g. in the OP).
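A hedged sketch of the two-step route in base R (the ARMA(1,1) orders are my own choice, and step 2 uses the simple Engle-style regression of squared residuals on their lag as a crude stand-in for a full GARCH fit):

```r
# Two-step estimation sketch (illustrative orders; base R only).
set.seed(2)
x <- arima.sim(list(ar = 0.5, ma = 0.2), n = 500)

# Step 1: conditional mean, implicitly assuming constant conditional variance.
step1 <- arima(x, order = c(1, 0, 1))
e <- as.numeric(residuals(step1))

# Step 2: conditional variance for the given estimated mean; here an ARCH(1)
# approximated crudely by regressing squared residuals on their first lag.
e2 <- e^2
step2 <- lm(e2[-1] ~ e2[-length(e2)])
coef(step2)  # (omega, alpha) in the ARCH(1) approximation
```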
Model selection
Selecting a conditional mean-conditional variance model is not easy because it is a large animal and there are many choices to make. Selecting sequentially (first the conditional mean model, then the conditional variance model) is suboptimal because the first step relies on assumptions that are violated in the second step, and I am not aware of any theoretical results that guarantee consistent selection or the like, as also discussed in previous posts. Nevertheless, this is sometimes done in practice and even recommended in time series textbooks such as Tsay "Analysis of Financial Time Series" (2005). However, I perceive the recommendation as "a" model selection strategy that is relatively easy, not necessarily "the best" one.
Among other strategies, probably the simplest, although computationally demanding, one would be to fix a pool of candidate models (e.g. all submodels of ARMA(4,4)-GARCH(2,2)), estimate them (preferably simultaneously) and select the one with the lowest AIC (if the goal is forecasting) or BIC (if the goal is recovering the "true" model).
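A sketch of the pool-based strategy, shown with base `arima()` for the mean part only (with "rugarch" one would loop over `ugarchfit()` calls and compare information criteria in the same way; the ARMA(2,2) pool and the simulated data are illustrative):

```r
# Exhaustive model selection over a small candidate pool by AIC (sketch).
set.seed(3)
x <- arima.sim(list(ar = 0.6), n = 400)

grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- apply(grid, 1, function(g)
  tryCatch(AIC(arima(x, order = c(g[["p"]], 0, g[["q"]]))),
           error = function(e) Inf))  # skip specifications that fail to fit
grid[which.min(grid$aic), ]  # the AIC-preferred (p, q)
```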
Questions 1, 2 and 3
- This should be clear from the text above.
- It is possible, but that does not mean it is desirable. As I mentioned, I am not aware of any theoretical results that guarantee consistent selection in the two-step approach. On the other hand, sometimes it makes sense to go for speed and convenience instead of a statistically "clean" approach, and this is what people often do.
- The model estimated on data up to time $t$ will immediately yield a forecast for time $t+1$, because the left-hand side of the model leads the right-hand side by one time period. If you want to forecast further ahead, do it iteratively: treat the $(h-1)$-step-ahead forecast as real data, thereby "extending your sample" by $h-1$ time points, and then do exactly the same as when forecasting one step ahead (do not refit the model).
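The iterative scheme can be illustrated with an AR(1) in base R (the recursion below reproduces what `predict()` does internally; the series and order are illustrative):

```r
# Iterative multi-step forecasting: each forecast is treated as data for the
# next step; the model itself is fitted once and never refitted.
set.seed(4)
x <- arima.sim(list(ar = 0.7), n = 300)
fit <- arima(x, order = c(1, 0, 0))
phi <- coef(fit)[["ar1"]]
mu  <- coef(fit)[["intercept"]]  # arima()'s "intercept" is the series mean

h <- 5
fc <- numeric(h)
last <- x[length(x)]
for (i in 1:h) {
  fc[i] <- mu + phi * (last - mu)  # one-step-ahead forecast
  last  <- fc[i]                   # "extend the sample" with the forecast
}
fc  # matches predict(fit, n.ahead = 5)$pred
```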
Obtaining accurate point forecasts for financial time series is notoriously hard. That has to do with the nature of financial markets: actors look for opportunities to exploit any predictability, and by exploiting it they remove it (a change in the expected profitability of an asset $\rightarrow$ a change in supply/demand $\rightarrow$ a change in the asset price). This happens very quickly in active markets, so the predictability does not persist long enough for models to capture it. Thus financial time series often behave like random walks with some heteroskedasticity.
Meanwhile, the volatility of financial time series is easier to predict, as there apparently is no market mechanism to remove that predictability. That is why your GARCH forecasts of volatility seem to work rather well.
But you should note that graphs of fitted volatility vs. realized squared returns can be somewhat misleading. Leaving aside the fact that squared returns are only a noisy proxy of realized volatility, there is another issue: our eyes are easily tricked by graphs like the first one in your post. When judging goodness of fit, the eye pays attention not only to the vertical distance between fitted and realized values, which is what matters, but also to the horizontal distance. So instead of graphing fitted vs. realized values (two lines), it is helpful to graph just the difference between the two (one line). You might be surprised to discover that differences which appear small in the first type of graph are not that small when depicted in the second type. This is because the fitted volatility from GARCH typically lags the realized volatility by one period. If a time series has many data points, this lag (the horizontal difference) is visually small and can hardly be noticed, so the fit looks better than it actually is.
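A minimal sketch of the suggested diagnostic (the numbers are toy values; substitute your own realized and fitted series):

```r
# Plot the difference (one line) rather than overlaying two lines; the eye
# then judges only the vertical distance, which is what matters for fit.
realized <- c(1.0, 1.5, 0.8, 2.1, 1.2)  # e.g. squared returns (toy values)
fit_vol  <- c(0.9, 1.1, 1.4, 0.9, 1.9)  # e.g. GARCH fitted variance (toy)

d <- realized - fit_vol
plot(d, type = "h", ylab = "realized - fitted")
abline(h = 0, lty = 2)
```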
Best Answer
Although there is no argument for the integration order $d$ in `ugarchspec`, you can handle the "I" part indirectly by differencing your data the desired number of times before feeding it into estimation via `ugarchfit`. So if the desired model for the series `x` is ARIMA$(p,d,q)$, specify ARMA$(p,q)$ in `ugarchspec` and feed `diff(x, differences = d)` instead of `x` to the function `ugarchfit`. (Note that `diff(x, d)` would set the lag to `d` rather than difference `d` times, hence the explicit `differences` argument.)

Note that there does not seem to be an option to use SARMA models in the "rugarch" package, so you will have to let the "S" part go. But if there is a seasonal pattern (and that is quite likely when it comes to tourist arrivals), you will have to account for it somehow. Consider using exogenous seasonal variables (dummies or Fourier terms) in the conditional mean model via the argument `external.regressors` inside the argument `mean.model` in the function `ugarchspec`. Alternatively, note that a SARMA model corresponds to a restricted ARMA model. An approximation of SARMA could thus be an ARMA with the appropriate lag order but without the SARMA-specific parameter restrictions (since those might not be available in "rugarch").

Also consider whether a GARCH model for the conditional variance is relevant at all. I do not know the tourist business well, but I am not immediately convinced that tourist arrivals will have a GARCH pattern. You could check the need for a GARCH-type conditional variance model by testing the residuals from the SARIMA model with the ARCH-LM test or some other test for (G)ARCH effects.
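Putting the differencing and the exogenous seasonal regressors together, a hedged sketch (assuming "rugarch" is installed; all orders, the seasonal period `s`, and the simulated data are illustrative placeholders for your own series):

```r
# Sketch: ARIMA(1,1,1)-GARCH(1,1) via manual differencing, with Fourier terms
# as exogenous seasonal regressors in the conditional mean model.
library(rugarch)

set.seed(5)
x <- cumsum(rnorm(300))        # placeholder for the arrivals series
y <- diff(x, differences = 1)  # the "I" part of ARIMA, done by hand

s  <- 12                       # seasonal period (e.g. monthly data)
tt <- seq_along(y)
X  <- cbind(sin(2 * pi * tt / s), cos(2 * pi * tt / s))  # Fourier terms

spec <- ugarchspec(
  mean.model     = list(armaOrder = c(1, 1), external.regressors = X),
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1))
)
fit <- ugarchfit(spec, data = y)
```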