I have a doubt: am I forecasting the volatility of the prices or the actual values of the returns?
The reference manual for the "fGarch" package states on p. 30 that the predict method
gives forecasts of both the conditional mean and the conditional variance. The output contains the columns "meanForecast", "meanError", and "standardDeviation"; the first one holds the forecasts of the conditional mean, which is what you seem to be interested in.
Since I am not looking at options, there is no point in forecasting the volatility, right? It won't tell me whether prices will go up or down.
You may or may not be interested in forecasting the conditional variance. However, as long as the conditional variance process can be well approximated by some GARCH model, you should account for that. Ignoring the GARCH patterns and (silently) assuming a constant conditional variance will yield inferior forecasts for the conditional mean, because the misspecification of the conditional variance equation will negatively affect the estimation of the conditional mean model.
Thus if (1) you want to have a good forecast for the conditional mean
and (2) the conditional variance follows a GARCH process, you should keep the GARCH model.
Since I have an ARMA(0,1) model, my forecasts will always be constant, and if I don't include a mean in the model, then the forecasts are <...> 0.
Yes, they will eventually be constant, but no, the forecast of the conditional mean is not zero at every horizon. The 1-step-ahead forecast is
$$\hat{x}_{t+1|t}=\hat{\theta}_1 \hat{\varepsilon}_t,$$
where $\hat{\theta}_1$ is the estimated MA(1) coefficient and $\hat{\varepsilon}_t$ is the estimated innovation at time $t$. (For $h \geqslant 2$ the MA(1) memory is exhausted, and the forecast is indeed zero.)
I have assumed away the potential presence of the mean component $\hat{\mu}$ for simplicity.
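As a minimal sketch of this formula (in Python/numpy rather than R, with made-up numbers in place of an actual fGarch fit), the MA(1) conditional-mean forecast is just the product of the fitted coefficient and the last estimated innovation:

```python
# Hypothetical fitted values -- illustrative numbers, not from any real fit
theta_hat = 0.5        # estimated MA(1) coefficient theta_1
eps_hat_t = -0.02      # estimated innovation at time t

# 1-step-ahead conditional-mean forecast: x_{t+1|t} = theta_1 * eps_t
x_forecast_1 = theta_hat * eps_hat_t

# For horizons h >= 2 the MA(1) memory is exhausted, so the forecast
# reverts to the unconditional mean (zero here, since no mean is included)
x_forecast_h = 0.0

print(x_forecast_1)   # -0.01
print(x_forecast_h)   # 0.0
```

The same numbers would come out of the "meanForecast" column of predict, up to the estimation details of the fitted model.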
So is there a point of using those different models in this case?
Without a GARCH model your 1-step-ahead forecast will be
$$\hat{x}_{t+1|t}=\hat{\theta}_1 \hat{\varepsilon}_t,$$
but with a GARCH model your 1-step-ahead forecast will be
$$\tilde{x}_{t+1|t}=\tilde{\theta}_1 \tilde{\varepsilon}_t.$$
Note that in general $\hat{\theta}_1 \neq \tilde{\theta}_1$ and $\hat{\varepsilon}_t \neq \tilde{\varepsilon}_t$. This is because the estimates of $\theta$ and $\varepsilon$ from the conditional mean model will not be the same under different specifications of the conditional variance model. Therefore, you will have different forecasts $\hat{x}_{t+h|t} \neq \tilde{x}_{t+h|t}$, and a correct specification of the conditional variance model matters.
Using a rolling window is a very typical approach. Conceptually, they re-estimate the model every day on the last 500 days: when a day is over, all the estimates are updated on the new most recent 500-day window, i.e. the whole model is re-estimated using the previous 499 observations plus yesterday's observation.
The reason why they do this is simple. With time series, and historical data in general, using more data points (at a constant frequency) gives you the benefit of a larger sample. However, this benefit may be partly offset by the fact that you are using older data, so you are exposed to the risk that the series has structurally changed and that the old behavior will not recur in the near future. There are ways to control for this, but in general it is a very common trade-off when using historical data (the more data, the better; but the older the data, the worse), and it is well known in the financial literature. The reason they use a rolling window is precisely to control for the problem of relying on data that is too outdated and far in the past: the window is shifted forward each day, so each estimate uses the most recent, updated window of information. The hope is that, within that window, no structural change in the series has occurred (no change in how the market behaves and functions) and that the recorded recent behavior will be comparable to that of the near future.
That is also why, typically, those rolling windows are used to make 1-step-ahead forecasts (or at most very-few-step-ahead forecasts).
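The rolling-window scheme can be sketched as follows (a numpy toy, using an AR(1)-by-OLS fit as a deliberately simple stand-in for whatever model is actually being re-estimated; the 500-day window matches the description above, the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 1.0, size=1000)  # toy return series

window = 500           # the 500-day rolling window described above
forecasts = []

# Each "day" t: re-estimate the model on the most recent `window`
# observations only, then produce a single 1-step-ahead forecast.
for t in range(window, len(returns)):
    y = returns[t - window:t]
    x_lag, x_now = y[:-1], y[1:]
    # OLS slope of x_t on x_{t-1} (no intercept), re-fit on every window
    phi = x_lag @ x_now / (x_lag @ x_lag)
    forecasts.append(phi * y[-1])  # forecast for day t

forecasts = np.asarray(forecasts)
print(forecasts.shape)  # one 1-step-ahead forecast per rolling window
```

Each forecast only ever sees the latest 500 observations, which is exactly how older data are discarded as the window rolls forward.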
A GARCH model is a special case of a GAS volatility model in which the measurement density is normal. When the measurement density is non-normal, the corresponding score that drives the model is different. For example, using a t-distribution leads to 'trimming' of heavy-tailed observations, whereas using a GED distribution leads to 'Winsorization'. The normal score (a.k.a. the GARCH score) reacts linearly to the squared residuals, so it does not have a similar robustness property.