Strict stationarity is the strongest form of stationarity. It means that the joint statistical distribution of any collection of the time series variates never depends on time; consequently the mean, variance and every higher moment are the same for every variate. For day-to-day use, however, strict stationarity is too strict, so the following weaker definition is often used instead. A series is second-order stationary (stationary of order 2) if it has a constant mean, a constant variance and an autocovariance that does not depend on time. A still weaker form is first-order stationarity, which only requires the mean to be a constant function of time; a series with a time-varying mean can sometimes be corrected to obtain one which is first-order stationary.
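As a rough illustration (an informal diagnostic, not a formal test), the second-order definition suggests comparing sample moments across different stretches of the series. A minimal Python sketch on simulated white noise, which is second-order stationary by construction:

```python
import random
import statistics

random.seed(0)
# Simulate 600 observations of Gaussian white noise: constant mean,
# constant variance, zero autocovariance, hence second-order stationary.
x = [random.gauss(0.0, 1.0) for _ in range(600)]

# Informal check: sample mean and variance should be similar in the
# first and second halves if the series is second-order stationary.
first, second = x[:300], x[300:]
m1, m2 = statistics.mean(first), statistics.mean(second)
v1, v2 = statistics.variance(first), statistics.variance(second)
print(abs(m1 - m2), abs(v1 - v2))  # both should be small
```

A trending or variance-changing series would show large gaps between the two halves; the formal tests below make this comparison rigorous.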
Traditional stationarity tests such as PP.test (Phillips-Perron unit root test), the KPSS test or the augmented Dickey-Fuller test are not adequate if you are going to perform regression via methods other than ARIMA (in ARIMA the orders are fixed, and no other factors that produce non-stationarity enter the model). For non-ARIMA cases, stationarity tests in the frequency domain are more adequate.
Tests in the frequency domain: the Priestley-Subba Rao (PSR) test for nonstationarity (fractal package) is based upon examining how homogeneous a set of spectral density function (SDF) estimates is across time, across frequency, or both.
The test you refer to is also a frequency-domain test, a wavelet-based test of second-order stationarity. It looks at a quantity $\beta_j(t)$ which is closely related to a wavelet-based time-varying spectrum of the time series (it is a linear transform of the evolutionary wavelet spectrum of the locally stationary wavelet processes of Nason, von Sachs and Kroisandt, 2000). We check whether the function $\beta_j(t)$ varies over time or is constant by looking at the Haar wavelet coefficients of its estimate: the series is judged stationary if all Haar coefficients are zero (locits package).
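The idea behind the Haar-coefficient check can be sketched in Python. This is a toy stand-in, not the locits implementation: it uses block variances as a crude proxy for the local wavelet spectrum and forms Haar contrasts of adjacent blocks; for a stationary series these contrasts should be near zero.

```python
import random
import statistics

random.seed(1)
# A stationary series: Gaussian white noise of dyadic length.
x = [random.gauss(0.0, 1.0) for _ in range(512)]

# Crude local-spectrum stand-in: variance in 8 consecutive blocks of 64.
blocks = [x[i:i + 64] for i in range(0, 512, 64)]
local_var = [statistics.pvariance(b) for b in blocks]

# Finest-scale Haar contrasts: scaled differences of adjacent block
# variances. Stationarity implies these are all close to zero.
haar = [(local_var[i] - local_var[i + 1]) / 2 ** 0.5
        for i in range(0, 8, 2)]
print(haar)
```

The real test works on the evolutionary wavelet spectrum at every scale and assesses significance of the Haar coefficients, but the intuition (time-homogeneity of a local spectrum) is the same.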
There are other concerns related to stationarity, such as long-range dependence and fractionally integrated processes (ARFIMA), where a fractional differencing order $d$ corresponds to a long-memory process.
The effect of higher-order non-stationarity and long-term dependence is that they are reflected systematically in the errors of a regression; however, their impact, and thus the validity of the regression, is difficult to measure.
I am unsure whether I am forecasting the volatility of the prices or the actual values of the return.
The reference manual for the "fGarch" package states on p. 30 that the method predict
will give forecasts for both the conditional mean and the conditional variance. There will be different columns "meanForecast", "meanError", and "standardDeviation" in the function's output. I suppose the first one will contain the forecasts for the conditional mean, which you seem to be interested in.
Since I am not looking at options, there is no point in forecasting the volatility, right? Because it won't tell me whether prices will go up or down.
You may or may not be interested in forecasting the conditional variance. However, as long as the conditional variance process can be well approximated by some GARCH model, you should account for that. Ignoring the GARCH patterns and (silently) assuming a constant conditional variance will yield inferior forecasts for the conditional mean, because the misspecification of the conditional variance equation will negatively affect the estimation of the conditional mean model.
Thus if (1) you want to have a good forecast for the conditional mean
and (2) the conditional variance follows a GARCH process, you should keep the GARCH model.
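For concreteness, a minimal sketch of the GARCH(1,1) conditional variance recursion referred to above. The parameter values and innovations are illustrative, not estimated from any data:

```python
# Minimal GARCH(1,1) variance filter:
#   sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
# Parameters below are illustrative, not estimated.
omega, alpha, beta = 0.1, 0.1, 0.8

eps = [0.5, -1.2, 0.3, 2.0, -0.4]       # observed innovations
sigma2 = [omega / (1 - alpha - beta)]   # start at the unconditional variance

for e in eps:
    sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])

# The last recursion value is the one-step-ahead variance forecast.
print(sigma2[-1])
```

Note how the large innovation 2.0 pushes the conditional variance up, which then decays geometrically at rate alpha + beta: this persistence is exactly the pattern that a constant-variance model silently ignores.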
Since I have an ARMA(0,1) for my model, my forecasts will always be constant, and if I don't include a mean in the model then the forecasts are 0.
Yes, they will be constant, but no, the forecasts for the conditional mean are not all zero. The one-step-ahead forecast is
$$\hat{x}_{t+1|t}=\hat{\theta}_1 \hat{\varepsilon}_t,$$
where $\hat{\theta}_1$ is the estimated MA(1) coefficient and $\hat{\varepsilon}_t$ is the estimated innovation at time $t$; for $h \geqslant 2$ the forecast reverts to the unconditional mean of the process, which is zero here.
I have assumed away the potential presence of the mean component $\hat{\mu}$ for simplicity.
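Under the zero-mean assumption above, the MA(1) forecast rule is simple enough to state in a few lines of Python (the numbers are illustrative, not from a fitted model):

```python
def ma1_forecast(theta, eps_t, h):
    """h-step-ahead forecast of a zero-mean MA(1) process.

    Only the one-step forecast uses the last innovation; beyond that
    the forecast reverts to the unconditional mean (zero here).
    """
    return theta * eps_t if h == 1 else 0.0

theta_hat, eps_hat = 0.5, -0.8           # illustrative estimates
print(ma1_forecast(theta_hat, eps_hat, 1))  # → -0.4
print(ma1_forecast(theta_hat, eps_hat, 2))  # → 0.0
```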
So is there a point of using those different models in this case?
Without a GARCH model your one-step-ahead forecast will be
$$\hat{x}_{t+1|t}=\hat{\theta}_1 \hat{\varepsilon}_t,$$
but with a GARCH model it will be
$$\tilde{x}_{t+1|t}=\tilde{\theta}_1 \tilde{\varepsilon}_t.$$
Note that in general $\hat{\theta}_1 \neq \tilde{\theta}_1$ and $\hat{\varepsilon}_t \neq \tilde{\varepsilon}_t$. This is because the estimates of $\theta$ and $\varepsilon$ from the conditional mean model will not be the same under different specifications of the conditional variance model. Therefore, you will have different forecasts $\hat{x}_{t+1|t} \neq \tilde{x}_{t+1|t}$, and a correct specification of the conditional variance model matters.
As @Cagdas Ozgenc writes, the problem is that GARCH does not forecast future realizations (which you can observe), but future volatility (which you cannot observe). Thus, classical point forecast error (or accuracy) measures don't make sense.
So, how do we evaluate a GARCH volatility forecast? In practice, one usually not only forecasts volatility with GARCH but also adds distributional assumptions (typically a normal or a t distribution) and outputs a density forecast. The question then becomes how to evaluate a density forecast.
The classical way of evaluating a density forecast is to calculate its Probability Integral Transform, plot a histogram and check whether the PIT is uniformly distributed. Diebold, Gunther & Tay (1998, International Economic Review) is the classical reference - note that they give a very nice example using t-GARCH processes. Tay & Wallis (2000, Journal of Forecasting) is a somewhat newer overview.
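A minimal Python sketch of the PIT check: if $y_t$ really comes from the forecast distribution $F_t$, then $u_t = F_t(y_t)$ is uniform on $(0, 1)$. Here the forecast distribution is correct by construction, so the histogram bins should be roughly equal:

```python
import random
from statistics import NormalDist

random.seed(2)

# Forecast distribution N(0, 1); data are drawn from the same
# distribution, so the forecast is correct by construction.
forecast = NormalDist(mu=0.0, sigma=1.0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Probability integral transform: u_t = F(y_t).
pit = [forecast.cdf(v) for v in y]

# Crude uniformity check: counts in 4 equal bins, roughly 500 each.
counts = [sum(1 for u in pit if k / 4 <= u < (k + 1) / 4)
          for k in range(4)]
print(counts)
```

A misspecified forecast distribution (e.g. assuming normal tails when the data are fat-tailed) would pile PIT values into the outer bins, producing the U-shaped histogram discussed in Diebold, Gunther & Tay (1998).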
However, recent research has focused on the shortcomings of the PIT. It turns out that systematically wrong forecasts can still give uniform histograms. Gneiting, Balabdaoui & Raftery (2007, JRSS B) give some disconcerting examples and propose scoring rules as a remedy. These are less intuitive than the PIT, but they simultaneously evaluate calibration and sharpness of predictive distributions. Gneiting & Katzfuss (2014, Annual Review of Statistics and Its Application) give a more up-to-date overview of density forecasting and evaluation.