Strict stationarity is the strongest form of stationarity: the joint distribution of any collection of the time series variates does not depend on time. In particular, the mean, variance and every higher moment of each variate are the same whichever variate you choose. For day-to-day use, however, strict stationarity is too demanding, so the following weaker definitions are often used instead. Second-order stationarity (also called stationarity of order 2) requires a constant mean, a constant variance and an autocovariance that depends only on the lag, not on time. A still weaker form is first-order stationarity, which requires only that the mean is a constant function of time; a series with a time-varying mean can often be detrended to obtain one which is first-order stationary.
Traditional stationarity tests such as PP.test (the Phillips-Perron unit root test), the KPSS test or the Augmented Dickey-Fuller test are not adequate if you intend to perform regression via methods other than ARIMA (because in ARIMA the orders are fixed and no other factors that produce non-stationarity enter the model). For non-ARIMA cases, stationarity tests in the frequency domain are more suitable.
Tests in the frequency domain: the Priestley-Subba Rao (PSR) test for nonstationarity (available in the fractal package). It is based on examining how homogeneous a set of spectral density function (SDF) estimates is across time, across frequency, or both.
The test you refer to also works in the frequency domain (it is a wavelet-based test of second-order stationarity). It examines a quantity $\beta_j(t)$ that is closely related to a wavelet-based time-varying spectrum of the time series (it is a linear transform of the evolutionary wavelet spectrum of the locally stationary wavelet processes of Nason, von Sachs and Kroisandt, 2000). The idea is to check whether $\beta_j(t)$ varies over time or is constant by looking at the Haar wavelet coefficients of its estimate: the series is deemed stationary if all Haar coefficients are zero (implemented in the locits package).
There are other concerns related to stationarity, such as long-range dependence and fractionally integrated processes (ARFIMA), where a fractional differencing order $d$ captures long-memory behaviour.
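As a small illustration of why a fractional $d$ implies long memory: the fractional difference operator $(1-L)^d$ expands into a series whose coefficients decay hyperbolically rather than being cut off after a finite lag. A minimal sketch in pure NumPy (the value $d=0.4$ is just an example):

```python
# Sketch: coefficients of the fractional difference operator (1 - L)^d.
# For ARFIMA, 0 < d < 0.5 gives a stationary long-memory process; the
# weights decay hyperbolically (roughly like k^(-d-1)), so the distant
# past never stops mattering entirely.
import numpy as np

def frac_diff_weights(d, n_weights):
    """Coefficients w_k of (1 - L)^d = sum_k w_k L^k, via the recursion
    w_0 = 1,  w_k = -w_{k-1} * (d - k + 1) / k."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

w = frac_diff_weights(d=0.4, n_weights=200)
print(w[:4])        # [1.0, -0.4, -0.12, -0.064]
print(abs(w[199]))  # still non-negligible: slow, hyperbolic decay
```

Contrast this with an ordinary first difference ($d=1$), whose weights are exactly $(1, -1, 0, 0, \dots)$, i.e. only one lag matters.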
The effect of higher-order non-stationarity and long-range dependence is that they are systematically reflected in the errors of a regression; however, their impact, and thus the validity of the regression, is difficult to measure.
My goal is simply to ... find statistically significant predictive results. Also, is there a particular market you would look at (energy, rates, equities)?
Most, if not all, established and liquid financial markets will be very hard to predict, whatever model you use. If markets were relatively easy to predict, market participants would try to exploit that to make money. By doing so, they would eliminate the predictability. This is a contradiction, so the markets cannot be easy to predict.
Lastly, is GARCH only used for forecasting volatility? The professor I mentioned seemed to suggest I should turn toward GARCH or ARIMA-GARCH models to model stock returns. I read some papers that seemed to imply it could also be used for actual returns...
A GARCH model is used for modelling the conditional variance of the disturbance term of the conditional mean model for a dependent variable $y_t$. E.g. if you have a conditional mean model $y_t=\alpha+\epsilon_t$, the GARCH model will describe the conditional variance of $\epsilon_t$. Sometimes the conditional mean model is "empty" ($y_t=\epsilon_t$), and then the GARCH model describes the conditional variance of $y_t$ itself.
Even if you are primarily interested in the conditional mean model (e.g. you want to predict stock returns using an ARMA model), a GARCH model combined with a model for the conditional mean can be useful. If the conditional variance of the dependent variable is time-varying, that should be accounted for, and a GARCH model does exactly that. If a time-varying conditional variance is neglected, the conditional mean model may (and likely will) be invalid.
Would the AR and MA components in an ARIMA-GARCH model differ from those in an ARMA model?
Yes. That also illustrates my last remark above.
From what I vaguely understood, ARIMA and GARCH are two completely separate things (with the former being used to predict the actual time series and the other to predict its volatility).
This is true. But as I have already explained, the two models can work together nicely.
Best Answer
A typical GARCH(r,s) model looks something like this: \begin{aligned} x_t &= \mu_t + u_t, \\ \mu_t &= \dots, \\ u_t &= \sigma_t \varepsilon_t, \\ \sigma_t^2 &= \omega + \alpha_1 u_{t-1}^2 + \dots + \alpha_s u_{t-s}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_r \sigma_{t-r}^2, \\ \varepsilon_t &\sim \text{i.i.d. } D(0,1), \end{aligned} where $D$ is some probability distribution with zero mean and unit variance.
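To make the equations concrete, here is a minimal simulation sketch of the GARCH(1,1) special case with an "empty" (zero) conditional mean, in Python/NumPy; the parameter values are illustrative only:

```python
# Sketch: simulate the GARCH(1,1) special case of the model above,
#   x_t = u_t,  u_t = sigma_t * eps_t,
#   sigma_t^2 = omega + alpha * u_{t-1}^2 + beta * sigma_{t-1}^2,
# with eps_t ~ i.i.d. N(0, 1). Parameter values are illustrative.
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    u = np.empty(n)
    sig2 = np.empty(n)
    sig2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    u[0] = np.sqrt(sig2[0]) * eps[0]
    for t in range(1, n):
        sig2[t] = omega + alpha * u[t - 1] ** 2 + beta * sig2[t - 1]
        u[t] = np.sqrt(sig2[t]) * eps[t]
    return u, sig2

x, sig2 = simulate_garch11(omega=0.1, alpha=0.1, beta=0.85, n=10_000)
# alpha + beta < 1, so the unconditional variance exists: omega/(1-alpha-beta) = 2.
print(x.var())  # should be in the rough vicinity of 2
```

The simulated series shows the characteristic volatility clustering: periods of large $|x_t|$ follow each other, because $\sigma_t^2$ feeds back on past squared shocks.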
GARCH models the entire conditional distribution of the dependent variable $x_t$, conditional on the history of $x_t$. Out of that you can derive a point forecast based on the loss function you are facing. E.g. if your loss is quadratic (a.k.a. square loss), the optimal point forecast is the conditional mean. Given an estimated GARCH model, the model's conditional mean equation $\mu_t =\dots$ will be sufficient for that.
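A tiny numerical check of that last point: under square loss, the point forecast that minimizes average loss is the mean. A pure-NumPy sketch with simulated data (the constant-forecast setup is a deliberate simplification):

```python
# Sketch: under quadratic loss, the best constant point forecast of a
# sample is its mean. We scan candidate forecasts over a grid and check
# where average loss is minimized.
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(5_000) * 2.0 + 0.3   # toy "returns" with mean 0.3

candidates = np.linspace(-1.0, 1.0, 2001)    # grid of constant forecasts
avg_loss = [((y - c) ** 2).mean() for c in candidates]
best = candidates[int(np.argmin(avg_loss))]

print(best, y.mean())  # the minimizer sits at the grid point nearest the sample mean
```

The same logic applies conditionally: given the history, the conditional mean $\mu_t$ is the optimal point forecast under square loss, which is why the mean equation of an estimated GARCH model suffices for point forecasting.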