Strict stationarity is the strongest form of stationarity. It means that the joint statistical distribution of any collection of the time series variates never depends on time, so the mean, variance and every higher moment of any variate are the same whichever variate you choose. However, for day-to-day use strict stationarity is too strict, and the following weaker definition is often used instead. Second-order stationarity (also called weak stationarity or stationarity of order 2) requires a constant mean, a constant variance and an autocovariance that depends only on the lag, not on time. A still weaker form is first-order stationarity, which only requires that the mean is a constant function of time; a series with a time-varying mean can often be detrended to obtain one which is first-order stationary.
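For reference, the second-order (weak) stationarity conditions for a process $\{X_t\}$ can be written out explicitly:

$$\mathbb{E}[X_t] = \mu, \qquad \operatorname{Var}(X_t) = \sigma^2 < \infty, \qquad \operatorname{Cov}(X_t, X_{t+h}) = \gamma(h) \quad \text{for all } t \text{ and all lags } h,$$

i.e. the first two moments exist, do not depend on $t$, and the autocovariance depends only on the lag $h$.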
Traditional stationarity tests such as PP.test (the Phillips-Perron unit root test), the KPSS test or the Augmented Dickey-Fuller test are not adequate if you are going to perform regression via methods other than ARIMA (because in ARIMA the orders are fixed and no other factors that produce non-stationarity are included in the model). For non-ARIMA cases, stationarity tests in the frequency domain are more appropriate.
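As a minimal sketch of those classical time-domain tests in R (the tseries package is assumed for adf.test and kpss.test; the series x is purely illustrative):

```r
# Classical time-domain stationarity / unit-root tests
library(tseries)

set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 200)  # hypothetical stationary AR(1) series

PP.test(x)    # Phillips-Perron unit root test (stats package)
adf.test(x)   # Augmented Dickey-Fuller test (null: unit root)
kpss.test(x)  # KPSS test (null: stationarity)
```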
Tests in the frequency domain: the Priestley-Subba Rao (PSR) test for nonstationarity (fractal package) is based on examining how homogeneous a set of spectral density function (SDF) estimates is across time, across frequency, or both.
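A sketch of how the PSR test might be run in R, assuming (as I recall) that the fractal package exposes it as stationarity(); check ?stationarity for the exact interface:

```r
# Priestley-Subba Rao (PSR) test for nonstationarity in the frequency domain
library(fractal)

set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 512)  # illustrative series

psr <- stationarity(x)
print(psr)  # variability decomposed across time, frequency, and their interaction
```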
The test you refer to is also a frequency-domain test (a test of second-order stationarity) in which the wavelet method looks at a quantity called βj(t), which is closely related to a wavelet-based time-varying spectrum of the time series (it is a linear transform of the evolutionary wavelet spectrum of the locally stationary wavelet processes of Nason, von Sachs and Kroisandt, 2000). We check whether the βj(t) functions vary over time or are constant by looking at the Haar wavelet coefficients of their estimates: the series is judged second-order stationary if all the Haar coefficients are statistically zero (locits package).
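A sketch of that wavelet test in R, assuming it is the one implemented by hwtos2() in the locits package (the series length must be a power of two for this call; see ?hwtos2):

```r
# Wavelet-based test of second-order stationarity via Haar coefficients of beta_j(t)
library(locits)

set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 512)  # illustrative series, length 2^9

tos <- hwtos2(x)
summary(tos)  # reports which Haar wavelet coefficients are significantly nonzero;
              # none significant is consistent with second-order stationarity
```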
There are other concerns about stationarity, such as long-range dependence and fractionally integrated processes (ARFIMA), where the fractional differencing parameter d captures long-memory behaviour.
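As a rough illustration, the fractional differencing parameter d can be estimated with the fracdiff package (assumed here); a value of d between 0 and 0.5 is consistent with a stationary long-memory process:

```r
# Estimating the long-memory parameter d of an ARFIMA(0, d, 0) process
library(fracdiff)

set.seed(1)
x <- fracdiff.sim(n = 1000, d = 0.3)$series  # simulated long-memory series

fit <- fracdiff(x, nar = 0, nma = 0)
summary(fit)  # inspect the estimate of d
```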
The effect of higher-order non-stationarity and long-range dependence is that they are reflected systematically in the errors of a regression; however, their impact, and thus the validity of the regression, is difficult to measure.
How can I see the benefits of modeling and forecasting the variance when my goal is modeling and forecasting the mean?
If the conditional variance is non-constant but rather of GARCH type, (implicitly) assuming the cond. variance to be constant will yield inefficient estimates of the coefficients in the cond. mean model. That is, if the true data generating process is better approximated by an AR-GARCH model than by an AR model, using an AR model without a GARCH model will yield inefficient estimates of the AR coefficients. When you use the cond. mean model to forecast the cond. mean some periods ahead, the inefficiently estimated model will produce poorer forecasts than an efficiently estimated model would.
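A minimal sketch of that comparison in R, assuming the rugarch package and a purely illustrative simulated series x:

```r
# AR(1) with (implicitly) constant conditional variance vs. AR(1)-GARCH(1,1)
library(rugarch)

set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 1000)  # placeholder data

ar_fit <- arima(x, order = c(1, 0, 0))  # conditional variance assumed constant

spec <- ugarchspec(mean.model     = list(armaOrder = c(1, 0)),
                   variance.model = list(model = "sGARCH", garchOrder = c(1, 1)))
garch_fit <- ugarchfit(spec, data = x)  # mean and variance modelled jointly

coef(ar_fit)     # AR coefficient under constant variance
coef(garch_fit)  # AR coefficient estimated jointly with the GARCH variance
```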
For what purpose would I use my estimated variance when making my prediction?
Would it be just to build a confidence interval for the expected mean? In this case, obviously, the confidence interval would not be a straight line because the conditional variance is non-constant.
Essentially, you got the idea right. The confidence interval produced by a model with a GARCH-type cond. variance will be different from one produced by a model with constant cond. variance. But note that in either case the confidence interval will widen with the forecast horizon, so it will not be bounded by two straight horizontal lines even in the case of constant cond. variance.
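To illustrate, reusing the hypothetical ar_fit and garch_fit objects from the sketch above, the two sets of forecast intervals can be compared like this:

```r
# Forecast intervals: constant conditional variance vs. GARCH-type variance
h <- 20

ar_fc    <- predict(ar_fit, n.ahead = h)
ar_upper <- ar_fc$pred + 1.96 * ar_fc$se  # interval still widens with horizon
ar_lower <- ar_fc$pred - 1.96 * ar_fc$se

g_fc    <- ugarchforecast(garch_fit, n.ahead = h)
g_mean  <- fitted(g_fc)   # conditional mean forecasts
g_sigma <- sigma(g_fc)    # time-varying conditional standard deviation forecasts
g_upper <- g_mean + 1.96 * g_sigma
g_lower <- g_mean - 1.96 * g_sigma
```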
Best Answer
My experiences with programming/implementing and testing ARCH/GARCH procedures have led me to the conclusion that they must be useful somewhere and someplace, but I haven't seen it. Gaussian violations such as unusual values, level shifts, seasonal pulses and local time trends should be dealt with first when handling changes in volatility/error variance, as those remedies have less serious side effects. After any of these adjustments, care should be taken to validate that the model parameters are constant over time. Furthermore, the error variance may still not be constant, but simpler/less intrusive remedies like a Box-Cox transformation and detecting deterministic break points in the error variance à la Tsay are much more useful and less destructive. Finally, if none of these procedures work, then my last gasp would be to throw ARCH/GARCH at the data and then add a ton of holy water. I firmly agree with your findings and conclude that these are methods looking for data, or just dissertation topics flying in the wind.
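As one example of the simpler remedies mentioned above, a Box-Cox transformation can often stabilise the error variance before anything like ARCH/GARCH is considered (BoxCox.lambda() and BoxCox() from the forecast package are assumed here):

```r
# Variance stabilisation via Box-Cox before any ARCH/GARCH modelling
library(forecast)

y      <- AirPassengers     # classic series with level-dependent variance
lambda <- BoxCox.lambda(y)  # estimate the transformation parameter
y_bc   <- BoxCox(y, lambda) # variance-stabilised series
fit    <- auto.arima(y_bc)  # model the transformed series
```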