Strict stationarity is the strongest form of stationarity: the joint distribution of any collection of variates from the time series does not depend on time, so the mean, the variance, and every higher moment are the same whichever variate you choose. For day-to-day use, however, strict stationarity is too strict, and the following weaker definitions are often used instead. Second-order stationarity (stationarity of order 2) requires a constant mean, a constant variance, and an autocovariance that depends only on the lag between observations, not on time itself. Weaker still is first-order stationarity, which only requires the mean to be a constant function of time; a series with a time-varying mean can be detrended to obtain one that is first-order stationary.
Traditional stationarity tests such as PP.test (the Phillips-Perron unit root test), the KPSS test, or the augmented Dickey-Fuller test are not adequate if you intend to perform regression by methods other than ARIMA (because in ARIMA the orders are fixed and no other factors that produce non-stationarity are included in the model). For non-ARIMA cases, stationarity tests in the frequency domain are more appropriate.
Tests in the frequency domain: the Priestley-Subba Rao (PSR) test for nonstationarity (fractal package) is based on examining how homogeneous a set of spectral density function (SDF) estimates is across time, across frequency, or both.
The test you refer to is also a frequency-domain test (a test of second-order stationarity). It looks at a quantity βj(t) that is closely related to a wavelet-based time-varying spectrum of the time series (a linear transform of the evolutionary wavelet spectrum of the locally stationary wavelet processes of Nason, von Sachs and Kroisandt, 2000). To see whether βj(t) varies over time or is constant, we examine the Haar wavelet coefficients of its estimate: the series is judged stationary if all Haar coefficients are zero (locits package).
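The computational idea, that a function of time is constant exactly when all of its Haar wavelet detail coefficients vanish, can be sketched in a few lines of Python; this only illustrates the principle, it is not the locits test itself:

```python
import numpy as np

def haar_details(x):
    """Return all Haar detail coefficients of a length-2^J sequence."""
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2))  # detail (difference) part
        x = (x[0::2] + x[1::2]) / np.sqrt(2)              # smooth (average) part
    return np.concatenate(details)

constant = np.full(16, 3.0)       # a time-constant "spectrum" beta_j(t)
trending = np.linspace(0, 1, 16)  # a time-varying one

print(np.max(np.abs(haar_details(constant))))  # exactly 0: consistent with stationarity
print(np.max(np.abs(haar_details(trending))))  # > 0: evidence of time variation
```

In the actual test, the Haar coefficients of the estimated βj(t) are compared against their sampling variability rather than against exact zero.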
There are other concerns about stationarity, such as long-range dependence and fractionally integrated (ARFIMA) processes, where the fractional differencing order d captures long-memory behaviour.
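To see where the long memory comes from, one can expand the fractional differencing operator $(1-B)^d$ into its binomial weights; a short illustrative Python sketch:

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n weights of (1 - B)^d via the recursion w_k = w_{k-1} * (k - 1 - d) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

w = frac_diff_weights(0.4, 50)
print(w[:4])  # [1.0, -0.4, -0.12, -0.064]
```

For 0 < d < 1 the weights decay hyperbolically (roughly like $k^{-1-d}$) rather than geometrically as in ARMA models, which is why distant past shocks keep a non-negligible influence.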
The effect of higher-order non-stationarity and long-range dependence is that they are reflected systematically in the errors of a regression; however, their impact, and thus the validity of the regression, is difficult to measure.
Do you by any chance "predict" raw residuals rather than standardized residuals? (Then you would get exactly the same result from the ARCH-LM test when "pre-testing" and when "post-testing" in the case of no conditional mean model $r_t=\epsilon_t$.) The raw residuals will contain ARCH effects, and that is why you want to apply a GARCH model and obtain standardized residuals that do not contain ARCH effects.
On the other hand, if you find ARCH effects in the standardized residuals, the GARCH model you are using may be inappropriate. Try different variants of the GARCH model (EGARCH, APARCH and whatever else) and different lag orders.
Also note that the original ARCH-LM test is inappropriate for testing for remaining ARCH effects in the standardized residuals of a GARCH model; the Li-Mak test should be used instead. (I do not know whether the Li-Mak test is available in Stata. Also, use of the original ARCH-LM test seems to be relatively widespread in the applied literature, even though it is inappropriate.)
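The raw-versus-standardized distinction can be seen in a minimal Python sketch (assuming `statsmodels`; for illustration the residuals are standardized with the true conditional volatility of the simulation, whereas in practice you would use the volatility fitted by your GARCH model):

```python
import numpy as np
from statsmodels.stats.diagnostic import het_arch

# Simulate a GARCH(1,1) process: r_t = sigma_t * z_t
rng = np.random.default_rng(0)
n, omega, alpha, beta = 2000, 0.1, 0.3, 0.6
z = rng.standard_normal(n)
sigma2 = np.empty(n)
r = np.empty(n)
sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    if t > 0:
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * z[t]

# ARCH-LM test (null: no ARCH effects), with 5 lags
p_raw = het_arch(r, 5)[1]                    # raw "residuals": strong ARCH effects
p_std = het_arch(r / np.sqrt(sigma2), 5)[1]  # standardized: none left

print(f"raw p={p_raw:.4f}, standardized p={p_std:.4f}")
```

The raw series rejects the no-ARCH null decisively, while the correctly standardized series behaves like an i.i.d. sequence; remember, though, that for residuals standardized by a *fitted* GARCH model the ARCH-LM p-values are no longer valid and the Li-Mak test applies.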
References:
- Li, W. K. and Mak, T. K. (1994) On the squared residual autocorrelations in non-linear time series with conditional heteroscedasticity. Journal of Time Series Analysis 15, 627–636.
Best Answer
Copying from the abstract of Engle's original paper:
"These are mean zero, serially uncorrelated processes with nonconstant variances conditional on the past, but constant unconditional variances. For such processes, the recent past gives information about the one-period forecast variance".
Continuing with the references, as the author who introduced GARCH shows (Bollerslev, Tim (1986). "Generalized Autoregressive Conditional Heteroskedasticity", Journal of Econometrics, 31:307-327), for the GARCH(1,1) process it suffices that $\alpha_1 + \beta_1 < 1$ for second-order stationarity.
Stationarity (the kind needed for estimation procedures) is defined relative to the unconditional distribution and moments.
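This can be checked numerically: when $\alpha_1 + \beta_1 < 1$, the unconditional variance of a GARCH(1,1) is $\omega / (1 - \alpha_1 - \beta_1)$, even though the conditional variance moves around constantly. A small simulation sketch in Python (parameter values are illustrative):

```python
import numpy as np

# GARCH(1,1): sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
omega, alpha, beta = 0.1, 0.1, 0.8
assert alpha + beta < 1                  # second-order stationarity (Bollerslev 1986)
uncond_var = omega / (1 - alpha - beta)  # implied unconditional variance

rng = np.random.default_rng(7)
n = 200_000
r = np.empty(n)
s2 = uncond_var
for t in range(n):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2

print(uncond_var, r.var())  # sample variance is close to omega / (1 - alpha - beta)
```

The conditional variance `s2` changes every step, yet the long-run sample variance settles at the constant unconditional value, which is exactly the point of the answer above.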
ADDENDUM
To summarize the discussion in the comments: the GARCH modeling approach is an ingenious way to model suspected heteroskedasticity over time, i.e. a form of heterogeneity of the process (which would render the process non-stationary), as an observed feature arising from the memory of the process, in essence inducing stationarity at the unconditional level.
In other words, we took our two "great opponents" in stochastic process analysis (heterogeneity and memory) and used the one to neutralize the other; this is indeed an inspired strategy.