Strict stationarity is the strongest form of stationarity: the joint distribution of any collection of variates from the series is invariant under time shifts, so every moment of every variate is the same regardless of when it is observed. For day-to-day use, however, strict stationarity is too demanding, and the following weaker definitions are often used instead. Second-order stationarity (also called weak stationarity or stationarity of order 2) requires a constant mean, a constant variance, and an autocovariance that depends only on the lag, not on time. Weaker still is first-order stationarity, which requires only that the mean be a constant function of time; a series with a time-varying mean can often be adjusted (e.g. by subtracting an estimate of the mean function) to obtain one that is first-order stationary.
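In symbols (a standard textbook formulation, added here for reference): strict stationarity requires

$$(X_{t_1},\ldots,X_{t_k}) \overset{d}{=} (X_{t_1+h},\ldots,X_{t_k+h}) \quad \text{for all } k,\ t_1,\ldots,t_k,\ h,$$

second-order stationarity requires only

$$\mathbb{E}X_t = \mu, \qquad \operatorname{Var}(X_t)=\sigma^2, \qquad \operatorname{Cov}(X_t,X_{t+h}) = \gamma(h)\ \text{(independent of } t\text{)},$$

and first-order stationarity requires only $\mathbb{E}X_t=\mu$ for all $t$.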
Traditional stationarity tests such as PP.test (Phillips-Perron unit root test), the KPSS test, or the augmented Dickey-Fuller test are not adequate if you intend to perform regression via methods other than ARIMA (in ARIMA the orders are fixed and no other factors producing non-stationarity enter the model). For non-ARIMA cases, stationarity tests in the frequency domain are more adequate.
Tests in the frequency domain: the Priestley-Subba Rao (PSR) test for nonstationarity (fractal package). It is based on examining how homogeneous a set of spectral density function (SDF) estimates is across time, across frequency, or both.
The test you refer to is also a frequency-domain test (of second-order stationarity). It looks at a quantity called $\beta_j(t)$, which is closely related to a wavelet-based time-varying spectrum of the series (it is a linear transform of the evolutionary wavelet spectrum of the locally stationary wavelet processes of Nason, von Sachs and Kroisandt, 2000). We check whether $\beta_j(t)$ varies over time or is constant by looking at the Haar wavelet coefficients of its estimate: the series is judged stationary if all the Haar coefficients are zero (locits package).
There are other concerns related to stationarity, such as long-range dependence and fractionally integrated processes (ARFIMA), where the differencing parameter $d$ captures long-memory behaviour.
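To make the role of a fractional $d$ concrete, here is a small Python sketch (my own illustration, not from the answer) of the binomial-expansion weights that define fractional differencing, $(1-B)^d x_t = \sum_k w_k\, x_{t-k}$:

```python
import numpy as np

def frac_diff_weights(d, n):
    """Weights w_k of (1-B)^d via the recursion w_k = -w_{k-1} (d - k + 1) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

def frac_diff(x, d):
    """Apply (1-B)^d to a series x, truncating the expansion at the sample start."""
    w = frac_diff_weights(d, len(x))
    return np.array([w[: t + 1][::-1] @ x[: t + 1] for t in range(len(x))])

# For integer d = 1 the weights reduce to ordinary differencing: [1, -1, 0, ...]
# For fractional d (e.g. 0.4) they decay slowly -- the signature of long memory.
print(frac_diff_weights(1.0, 5))
print(frac_diff_weights(0.4, 5))
```

The slowly decaying weights are why a fractionally integrated series has autocorrelations that die out hyperbolically rather than geometrically.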
Higher-order non-stationarity and long-term dependence are in effect reflected systematically in the errors of a regression; however, their impact, and hence the validity of the regression, is difficult to measure.
No, the requirement on the covariance function is not enough to ensure weak stationarity. The following is a counterexample to such a claim:
Let $\xi_t$ be iid random variables with zero mean and unit variance. Next, let $X_t=\xi_t + 1$ for $t$ even and $X_t=\xi_t$ for $t$ odd. Consider the process $\{X_t\}_{t=0}^\infty$. Then $\operatorname{Cov}(X_t,X_s)=0$ if $t\neq s$ and $1$ otherwise, so the covariance function satisfies the condition for weak stationarity. The mean does not, however, since $\mathbb{E}X_t=1$ when $t$ is even and $0$ otherwise.
If you want an even simpler example, take $X_0=\xi_0+1$ and $X_t=\xi_t$ for all other $t$.
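A quick simulation (a sketch of my own, taking $\xi_t$ standard normal) confirms the behaviour: the lag-1 covariance looks stationary while the sample means at even and odd times do not agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
xi = rng.normal(size=n)           # iid, zero mean, unit variance
t = np.arange(n)
x = xi + (t % 2 == 0)             # X_t = xi_t + 1 for even t, xi_t for odd t

print("mean at even t:", x[0::2].mean())   # close to 1
print("mean at odd t: ", x[1::2].mean())   # close to 0

# Lag-1 covariance about the true means reduces to E[xi_t xi_{t+1}], close to 0
mu = (t % 2 == 0).astype(float)
print("lag-1 cov:", np.mean((x[:-1] - mu[:-1]) * (x[1:] - mu[1:])))
```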
The hard part of the question is the second part about exhibiting a non-stationary, weakly stationary series with identical distributions at all times.
I hope you won't mind a simplification: rather than using Poisson distributions, which can take on infinitely many values, let's use one that takes on only a small number of values. A particularly simple distribution $F$ describes the difference between the number of heads and number of tails in two independent throws of a fair coin. (This makes it a version of a Binomial distribution.) It therefore has a chance of $1/2$ of being $0$ and a $1/4$ chance each of being $-2$ and $2$. You can readily compute that its expectation is $0$ and its variance is $2$.
I need to describe two different bivariate distributions, $G$ and $H$, whose marginals both equal $F$.
$G$ is the joint distribution of two independent random variables $X,Y$ distributed according to $F$. It assigns positive probabilities to all nine distinct outcomes $(X,Y)$: when both of $|X|$ and $|Y|$ are $2$, the chance is $1/16$; the chance of $(0,0)$ is $1/4$; and the other four chances are $1/8$.
$H$ assigns positive probabilities only to five of these outcomes: when both of $|X|$ and $|Y|$ are $2$, the chance is $1/8$, and the chance of $(0,0)$ is $1/2$.
(One way to realize $H$ in a simulation is to draw $X$ from $F$, and then negate $X$ with probability $1/2$ to produce $Y$.)
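A short simulation of this recipe (a sketch; the coin-flip construction is the one just described) checks that $Y$ has marginal $F$, that $(X,Y)$ hits the corner $(2,2)$ with probability $1/8$ as $H$ prescribes, and that $X$ and $Y$ are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Draw X from F: heads minus tails in two fair-coin throws
coins = rng.integers(0, 2, size=(n, 2)) * 2 - 1   # each throw contributes +1 or -1
x = coins.sum(axis=1)                              # values in {-2, 0, 2}

# Produce Y by negating X with probability 1/2 (realizes the law H)
y = x * rng.choice([-1, 1], size=n)

print("P(Y = 0):      ", np.mean(y == 0))            # ~ 1/2, matching F
print("P(X=2, Y=2):   ", np.mean((x == 2) & (y == 2)))  # ~ 1/8 under H
print("Cov(X, Y):     ", np.cov(x, y)[0, 1])         # ~ 0, yet X, Y are dependent
```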
It is immediate that the covariance of $X$ and $Y$ under the law $G$ is zero, because $X$ and $Y$ are independent. Under the law $H$, the covariance also is zero, because (since the expectations are zero), $$\operatorname{Cov}_H(X,Y)=\mathbb{E}_H(XY) - \mathbb{E}_H(X)\mathbb{E}_H(Y) = \mathbb{E}_H(XY)$$
and a direct calculation from the definition of expectation gives $$\mathbb{E}_H(XY) = \frac{1}{8}(2\times 2 + 2\times (-2) + (-2)\times 2 + (-2)\times (-2)) + \frac{1}{2}(0\times 0) = 0.$$
We have established that $G$ and $H$ have the same first and second bivariate moments and identical marginal distributions. They provide a nice example of a weakly stationary time series $(X_t),\ t\in\mathbb{Z},$ that is not stationary. Simply let $X_t,\ t\in\mathbb{Z}\setminus\{0,1\},$ be a sequence of independent random variables with distribution $F$ and let $(X_0,X_1)$ be independent of the other $X_t$ and have joint distribution $H$. Thus all bivariate distributions of $(X_t, X_s)$ follow the law $G$ except for $\{t,s\}=\{0,1\}$, which follows the law $H$. Since $H$ differs from $G$, this makes the time series non-stationary. (The illustrations at the end help you see this.)
Nevertheless, all the univariate distributions are the same and all the covariances between $X_t$ and $X_s$ for $t\ne s$ are zero, as we have seen. That makes the series weakly stationary.
One way to illustrate the situation is to show some partial realizations of this process. What is particularly revealing are the first differences of the time series, $dX_t = X_{t+1} - X_t$. These will be values in the set $\{-4,-2,0,2,4\}$. The construction of $H$ precludes the values $\pm 2$ from ever occurring for $dX_0 = X_1 - X_0$, but those values can occur for all other differences. In the graphics below I have therefore plotted these first differences against time $t=-15, -14, \ldots, 15$. The value $dX_0$ is highlighted in red. The "odd" values $\pm 2$ are plotted with crosses and the other values with dots.
It helps to see many realizations at once. In the next graphic I overlaid 300 of them on the same axes.
The pattern becomes clear: at time $t=0$, there are never any differences of $2$ or $-2$ plotted, but such differences show up at all other times. That's a visible demonstration of non-stationarity of the differences. Since the differences are not stationary, the original series cannot be either.
The R code to produce these simulations is particularly simple, so it might be useful to see it.
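The original R listing is not reproduced here; for the curious, a Python sketch of the same simulation (my reconstruction, assuming only the construction described above) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_F(size):
    """Heads minus tails in two fair-coin throws: -2, 0, 2 with probs 1/4, 1/2, 1/4."""
    return (rng.integers(0, 2, size=(size, 2)) * 2 - 1).sum(axis=1)

def realize(t_min=-15, t_max=16):
    """One realization of X_t for t_min <= t < t_max: independent F draws,
    except (X_0, X_1) is drawn jointly from H (X_1 = +-X_0 with equal probability)."""
    t = np.arange(t_min, t_max)
    x = draw_F(len(t))
    x[t == 1] = x[t == 0] * rng.choice([-1, 1])   # overwrite X_1 so (X_0, X_1) ~ H
    return t, x

# Collect first differences dX_t = X_{t+1} - X_t over many realizations
d0, d_other = [], []
for _ in range(2000):
    t, x = realize()
    dx = np.diff(x)
    d0.append(dx[t[:-1] == 0][0])     # dX_0 = X_1 - X_0
    d_other.extend(dx[t[:-1] != 0])

print("values taken by dX_0:", sorted(set(int(v) for v in d0)))  # subset of {-4, 0, 4}
print("+-2 occurs among other dX_t:", any(abs(v) == 2 for v in d_other))
```

As the text explains, $dX_0$ can never be $\pm 2$ (under $H$, $X_1 = \pm X_0$, so the difference is $0$ or $\pm 4$), while every other difference takes the value $\pm 2$ with probability $1/2$.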