I suspect there is no general term that will cover all cases. Consider, for example, a white noise generator. In that case, we would just call it white noise. Now if the white noise comes from a natural source, e.g., AM radio band white noise, then it carries superimposed effects: diurnal, seasonal, and sunspot-cycle (11-year) solar variability, as well as man-made primary and beat interference from radio broadcasts.
For example, the graph in the link mentioned by the OP looks like amplitude-modulated white noise, almost like an earthquake. I would personally examine such a curve in the frequency and/or phase domain and describe it as an evolution in time of that representation: directly observing how the amplitudes over a set of frequency ranges evolve in time, relative to detection limits, reveals much more about the signal structure than reasoning about stationarity does, mainly because it is conceptually more compact. I understand the appeal of statistical testing. However, it would take umpteen tests and oodles of different criteria, as in the link, to incompletely describe an evolving frequency-domain concept, which makes the attempt to develop stationarity as a fundamental property seem rather confining. And how does one get from there to Bode plots and phase plots?
Having said that much, signal processing becomes more complicated when a "primary" violation of stationarity occurs: the patient dies, the signal stops, a random walk continues, and so forth. Such processes are easier to describe as non-stationary than, variously, as an infinite sum of odd harmonics or as a frequency decreasing to zero. The OP's complaint about not having much literature documenting second-order stationarity is entirely reasonable; there does not even seem to be complete agreement as to what constitutes ordinary stationarity. For example, NIST claims that "A stationary process has the property that the mean, variance and autocorrelation structure do not change over time." Others on this site claim that "Autocorrelation doesn't cause non-stationarity," or, using mixture distributions of RVs, that "This process is clearly not stationary, but the autocorrelation is zero for all lags since the variables are independent." This is problematic because autocorrelation conditions are typically "tacked on" as additional criteria of (non-)stationarity without much consideration of how necessary and sufficient they are for defining a process. My advice would be to first observe a process, then describe it, using phrases couched in modifiers such as "stationary/non-stationary with respect to..."; the alternative is to confuse many readers as to what is meant.
After further research, I've made some useful discoveries. The answer appears to be anything but straightforward. Let me start by answering my second question above: "What is the correct effective sample size (ESS) calculation (at least that which is used by `effectiveSize`)?"
The answer is that the `effectiveSize` function from the R `coda` package uses the second definition I described in the question, namely
$$ESS = M \frac{\lambda^2}{\sigma^2}$$
where $\lambda^2 = \text{var}(x)$ is the sample variance as defined above, but $\sigma^2$ is defined as an estimate of the spectral density at frequency zero. (`effectiveSize` uses the function `spectrum0.ar` to do this, also from the `coda` package.) More generally, $\sigma^2$ is an estimate of the variance in the Central Limit Theorem.
I wish I understood what "the spectral density at frequency zero" meant so I could elaborate, but hopefully this is a useful starting point for anyone who wishes to (a) understand the calculation behind `coda::effectiveSize` or (b) program a user-defined function for computing the ESS.
Now to answer my first question: "What is the correct definition of the ESS?"
As far as I can tell, there isn't one correct definition. What tipped me off was a paper which mentioned two R packages for calculating the ESS. After trying both on toy examples, I was still getting drastically different results (a short comparison sketch follows the list below). I'm not sure what definition of ESS is being used in the second package (`mcmcse::ess`), but it demonstrates that multiple definitions do exist. On that note, here is a concise list of definitions that I've found so far. (Disclaimer: I cannot vouch for their correctness.)
Note that they all take the general form:
$$
ESS_i = \frac{M}{\tau_i}
$$
where $M$ is the un-adjusted sample size (i.e. length of the vector $x$), and subscript $i$ denotes the specific definition. Hence I will focus on the definitions of $\tau_i$. In no particular order:
$$
\tau_1 = 1 + 2\sum_{l=1}^\infty \rho(l)
$$
where $\rho(l)$ is the sample autocorrelation at lag $l$.
Sources: https://www.johndcook.com/blog/2017/06/27/effective-sample-size-for-mcmc/, https://mc-stan.org/docs/2_20/reference-manual/effective-sample-size-section.html, https://people.orie.cornell.edu/davidr/or678/handouts/winBUGS.pdf, https://arxiv.org/pdf/1403.5536v1.pdf
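A minimal sketch of $ESS_1$ in R, assuming the toy chain `x` from the sketch above; the infinite sum has to be truncated somewhere in practice, and the cutoff `L = 100` here is an arbitrary assumption:

```r
# tau_1 with the infinite sum truncated at lag L
tau1 <- function(x, L = 100) {
  rho <- acf(x, lag.max = L, plot = FALSE)$acf[-1]  # sample autocorrelations at lags 1..L
  1 + 2 * sum(rho)
}
length(x) / tau1(x)  # ESS_1
```

Real implementations choose the cutoff adaptively (e.g., Stan truncates where the autocorrelation estimates become noise-dominated).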
$$
\tau_2 = \frac{1+\rho(1)}{1-\rho(1)}
$$
Source: https://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Theory/TSA_theory_part1.pdf
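Since $\tau_2$ uses only the lag-1 autocorrelation (it is exact for an AR(1) process), the sketch is short (again assuming the toy chain `x` from above):

```r
# tau_2 from the lag-1 sample autocorrelation alone
rho1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]  # element 1 is lag 0
tau2 <- (1 + rho1) / (1 - rho1)
length(x) / tau2  # ESS_2
```

For the AR(0.7) toy chain this should land close to $ESS_1$, since the AR(1) assumption holds exactly there.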
\begin{align*}
\tau_3 &= \frac{1}{M}\sum_{k,l=1}^M \text{cov}(x_k, x_l) \\
&= 1 + 2 \left( \frac{M-1}{M}\rho_1 + \frac{M-2}{M}\rho_2 + \cdots + \frac{1}{M}\rho_{M-1}\right)
\end{align*}
(Note this is similar to $\tau_1$, but with finite-sample weights and covariances in place of autocorrelations; the two lines agree when the $x_k$ have unit variance, so that covariances reduce to autocorrelations. Here $x_k$ and $x_l$ are the scalar elements of the single chain $x$, indexed by time, as in the answer below.)
Source: Definition of autocorrelation time (for effective sample size)
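A sketch of the second line of $\tau_3$, assuming the toy chain `x` from the first sketch; I truncate the weighted sum at an arbitrary lag `L`, since plugging sample autocorrelations into all $M-1$ terms is extremely noisy:

```r
# tau_3: finite-sample weights (M - l) / M on the sample autocorrelations
M   <- length(x)
L   <- 100                                        # arbitrary truncation assumption
rho <- acf(x, lag.max = L, plot = FALSE)$acf[-1]
tau3 <- 1 + 2 * sum((M - 1:L) / M * rho)
M / tau3  # ESS_3
```

For $L \ll M$ the weights are all close to 1, which is why this comes out nearly identical to $ESS_1$.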
$$
\tau_4 = \frac{\sigma^2}{\lambda^2}
$$
where $\lambda^2 = \text{var}(x)$ and $\sigma^2 = \lambda^2\left(1 + 2\sum_{l=1}^\infty \rho(l)\right)$ is the asymptotic variance in the Markov chain CLT. (Written this way, $\tau_4 = \sigma^2/\lambda^2$ is algebraically the same as $\tau_1$; the definitions differ in how $\sigma^2$ is estimated in practice.)
Sources: https://arxiv.org/pdf/1403.5536v1.pdf, Effective Sample Size greater than Actual Sample Size, https://cran.r-project.org/web/packages/mcmcse/mcmcse.pdf
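A sketch using `mcmcse` directly (assuming the toy chain `x` from the first sketch); the package estimates $\sigma^2$ internally rather than by summing autocorrelations (via batch means, as far as I can tell from its documentation):

```r
library(mcmcse)

# mcmcse's ESS: M * lambda^2 / sigma^2, with sigma^2 estimated from the chain
mcmcse::ess(as.numeric(x))
```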
$$
\tau_5 = \frac{\sigma^2}{\lambda^2}
$$
where $\lambda^2 = \text{var}(x)$ and $\sigma^2$ is an estimate of the spectral density at frequency zero (the `coda::effectiveSize` definition above).
Note that the Stan Reference Manual provides an extension to $ESS_1$ for multiple MCMC chains.
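To illustrate the "drastically different results" mentioned above, here is the promised side-by-side on the toy chain `x` (an illustrative assumption, not a benchmark):

```r
# the two packages implement different estimators of tau, so the numbers differ
c(coda   = coda::effectiveSize(as.mcmc(x)),
  mcmcse = mcmcse::ess(as.numeric(x)))
```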
Best Answer
First, the appropriate definition of "effective sample size" is IMO linked to a quite specific question. If $X_1, X_2, \ldots$ are identically distributed with mean $\mu$ and variance 1, the empirical mean $$\hat{\mu} = \frac{1}{n} \sum_{k=1}^n X_k$$ is an unbiased estimator of $\mu$. But what about its variance? For independent variables the variance is $n^{-1}$. For a weakly stationary time series, the variance of $\hat{\mu}$ is $$\frac{1}{n^2} \sum_{k, l=1}^n \text{cov}(X_k, X_l) = \frac{1}{n}\left(1 + 2\left(\frac{n-1}{n} \rho_1 + \frac{n-2}{n} \rho_2 + \ldots + \frac{1}{n} \rho_{n-1}\right) \right) \simeq \frac{\tau_a}{n},$$ where $\tau_a = 1 + 2\sum_{k=1}^\infty \rho_k$ and the approximation is valid for large enough $n$. If we define $n_{\text{eff}} = n/\tau_a$, the variance of the empirical mean for a weakly stationary time series is approximately $n_{\text{eff}}^{-1}$, which is the same variance as if we had $n_{\text{eff}}$ independent samples. Thus $n_{\text{eff}} = n/\tau_a$ is an appropriate definition if we ask for the variance of the empirical average. It might be inappropriate for other purposes.
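A quick simulation sketch of this approximation, assuming an AR(1) chain with $\phi = 0.5$ scaled to unit variance, for which $\rho_k = \phi^k$ and $\tau_a = (1+\phi)/(1-\phi) = 3$:

```r
# check var(mean(X)) ~ tau_a / n for a unit-variance AR(1) chain
set.seed(1)
n   <- 500
phi <- 0.5
means <- replicate(2000, {
  x <- arima.sim(model = list(ar = phi), n = n) * sqrt(1 - phi^2)  # unit variance
  mean(x)
})
var(means)                   # empirical variance of the empirical mean
(1 + phi) / (1 - phi) / n    # tau_a / n = 3/500
```

The two numbers should be close, matching the $\tau_a/n$ approximation.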
With negative correlation between observations it is certainly possible for the variance to become smaller than $n^{-1}$ (i.e., $n_{\text{eff}} > n$). This is a well-known variance reduction technique in Monte Carlo integration: if we introduce negative correlation between the variables instead of correlation 0, we can reduce the variance without increasing the sample size.
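A sketch of the $n_{\text{eff}} > n$ case, assuming an AR(1) chain with $\phi = -0.5$, for which $\tau_a = (1 - 0.5)/(1 + 0.5) = 1/3$:

```r
library(coda)

# negative lag-1 correlation: tau_a = 1/3, so the ESS should be about 3n
set.seed(2)
x_neg <- arima.sim(model = list(ar = -0.5), n = 1e4)
effectiveSize(as.mcmc(x_neg))  # typically well above the nominal n = 1e4
```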