Solved – Difference between independence and stationarity tests in time series

hypothesis testing, independence, stationarity, time series

This is meant to be a general question, aiming to clarify the topic for a beginner in TSA, as I haven't found a clear introductory explanation yet.

Suppose I am working with some data that includes a time variable.
I want to know whether this variable is significant and whether it is worth structuring my analysis as a TSA.
I'm not specifically looking for seasonality in the data; I just want to know whether time is an important feature in my analysis.

  1. Should I test for stationarity or independence?

    • Independence

      Two events are statistically independent if the occurrence of one does not affect the probability of occurrence of the other.

      Let $A$ and $B$ be two events then they are independent if and only if:

      $P(A\cap B) = P(A)P(B)$

    • Stationarity

      A stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.

      Let $X_t$ be a stochastic process and $F_X$ represent the cumulative distribution function of the unconditional joint distribution of $X_t$, then $X_t$ is stationary if for all $k$, $\tau$ and $t_1,…, t_k$

      $F_X(x_{t_1+\tau},…,x_{t_k+\tau}) = F_X(x_{t_1},…,x_{t_k})$

I see that the definitions of these properties are very similar, but I'm not sure about their relation in this context.

  2. What is the difference between these two concepts in this context?

  3. [Bonus question] What tests should I use in this situation?

Best Answer

As a preliminary matter, it is worth noting that "independence" is a very vague condition unless it comes with a clear specification of what is independent from what. Conceptually, independence is a much broader concept, whereas stationarity of a time-series is a particular condition on the series that can be framed as a particular kind of independence. To answer your question I will show you how you can frame the condition of strict stationarity as an independence condition, and then I will discuss the intuition behind this condition.


Stationarity of a time-series can be framed as a type of independence: You have stated the definition of (strong) stationarity in your question, but I will reiterate it in my own notation:

Let $\boldsymbol{X} = \{ X_t | t \in \mathbb{Z} \}$ be a stochastic process. This process is said to be strongly stationary if for all time indices $t_1,...,t_k \in \mathbb{Z} \text{ }$ and all series values $x_{t_1}, ..., x_{t_k}$ we have: $$\mathbb{P}(X_{t_1} \leqslant x_{t_1}, ..., X_{t_k} \leqslant x_{t_k}) = \mathbb{P}(X_{t_1+s} \leqslant x_{t_1}, ..., X_{t_k+s} \leqslant x_{t_k}) \quad \quad \text{for all }s \in \mathbb{Z}.$$
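As a quick sanity check of this definition, here is a sketch (assuming `numpy` and `scipy` are available) that simulates a stationary AR(1) series and compares the empirical distribution of the series with a time-shifted copy. This only probes the $k=1$ marginal distributions, not the full joint condition, and the two overlapping windows are not independent samples, so the KS comparison here is informal:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Simulate a stationary AR(1) process: X_t = 0.5 X_{t-1} + e_t.
n, phi = 5000, 0.5
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# Compare the empirical marginal distribution of the series with a copy
# shifted by s time units; for a stationary process these should look
# the same.  Informal check only: this probes the k = 1 marginals, not
# the joint condition, and the overlapping windows are not independent
# samples as the KS test assumes.
s = 100
stat, pvalue = ks_2samp(x[:-s], x[s:])
print(f"KS statistic = {stat:.4f}, p-value = {pvalue:.3f}")
```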

For any integer random variable $S$ we can define the shifted process $\boldsymbol{X}^S = \{ X_{t-S} | t \in \mathbb{Z} \}$, which shifts the stochastic process $\boldsymbol{X}$ forwards by $S$ time units. Using this randomly shifted time-series, we can now frame the requirement of strong stationarity in terms of an independence condition.

Theorem: The process $\boldsymbol{X}$ is strongly stationary if and only if, for every integer random variable $S \text{ } \bot \text{ } \boldsymbol{X}$, we have $S \text{ } \bot \text{ } \boldsymbol{X}^S$.


Proof: To show equivalence of the conditions we will first show that strong stationarity implies the independence condition ($\implies$), and then that the independence condition implies strong stationarity ($\impliedby$).

($\implies$) Assume that strong stationarity holds and let $S \text{ } \bot \text{ } \boldsymbol{X}$ be an arbitrary integer random variable that is independent of the original process. Then for all time indices $t_1,...,t_k \in \mathbb{Z} \text{ }$ and all series values $x_{t_1},...,x_{t_k}$ we have:

$$\begin{equation} \begin{aligned} \mathbb{P}(X^S_{t_1} \leqslant x_{t_1}, ..., X^S_{t_k} \leqslant x_{t_k} | S=s) &= \mathbb{P}(X_{t_1-s} \leqslant x_{t_1}, ..., X_{t_k-s} \leqslant x_{t_k} | S=s) \\[6pt] &= \mathbb{P}(X_{t_1-s} \leqslant x_{t_1}, ..., X_{t_k-s} \leqslant x_{t_k}) \\[6pt] &= \mathbb{P}(X_{t_1} \leqslant x_{t_1}, ..., X_{t_k} \leqslant x_{t_k}). \\[6pt] \end{aligned} \end{equation}$$

Since the right-hand-side of this equation does not depend on $s$, we have $S \text{ } \bot \text{ } \boldsymbol{X}^S$.

($\impliedby$) Let $S \text{ } \bot \text{ } \boldsymbol{X}$ be an integer random variable that is independent of the original process and has support on the whole set of integers (i.e., $\mathbb{P}(S=s)>0$ for all $s \in \mathbb{Z}$). Assume that the independence condition $S \text{ } \bot \text{ } \boldsymbol{X}^S$ holds. Then for all time indices $t_1,...,t_k \in \mathbb{Z} \text{ }$ and all series values $x_{t_1},...,x_{t_k}$ we have:

$$\begin{equation} \begin{aligned} \mathbb{P}(X_{t_1} \leqslant x_{t_1}, ..., X_{t_k} \leqslant x_{t_k}) &= \mathbb{P}(X^S_{t_1+S} \leqslant x_{t_1}, ..., X^S_{t_k+S} \leqslant x_{t_k}) \\[6pt] &= \mathbb{P}(X^S_{t_1+S} \leqslant x_{t_1}, ..., X^S_{t_k+S} \leqslant x_{t_k} | S=s) \\[6pt] &= \mathbb{P}(X^S_{t_1+s} \leqslant x_{t_1}, ..., X^S_{t_k+s} \leqslant x_{t_k} | S=s) \\[6pt] &= \mathbb{P}(X^S_{t_1+s} \leqslant x_{t_1}, ..., X^S_{t_k+s} \leqslant x_{t_k} | S=0) \\[6pt] &= \mathbb{P}(X_{t_1+s} \leqslant x_{t_1}, ..., X_{t_k+s} \leqslant x_{t_k}). \\[6pt] \end{aligned} \end{equation}$$

(The step from the first line to the second holds because $X^S_{t_i+S} = X_{t_i}$, so the event involves only $X_{t_1},...,X_{t_k}$, which are independent of $S$; the step from the third line to the fourth uses the assumed independence condition $S \text{ } \bot \text{ } \boldsymbol{X}^S$.) Since this equation holds for all $s \in \mathbb{Z}$ we have established the strong stationarity of the original process. $\blacksquare$

From this theorem (which I just made up, but I imagine there is probably something like it in books somewhere) we can see that the condition of strict stationarity can be framed as an independence condition. If you have a stochastic process $\boldsymbol{X}$ and a random time-shift $S$ that is independent of the process, then stationarity occurs when the shifted process is independent of the time-shift variable itself. In other words, the joint distribution of the values in the stochastic process is not affected by knowledge of how much the process was shifted.
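To illustrate the theorem numerically, here is a simulation sketch (assuming `numpy`): draw a random shift $S$ independent of the process and check whether the shifted value $X^S_t$ is correlated with $S$. Correlation only probes one facet of dependence, but it already separates a stationary process from a trending one:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_time, t = 10000, 201, 200

# Draw a random shift S for each path, independent of the process.
S = rng.integers(0, 100, size=n_paths)

# Stationary process: i.i.d. noise.  Trending process: noise plus a
# deterministic upward drift of 0.05 per time step.
noise = rng.standard_normal((n_paths, n_time))
trend = 0.05 * np.arange(n_time) + noise

# Shifted-process value X^S_t = X_{t-S} for each path.
stat_vals = noise[np.arange(n_paths), t - S]
trend_vals = trend[np.arange(n_paths), t - S]

# Correlation between S and the shifted value: near zero for the
# stationary process (consistent with independence), clearly non-zero
# for the trending process (the shift changes the distribution).
r_stat = np.corrcoef(S, stat_vals)[0, 1]
r_trend = np.corrcoef(S, trend_vals)[0, 1]
print(f"corr(S, shifted stationary value): {r_stat:+.3f}")
print(f"corr(S, shifted trending value):   {r_trend:+.3f}")
```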


Your specific questions:

Should I test for stationarity or independence?

Independence of what? Stationarity is a type of independence, but if you have some other type of independence in mind, you will need to specify what that is. Whether you should test for particular types of independence depends on your interests in the problem, but it is generally useful to test for stationarity, since stationary time-series models have a number of well-known model forms that might be useful to you.

What is the difference between these two concepts in this context?

As you can see from the above, stationarity is a type of independence. The general concept of independence is broader than this, and encompasses any possible assertion of independence of two or more random variables. In the context of time-series analysis, it is common to deal with models that are stationary, or models that have time-based trends (either trend or drift terms) with an underlying stationary error series. In this case it is common to test whether trend or drift terms are present in the model.

What tests should I use in this situation?

All you have told us about your data is that it has a time variable, and you want to know if this variable is important. This isn't much to go on, but it sounds like you might want to formulate a time-series model for your data, and test to see if there is any trend or drift term that would lead to a general systematic change in values over time. This is effectively a test of stationarity against an alternative with a trend or drift in the model. To give more information we would need to know more about your data. (Perhaps in a new question?)