This is meant to be a general question, aiming to clarify the topic for a beginner in time series analysis (TSA), as I haven't found a clear introductory explanation yet.
Suppose I am working with some data which includes some time variable.
Then I want to know whether this variable is significant and whether it is worth structuring my analysis as a TSA.
I'm not specifically looking for seasonality in the data; I just want to know if time is an important feature in my analysis.
-
Should I test for stationarity or independence?
-
Independence
Two events are statistically independent if the occurrence of one does not affect the probability of occurrence of the other.
Let $A$ and $B$ be two events then they are independent if and only if:
$P(A\cap B) = P(A)P(B)$
-
Stationarity
A stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.
Let $X_t$ be a stochastic process and $F_X$ represent the cumulative distribution function of the unconditional joint distribution of $X_t$, then $X_t$ is stationary if for all $k$, $\tau$ and $t_1, \ldots, t_k$
$F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) = F_X(x_{t_1}, \ldots, x_{t_k})$
-
I see that the definitions of these properties are very similar, but I'm not sure about their relation in this context.
-
What is the difference between these two concepts in this context?
-
[Bonus question] What tests should I use in this situation?
Best Answer
As a preliminary matter, it is worth noting that "independence" is a very vague condition unless it comes with a clear specification of what is independent from what. Conceptually, independence is a much broader concept, whereas stationarity of a time-series is a particular condition on the series that can be framed as a particular kind of independence. To answer your question I will show you how you can frame the condition of strict stationarity as an independence condition, and then I will discuss the intuition behind this condition.
Stationarity of a time-series can be framed as a type of independence: You have stated the definition of (strong) stationarity in your question, but I will reiterate it in my own notation:
For any integer random variable $S$ we can define the shifted process $\boldsymbol{X}^S = \{ X_{t-S} | t \in \mathbb{Z} \}$, which shifts the stochastic process $\boldsymbol{X}$ forwards by $S$ time units. Using this randomly shifted time-series, we can now frame the requirement of strong stationarity in terms of an independence condition.
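Written out in symbols, the two statements involved can be sketched as follows. This is a reconstruction from the surrounding description, with $\boldsymbol{X} = \{ X_t | t \in \mathbb{Z} \}$ assumed as the notation for the process, so the exact wording may differ from a textbook version.

$$\text{Strong stationarity:} \quad (X_{t_1+s}, \ldots, X_{t_k+s}) \overset{d}{=} (X_{t_1}, \ldots, X_{t_k}) \quad \text{for all } k \in \mathbb{N},\ t_1, \ldots, t_k \in \mathbb{Z},\ s \in \mathbb{Z}$$

$$\text{Independence framing:} \quad \boldsymbol{X} \text{ is strongly stationary} \iff \boldsymbol{X}^S \perp S \ \text{ for every integer random variable } S \text{ with } S \perp \boldsymbol{X}$$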
From this theorem (which I just made up, but I imagine there is probably something like it in books somewhere) we can see that the condition of strict stationarity can be framed as an independence condition. If you have a stochastic process $\boldsymbol{X}$ and a random time-shift $S$ that is independent of the process, then stationarity occurs when the shifted process is independent of the time-shift variable itself. In other words, the joint distribution of the values in the stochastic process is not affected by knowledge of how much the process was shifted.
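A small simulation can make this framing concrete. The sketch below is illustrative only: the two example processes (`iid_noise`, `noise_with_trend`), the reference time `t0` and the set of shifts are assumptions chosen for the demo, not part of the argument above. It draws a shift $S$ independently of the process and compares the mean of the shifted value $X_{t_0 - S}$ across the values of $S$.

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_noise(n_paths, length):
    # Stationary example: independent standard normal values.
    return rng.normal(size=(n_paths, length))

def noise_with_trend(n_paths, length):
    # Non-stationary example: the same kind of noise plus a linear trend.
    return 0.5 * np.arange(length) + rng.normal(size=(n_paths, length))

def shifted_means(make_paths, n_paths=6000, t0=10, shifts=(0, 5, 10)):
    """Mean of the shifted value X_{t0 - S}, grouped by the random shift S."""
    paths = make_paths(n_paths, t0 + 1)          # times 0, 1, ..., t0
    s = rng.choice(shifts, size=n_paths)         # S drawn independently of X
    values = paths[np.arange(n_paths), t0 - s]   # X^S evaluated at time t0
    return {int(k): float(values[s == k].mean()) for k in shifts}

stationary_means = shifted_means(iid_noise)
trend_means = shifted_means(noise_with_trend)
print("stationary:", stationary_means)
print("with trend:", trend_means)
```

For the stationary example the group means should all sit near zero whatever the shift, whereas for the trending example they should move with the shift, so knowing $S$ tells you something about the shifted values. Note that this only compares means of a single coordinate; it is a necessary check, not a test of the full joint distribution.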
Your specific questions:
Independence of what? Stationarity is a type of independence, but if you have some other type of independence in mind, you will need to specify what that is. Whether you should test for particular types of independence depends on your interests in the problem, but it is generally useful to test for stationarity, since stationary time-series models have a number of well-known model forms that might be useful to you.
As you can see from the above, stationarity is a type of independence. The general concept of independence is broader than this, and encompasses any possible assertion of independence of two or more random variables. In the context of time-series analysis, it is common to deal with models that are stationary, or models that have time-based trends (either trend or drift terms) with an underlying stationary error series. In this case it is common to test whether trend or drift terms are present in the model.
All you have told us about your data is that it has a time variable, and you want to know if this variable is important. This isn't much to go on, but it sounds like you might want to formulate a time-series model for your data, and test to see if there is any trend or drift term that would lead to a general systematic change in values over time. This is effectively a test of stationarity against an alternative with a trend or drift in the model. To give more information we would need to know more about your data. (Perhaps in a new question?)
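As a rough illustration of what testing for a trend term could look like, here is a minimal sketch that fits a straight line in time by ordinary least squares and reports a t-statistic for the slope. The function name `linear_trend_test` and the simulated series are hypothetical, and the p-value uses a normal approximation that assumes independent errors. With autocorrelated errors the standard error is typically understated, and dedicated tests such as the augmented Dickey-Fuller or KPSS tests are the more usual tools.

```python
import math
import numpy as np

def linear_trend_test(y):
    """OLS fit of y on a time index; returns (slope, t-statistic, approx p-value)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    t_c = t - t.mean()                               # centred time index
    slope = (t_c @ (y - y.mean())) / (t_c @ t_c)
    intercept = y.mean() - slope * t.mean()
    resid = y - (intercept + slope * t)
    sigma2 = (resid @ resid) / (n - 2)               # residual variance
    se = math.sqrt(sigma2 / (t_c @ t_c))             # standard error of the slope
    t_stat = slope / se
    # Two-sided p-value from a normal approximation (assumes independent errors).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, t_stat, p_value

# Simulated data for the demo only: pure noise versus noise plus a mild trend.
rng = np.random.default_rng(1)
time_index = np.arange(200)
flat = rng.normal(size=200)
trending = 0.02 * time_index + rng.normal(size=200)

flat_result = linear_trend_test(flat)
trend_result = linear_trend_test(trending)
print("flat:    ", flat_result)
print("trending:", trend_result)
```

A small p-value for the slope suggests a systematic change in level over time, which is the kind of evidence that time matters in the sense described above. A large p-value only says that no linear trend was detected, not that the series is stationary.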