Time Series – Stochastic Modelling, Distribution and Ergodicity of Time Series with Finite History

ergodic, randomness, stochastic-processes, time-series

Let $\Omega$ be a sample space. A stochastic process $\{Y_t\}$ is a function of both time $t \in \{1, 2, 3, \ldots\}$ and outcome $\omega \in \Omega$.

  • For any time $t$, $Y_t$ is a random variable (i.e. a function from $\Omega$ to the space of real numbers $\mathbb{R}$).
  • For any outcome $\omega$, the sequence $\{y_t(\omega)\}$ is a time series of real numbers: $\{y_1(\omega), y_2(\omega), y_3(\omega), \ldots \}$.
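To make the two views concrete, here is a minimal numerical sketch (using a hypothetical Gaussian random walk as the process; the model and all names are purely illustrative): fixing a column gives the random variable $Y_t$, fixing a row gives the time series $\{y_t(\omega)\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outcomes, n_times = 1000, 50      # rows index omega, columns index t

# Y[i, t] = Y_t(omega_i): each row is one full realization of the process.
Y = np.cumsum(rng.normal(size=(n_outcomes, n_times)), axis=1)

# Fixing t gives a random variable: a function of omega (here, a column).
t = 10
print(f"Y_t at t={t}, across outcomes: mean {Y[:, t].mean():.3f}, sd {Y[:, t].std():.3f}")

# Fixing omega gives an ordinary time series of real numbers (here, a row).
omega_0 = 0
print(f"First five values of the realization y(omega_0): {Y[omega_0, :5]}")
```

The two views coexist in the one object $Y$: the moments quoted in time-series texts are properties of the columns, while the data we observe are part of a single row.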

We can model a particular time series with an observed history $(y_1,\dots,y_t)$ as a realization of a stochastic process.

It is quite common in practice* that

  • we are only interested in a particular observed series (corresponding to a particular $\omega_0\in\Omega$) and its future or historical development and
  • we do not care about the other, hypothetical realizations of the same stochastic process (corresponding to $\omega\in\Omega \setminus \{\omega_0\}$).

That is, we want to make inferences and predictions about $\{y_t(\omega_0)\}$ rather than $\{Y_t\}$, and/or about $y_{t+1}(\omega_0)$ rather than $Y_{t+1}$. But then it seems to me we cannot talk about the distribution of $\{y_t(\omega_0)\}$ (a realization of a random process) or $y_{t+1}(\omega_0)$ (a realization of a random variable) or any derived features such as moments. Yet time series analyses are full of expected values (conditional or unconditional), variances (conditional or unconditional) and the like. And then there are discussions of ergodicity, which again seem irrelevant if we only have a single, fixed $\omega_0$. I am confused by this. How do I think about it?

*An example: the daily closing share price of Tesla. We can obtain the historical time series $(y_1,\dots,y_t)$ from a financial database. We may want to discover patterns in the series and extrapolate them into the future, perhaps with an aim of investing in Tesla's shares and making money or (more modestly) assessing and managing risk. We do not care about any other realization of the data generating process (DGP) that has generated Tesla's daily share price, as we do not believe there exists another company whose share price is governed by the exact same laws.
A counterexample: the second-by-second altitude of a helium atom in a closed container filled with pure helium. Roughly speaking, every atom is the same, and the laws that govern atom A also govern atom B and all the other atoms in the container. If we built a model based on the historical time series of atom A, we could still be interested in generalizing across atoms and saying that the model applies to each of them. Ergodicity may then naturally be relevant.
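To make the contrast concrete, here is a small simulation sketch, assuming a stationary Gaussian AR(1) as the DGP (the model, its parameters and all variable names are purely illustrative): for an ergodic process, the time average along the single realization we observe estimates the same quantity as the ensemble average across the hypothetical realizations we never see.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, mu, n_paths, n_times = 0.8, 5.0, 2000, 5000

# Stationary AR(1): Y_t - mu = phi * (Y_{t-1} - mu) + eps_t, |phi| < 1.
Y = np.empty((n_paths, n_times))
Y[:, 0] = mu + rng.normal(size=n_paths) / np.sqrt(1 - phi**2)  # stationary start
for t in range(1, n_times):
    Y[:, t] = mu + phi * (Y[:, t - 1] - mu) + rng.normal(size=n_paths)

time_avg     = Y[0, :].mean()   # one omega, averaged over t (all we ever see)
ensemble_avg = Y[:, -1].mean()  # one t, averaged over omega (hypothetical)
print(f"time average {time_avg:.3f}  vs  ensemble average {ensemble_avg:.3f}")
# Both are close to mu = 5.0. Without ergodicity (e.g. a random walk, phi = 1)
# the time average of a single path would not estimate any process moment.
```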


A follow-up question: "Stationarity and ergodicity of a process conditional on a finite trajectory".

Best Answer

Your question reflects a misunderstanding of how conditioning works in measure-theoretic probability.

I think you are letting the complexity of the notation for a stochastic process get in the way of an intuitive understanding of the essence of your question (let alone its answer). Remember that each outcome $\omega$ in the abstract probability space essentially indexes a full stipulation of the process, giving the values of the entire infinite series. So you are wrong to state that an observed vector $(y_1,...,y_n)$ for a finite number of time values corresponds to a particular outcome. Because the observed vector is just a part of the infinite series, it must correspond to an infinite set of possible outcomes in the abstract sample space --- specifically, observation of that vector is the event:

$$\mathcal{E}_* \equiv \mathcal{E}_*(\mathbf{y}_n) \equiv \{ \omega \in \Omega | Y_1(\omega)=y_1, ..., Y_n(\omega)=y_n \}.$$

It appears that all you are trying to do here is to understand what happens when you condition on observing this vector of values (i.e., when you condition on this event). You say that it is quite common practice to condition on a single outcome $\omega_0 \in \Omega$, but I see no evidence that this is practiced at all, let alone common. Conditioning on a single outcome would of course reduce the whole problem to a deterministic series (in principle a known series), and so it would remove probability from the matter entirely. That is not how conditioning is done.

Okay, so what is actually done here? Rather than reducing things down to a single outcome, the observed vector of values reduces things down to the conditioning event $\mathcal{E}_*$ (which still contains an infinite number of outcomes), and we proceed using conditional probability based on this observed event. This is no different to any other application of conditional probability --- by conditioning you reduce the sample space and you then get a new probability measure that gives the conditional probability of any event of interest. Formally, if you start your analysis in the probability space $(\Omega, \mathscr{G}, \mathbb{P})$ the new conditional probability space is $(\mathcal{E}_*, \mathscr{G}_*, \mathbb{P}_*)$ where the conditional sigma-field and probability measure satisfy:

$$\mathscr{G}_* = \{ \mathcal{E} \in \mathscr{G} | \mathcal{E} \subseteq \mathcal{E}_* \} \quad \quad \quad \quad \quad \mathbb{P}_*(\mathcal{A}) = \frac{\mathbb{P}(\mathcal{A} \cap \mathcal{E}_*)}{\mathbb{P}(\mathcal{E}_*)} = \frac{\mathbb{P}(\mathcal{A})}{\mathbb{P}(\mathcal{E}_*)} \quad \text{for all } \mathcal{A} \in \mathscr{G}_*.$$

(This presumes $\mathbb{P}(\mathcal{E}_*) > 0$; when the $Y_t$ are continuous, the observed event has probability zero and the construction proceeds via regular conditional probabilities instead, but the intuition is unchanged.)
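To fix ideas, here is a toy computation in Python (a hypothetical two-state Markov chain, chosen only so that $\Omega$ is finite and $\mathbb{P}(\mathcal{E}_*) > 0$; all names and parameters are illustrative): the observed prefix picks out $\mathcal{E}_*$ as a subset of $\Omega$, and $\mathbb{P}_*$ follows from the displayed formula.

```python
import itertools

p_stay = 0.7                      # P(Y_{t+1} = Y_t); chain starts uniform at t = 1
N, observed = 4, (1, 0)           # full horizon, and the observed prefix (y_1, y_2)

def prob(path):
    """P({omega}) for one full path of the chain."""
    p = 0.5
    for a, b in zip(path, path[1:]):
        p *= p_stay if a == b else 1 - p_stay
    return p

Omega = list(itertools.product([0, 1], repeat=N))              # finite sample space
E_star = [w for w in Omega if w[:len(observed)] == observed]   # conditioning event

P_E = sum(prob(w) for w in E_star)
# P_*(A) = P(A) / P(E_*); here A = {omega in E_* : Y_3(omega) = 1}, i.e. w[2] == 1.
A = [w for w in E_star if w[2] == 1]
print("P(Y_3 = 1 | Y_1 = 1, Y_2 = 0) =", sum(prob(w) for w in A) / P_E)
```

Note that after conditioning, the observed coordinates of $\omega$ are fixed but the future coordinates remain random under $\mathbb{P}_*$, which is exactly why conditional expectations and variances of $Y_{n+1}$ are still meaningful.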

In regard to the basic mechanics of conditioning, it really doesn't matter whether you are dealing with an application that is "repeatable" in the frequentist sense or not. If you are already at the point of applying probabilistic analysis then you have already decided that this framework is philosophically appropriate (and you are right to do so, since probability can be given an epistemic interpretation in any case). So if you are looking at the price of Tesla shares over time (noting that there is only one Tesla), this is mathematically no different from looking at some hypothesised process where the experiment generating the data is "repeatable" in the classical frequentist sense. (Of course, you are making conditional inferences about a single series, not an inference about some parallel "repeated" experiment, which would be different.)

Once you clear up the confusion about how conditioning works in measure-theoretic probability, you are in a good position to get to the essence of your questions about other properties such as ergodicity. It appears that you want to know whether ergodicity (and maybe other properties) still holds for the remainder of the series (the unobserved part) once we condition on observing a vector of values. I won't get into the answer to that here, since I think the focus of your question is just to clear up the confusion around conditioning itself.
