[Math] The difference between ergodicity and the law of large numbers

ergodic-theory, law-of-large-numbers, markov-chains, probability, stochastic-processes

I want to begin by saying that I know absolutely no measure theory.

To my knowledge, roughly speaking, a stochastic process is ergodic if its time average converges to its expectation (the space average) over a long period of time.

For an i.i.d. sequence $X_{1}, \ldots, X_{n}$ with finite mean, I noticed that if we consider the index as time, then the average of this sequence, $\frac{1}{n} \sum_{i=1}^{n} X_{i}$, is the very definition of a time average. The law of large numbers states that this converges to $E[X_{i}]$ as $n \rightarrow \infty$.

So I am not sure how the law of large numbers differs from ergodicity; it looks to me like they are saying the same thing. Can a stochastic process be ergodic if it isn't i.i.d.?

I am also not sure how this definition of ergodicity relates to the one given in the context of Markov chains, where a chain is called ergodic if it is aperiodic, irreducible, and has finite mean recurrence times.

Best Answer

A stochastic process is a collection $\{X_t\}_t$ of real-valued (for this discussion) random variables defined on a common probability space $(\Omega,\mathcal{F},P)$. For this discussion, we'll have the indexing be over the natural numbers $\mathbb{N}$: $(X_n)_{n \ge 1}$. For the definitions we consider, we insist that each $X_n$ has the same mean, denoted $E[X]$. A process $(X_n)_n$ is stationary if, for any $m \ge 1$ and any Borel sets $A_1,\dots,A_m$, it holds that $P(X_1 \in A_1, \dots, X_m \in A_m) = P(X_2 \in A_1,\dots,X_{m+1} \in A_m)$; in other words, $(X_n)_n$ is stationary if the law of $(X_1,X_2,X_3,\dots)$ is the same as the law of $(X_2,X_3,\dots)$ (where the value space is $\mathbb{R}^\mathbb{N}$ with product sigma-algebra).
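For instance, an i.i.d. sequence is stationary: independence factorizes each joint probability, and identical distribution makes every factor depend only on the common marginal law, so
$$P(X_1 \in A_1, \dots, X_m \in A_m) = \prod_{j=1}^{m} P(X_1 \in A_j) = P(X_2 \in A_1, \dots, X_{m+1} \in A_m).$$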

We say that $(X_n)_n$ is ergodic if $(X_n)_n$ is stationary and $(\mathbb{R}^\mathbb{N},\mathcal{B}^\mathbb{N},P^\mathbb{N},T)$ is ergodic as a measure preserving system, where $P^\mathbb{N}(A) := P((X_1,X_2,\dots) \in A)$ is the law of the sequence and $T$ is the left shift: $T((x_1,x_2,x_3,\dots)) := (x_2,x_3,\dots)$ (note that $(X_n)_n$ stationary implies $T$ preserves $P^\mathbb{N}$). A consequence of this notion of ergodicity (via Birkhoff's pointwise ergodic theorem) is that for each measurable $f: \mathbb{R} \to \mathbb{R}$ with $E[|f(X)|] < \infty$, it holds that $\frac{1}{N}\sum_{n \le N} f(X_n(\omega)) \to E[f(X)]$ with probability $1$.
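As a quick numerical illustration of that consequence (a minimal sketch: i.i.d. standard normals are the simplest stationary ergodic process, and $f(x) = x^2$ is an arbitrary choice of observable):

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. standard normals: the simplest stationary, ergodic process.
X = rng.standard_normal(100_000)

# Observable f(x) = x^2, so E[f(X)] = Var(X) = 1.
f_values = X**2

# Birkhoff (time) averages (1/N) sum_{n <= N} f(X_n) for growing N.
averages = np.cumsum(f_values) / np.arange(1, len(X) + 1)
for N in (100, 1_000, 10_000, 100_000):
    print(N, averages[N - 1])  # approaches E[f(X)] = 1
```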

We say that $(X_n)_n$ is mean-ergodic if $E\left[\left|\frac{1}{N}\sum_{n \le N} X_n - E[X]\right|^2\right] \to 0$ as $N \to \infty$. Note that this definition does not require $(X_n)_n$ to be stationary.
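For example, an i.i.d. sequence with $\operatorname{Var}(X) < \infty$ is mean-ergodic, since independence kills the cross terms:
$$E\left[\left|\frac{1}{N}\sum_{n \le N} X_n - E[X]\right|^2\right] = \frac{1}{N^2}\sum_{n \le N}\operatorname{Var}(X_n) = \frac{\operatorname{Var}(X)}{N} \to 0.$$
This is exactly the $L^2$ weak law of large numbers.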

We say that $(X_n)_n$ is pointwise-ergodic if $\frac{1}{N}\sum_{n \le N} X_n(\omega) \to E[X]$ with probability $1$. Note that ergodic implies pointwise-ergodic (take $f(x) = x$ above, assuming $E[|X|] < \infty$).

We say that $(X_n)_n$ is LLN-applicable if the $X_n$'s are i.i.d. and the set of $\omega \in \Omega$ satisfying $\frac{1}{N}\sum_{n \le N} X_n(\omega) \to E[X]$ has measure/probability $1$.

There is no single standard meaning of "ergodic"; Wikipedia, for instance, uses "ergodic" to mean mean-ergodic.

As the comments indicate, there are $(X_n)_n$ that are pointwise-ergodic but not i.i.d. and thus not LLN-applicable. For example, we can take any $Y_n$'s that are i.i.d. and then consider $(X_1,X_2,\dots) = (Y_1,Y_1,Y_2,Y_3,\dots)$; duplicating the first term changes the running averages by only $O(1/N)$, so they still converge to $E[Y]$.
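A quick simulation of this (a minimal sketch; uniform $Y_i$ chosen just for concreteness):

```python
import numpy as np

rng = np.random.default_rng(1)

# Y_1, Y_2, ... i.i.d. uniform on [0, 1], so E[Y] = 0.5.
Y = rng.random(100_000)

# X = (Y_1, Y_1, Y_2, Y_3, ...): not i.i.d., since X_1 = X_2,
# but the time average differs from that of Y by a single term.
X = np.concatenate(([Y[0]], Y))

print(np.mean(X))  # close to E[Y] = 0.5 despite the dependence
```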

This answer gives an example of $(X_n)_n$ that are stationary, pointwise-ergodic, but not ergodic. The example is also mean-ergodic if $E[Y_i] = 0$ and $E[Y_i^2] < \infty$.

Here, on the other hand, is an example that is mean-ergodic but not pointwise-ergodic. Let $Y_1,Y_2,\dots$ be independent with, for each $i$, $P(Y_i = 1) = \frac{1}{2}$ and $P(Y_i = -1) = \frac{1}{2}$. Let $X_n = \frac{Y_1+\dots+Y_n}{\sqrt{n\log\log\log n}}$ (read this for $n$ large enough that the triple logarithm is positive; define the first few terms arbitrarily). Then indeed $E[X_n]$ is the same for each $n$ (the common mean is $0$), and $(X_n)_n$ is mean-ergodic: $E[|X_n-0|^2] = E[X_n^2] = \frac{n}{n\log\log\log n} = \frac{1}{\log\log\log n} \to 0$, so $\|X_n\|_{L^2} \to 0$, and hence the Cesàro averages satisfy $\left\|\frac{1}{N}\sum_{n \le N} X_n\right\|_{L^2} \le \frac{1}{N}\sum_{n \le N}\|X_n\|_{L^2} \to 0$. However, $(X_n)_n$ is not pointwise-ergodic: with probability $1$, $\limsup_{n \to \infty} \frac{Y_1+\dots+Y_n}{\sqrt{2n\log\log n}} = 1$ (the law of the iterated logarithm), and since $X_n = \sqrt{\frac{2\log\log n}{\log\log\log n}}\cdot\frac{Y_1+\dots+Y_n}{\sqrt{2n\log\log n}}$ with the first factor tending to $\infty$, the $X_n$ have almost-sure fluctuations too large for $\frac{1}{N}\sum_{n \le N} X_n$ to converge to $0$. (It's obviously not ergodic, since it's not stationary.)
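As a numerical aside (a minimal sketch; since $E[(Y_1+\dots+Y_n)^2] = n$, the value $E[X_n^2] = \frac{1}{\log\log\log n}$ is exact, so no simulation is needed), the mean-square decay here is extraordinarily slow:

```python
import math

# E[X_n^2] = 1 / logloglog(n) exactly, since E[(Y_1+...+Y_n)^2] = n.
# Evaluate it at astronomically large n, passing log10(n) to avoid
# overflow: ln n = log10(n) * ln 10.
def mean_square(log10_n: float) -> float:
    ln_n = log10_n * math.log(10.0)
    return 1.0 / math.log(math.log(ln_n))

for log10_n in (7, 100, 10_000, 10**6):
    print(f"n = 10^{log10_n}: E[X_n^2] = {mean_square(log10_n):.3f}")
```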

If the $X_n$'s are uniformly bounded, then the dominated convergence theorem implies that pointwise-ergodic implies mean-ergodic.


A Markov chain, for this discussion, is a stochastic process $(X_n)_{n \ge 1}$ taking values in a common finite set $X$ such that (1) $P(X_n=x) > 0$ for each $n \ge 1$ and $x \in X$ and (2) for each $n$ and $x_1,\dots,x_n$, it holds that $P(X_n = x_n \mid X_{n-1} = x_{n-1},\dots,X_1=x_1) = P(X_n = x_n \mid X_{n-1} = x_{n-1})$. We will make the common assumption that the Markov chain is time homogeneous, meaning $P(X_n = x_n \mid X_{n-1} = x_{n-1})$ is independent of $n$, and also that $P(X_1 = x) = P(X_2 = x)$ for each $x \in X$ (i.e., the chain starts in a stationary distribution). An easy exercise shows that these assumptions imply that $(X_1,X_2,\dots)$ is stationary (hint: use induction).
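Here is a minimal numerical sketch of that setup (the 3-state transition matrix is made up for illustration): starting the chain from its stationary distribution $\pi$, the left eigenvector with $\pi P = \pi$, makes the marginals of $X_1$ and $X_2$ agree.

```python
import numpy as np

rng = np.random.default_rng(3)

# A made-up 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

# Stationary distribution pi, i.e. the left eigenvector with pi P = pi.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.isclose(w, 1)][:, 0])
pi /= pi.sum()

# Draw many independent copies of (X_1, X_2) with X_1 ~ pi; the two
# empirical marginals should agree, as stationarity demands.
runs = 100_000
x1 = rng.choice(3, size=runs, p=pi)
x2 = np.array([rng.choice(3, p=P[x]) for x in x1])
print("P(X_1 = x):", np.bincount(x1, minlength=3) / runs)
print("P(X_2 = x):", np.bincount(x2, minlength=3) / runs)
```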

Associated to a Markov chain, we can define a weighted, directed graph on $X$ with the weight of the directed edge from $x$ to $y$ being $P(X_2 = y | X_1 = x)$ if the probability is positive (if it's not, don't draw a directed edge).

A Markov chain is strongly connected if one can get from any vertex to any other vertex by travelling along directed edges of the graph.

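In graph terms this is ordinary strong connectivity, which can be checked mechanically; here is a minimal sketch (a depth-first search over the positive-probability edges; the function name is mine):

```python
import numpy as np

def strongly_connected(P) -> bool:
    """True iff every state can reach every other state along
    directed edges of positive transition probability."""
    P = np.asarray(P)
    k = len(P)
    for start in range(k):
        seen = {start}
        stack = [start]
        while stack:  # depth-first search from `start`
            x = stack.pop()
            for y in range(k):
                if P[x, y] > 0 and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if len(seen) < k:
            return False
    return True
```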

$\bf{Claim}$: A Markov chain is ergodic if and only if it is strongly connected.

Proof: Suppose the Markov chain is ergodic. Take any $x_0,x_1 \in X$ and let $f = 1_{x_1}$. Then, with probability $1$, $\frac{1}{N}\sum_{n \le N} f(X_n(\omega)) \to E[f(X)] = P(X = x_1) > 0$; in particular, with probability $1$ the chain visits $x_1$ at some time $n \ge 2$. Since $P(X_1 = x_0) > 0$, it follows that $P(\exists n \ge 2 : X_n = x_1 \mid X_1 = x_0) > 0$, which, by countable additivity, means there is some $n \ge 2$ with $P(X_n = x_1 \mid X_1 = x_0) > 0$. But $P(X_n = x_1 \mid X_1 = x_0) = \sum_{y_2,\dots,y_{n-1}} P(X_n = x_1 \mid X_{n-1} = y_{n-1})\cdots P(X_2 = y_2 \mid X_1 = x_0)$, so there are $y_2,\dots,y_{n-1}$ with each term in the product nonzero, which means there is a path from $x_0$ to $x_1$. Therefore, the Markov chain is strongly connected. The other direction uses Perron-Frobenius and some ergodic theory, which I omit.
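To see the claim in action numerically (a minimal sketch, reusing `P`, `pi`, and `strongly_connected` from the sketches above): for that strongly connected chain, the time-average visit frequency of each state approaches $\pi(x) = P(X = x) > 0$ along a single trajectory.

```python
rng = np.random.default_rng(4)

assert strongly_connected(P)  # the 3-state chain defined earlier

# One long trajectory; the Birkhoff average of f = 1_{x} should
# approach pi[x] > 0 for every state x, as in the claim.
N = 100_000
x = rng.choice(3, p=pi)
counts = np.zeros(3)
for _ in range(N):
    counts[x] += 1
    x = rng.choice(3, p=P[x])

print("visit frequencies:", counts / N)
print("pi:               ", pi)
```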
