Textbook reference for ergodicity condition for stationary sequences

ergodic-theory, probability-theory, reference-request, stochastic-processes

In a set of lecture notes, I have the following result:

Theorem. Let $X_n$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with values in a Polish metric space $S$. Suppose $X = (X_n)_{n \geq 1}$ is a stationary sequence. Then $X$ is ergodic if and only if for any bounded Borel measurable function $g: S^p \to \mathbb{R}$ with $p \geq 1$ an arbitrary integer,
$$\dfrac{1}{n}\sum_{m=0}^{n-1}g(X_{m+1}, \dots, X_{m+p}) \overset{a.s.}{\to} \mathbb{E}[g(X_1, \dots, X_p)]\text{.}$$

Note that $\overset{a.s.}{\to}$ denotes almost sure convergence as $n \to \infty$.

I have searched the 20-30 measure-theoretic probability books I own, as well as An Introduction to Ergodic Theory by Walters, to no avail. Does anyone know of a textbook where I can find this result? I would strongly prefer a reference with a proof, but would be willing to take one without as well.

Edit: Adding definitions as requested.

Given $X$ above, it is ergodic if for any invariant set $A \in \mathcal{F}$, $\mathbb{P}(A) \in \{0, 1\}$.

By "invariant set," we say a set $A \in \mathcal{F}$ is invariant with respect to $X$ if for some $B \in \mathcal{B}(\mathbb{R}^{\infty})$ ($\mathcal{B}(\mathbb{R}^{\infty})$ denoting the Borel $\sigma$-algebra generated by $\mathbb{R}^{\infty}$), $A = \{(X_n, X_{n+1}, X_{n+2}, \dots)\} \in B$ for all $n \geq 1$.

[I suspect that $S^{\infty}$ should be used in place of $\mathbb{R}^{\infty}$ in the above definitions and that $\in$ should be $\subset$, but that's how they are presented in the lecture notes.]

Edit 2: I found this claim in some other sources, though not in great detail. It would be nice to find a textbook.

Last sentence of http://www.columbia.edu/~ks20/6712-14/6712-14-Notes-Ergodic.pdf
Appendix A of GARCH Models: Structure, Statistical Inference and Financial Applications uses the theorem above as the definition of an ergodic stationary process. This passage cites Billingsley (1995), which I assume is Probability and Measure – but I know that this theorem is not in there.

Best Answer

This result is provided, without proof, as Theorem 5.6(e) of A First Course in Stochastic Processes, 2nd ed., by Karlin and Taylor (1975).

I don't have this book, but Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications by Cramer and Leadbetter (2004) might have some information there. I will edit this post if I find out whether or not this result is mentioned in there.

Related Solutions

Examples of Invariant Events in Probability Theory

Probably you already know this example, but in any case...

Let $\Omega=\{0,1\}^{\mathbb{N}}$, let $\mathcal{F}$ be the $\sigma$-algebra generated by the cylinder sets of $\{0,1\}^{\mathbb{N}}$, and let $\mathbb{P}=\prod_{i\in\mathbb{N}} \mu_i$ be the product measure with $\mu_i:\mathcal{P}(\{0,1\})\to [0,1]$ given by $\mu_i(\{0\})=\frac{1}{2}=\mu_i(\{1\})$ for all $i\in\mathbb{N}$.

We define $\varphi:\Omega\to\Omega$ in the following way:
We think of an element $\omega\in\Omega$ as an infinite sequence of zeros and ones, that is, $\omega=(\omega_1,\omega_2,\ldots)$, and then put $$ \varphi(\omega_1,\omega_2,\ldots)=(\omega_2,\omega_3,\ldots) $$ This function is known as the left shift.

You can prove that $\varphi$ is a measure-preserving map. The trick is to prove that $\mathbb{P}(\varphi^{-1}\mathscr{C})=\mathbb{P}(\mathscr{C})$ for any cylinder set $\mathscr{C}$ and then extend this to the whole $\sigma$-algebra using, for example, one of the measure extension theorems.

Aside: we can prove a stronger fact. This product measure is in fact ergodic for the shift, which implies that every invariant set has measure zero or one.

To finish the example, take $A=\Omega\setminus\{(1,1,1,\ldots)\}$, which is a set of measure one. Note that $\varphi^{-1}A=\Omega\setminus\{(0,1,1,\ldots),(1,1,1,\ldots)\}$, so $\varphi^{-1}A\,\Delta\, A=\{(0,1,1,\ldots)\}$. Since $\mathbb{P}(\{\omega\})=0$ for every $\omega\in\Omega$ (a consequence of the continuity of $\mathbb{P}$), it follows that $\mathbb{P}(\varphi^{-1}A\Delta A)=0$.
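The measure-preserving claim for cylinder sets can be checked by brute force on short patterns. The following is only a sketch: the helper names `cylinder_measure` and `preimage_measure` are made up for this illustration, and a cylinder is represented by the pattern it prescribes on the first $k$ coordinates.

```python
from fractions import Fraction
from itertools import product

def cylinder_measure(pattern):
    """Measure of {w : (w_1, ..., w_k) = pattern} under the fair-coin product measure."""
    return Fraction(1, 2 ** len(pattern))

def preimage_measure(pattern):
    """Measure of shift^{-1}(cylinder).

    The preimage only constrains coordinates 2..k+1, so it is enough to
    enumerate the first k+1 coordinates and count the matching words.
    """
    k = len(pattern)
    hits = sum(1 for w in product((0, 1), repeat=k + 1) if w[1:] == tuple(pattern))
    return Fraction(hits, 2 ** (k + 1))

pattern = (1, 0, 1)  # hypothetical cylinder: w_1 = 1, w_2 = 0, w_3 = 1
print(cylinder_measure(pattern), preimage_measure(pattern))
```

The first coordinate of a word in the preimage is free, which is why exactly two words of length $k+1$ match and the two measures agree.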

Birkhoff ergodic theorem for ergodic Markov processes

Here we establish a general pathwise ergodic theorem and also consider the ergodic case in which the OP seems to be interested.

Concepts and definitions:

Suppose $(M,\mathscr{B},\mu)$ is a probability space (for example, $M$ is a Polish space with its Borel $\sigma$-algebra), and let $P$ be a Markov transition kernel on $(M,\mathscr{B})$. Let $\Omega=M^{\mathbb{Z}_+}$ be equipped with the product $\sigma$-algebra $\mathscr{F}=\mathscr{B}^{\otimes\mathbb{Z}_+}$. For each $n\in\mathbb{Z}_+$ let $X_n:\Omega\rightarrow M$ be the projection $X_n(\omega)=\omega(n)$, and define $\mathscr{F}_n:=\sigma(X_k:0\leq k\leq n)$. It is obvious that $(\mathscr{F}_n:n\in\mathbb{Z}_+)$ is a filtration and that the process $X:\omega\mapsto\omega$ is adapted to this filtration. There is (by an application of the Ionescu-Tulcea theorem, for example) a unique probability measure $\mathbb{P}_\mu$ on $(\Omega,\mathscr{F})$ such that for any $A_0,\ldots, A_k\in\mathscr{B}$ and integers $0=n_0< n_1<\ldots <n_k$ $$\mathbb{P}_\mu[X_{n_j}\in A_j,\ 0\leq j\leq k]=\int_{A_0}\int_{A_1}\ldots\int_{A_k}P^{n_k-n_{k-1}}(x_{k-1},dx_k)\ldots P^{n_1}(x_0,dx_1)\,\mu(dx_0)$$ where $P^0=I$ (identity) and $P^n=P P^{n-1}$ for $n\geq1$. Under this probability, $X$ is a time-homogeneous Markov chain with initial distribution $\mu$ and transition kernel $P$. In particular, when $\mu=\delta_{x}$ for some $x\in M$ we use the notation $\mathbb{P}_x=\mathbb{P}_{\delta_x}$. It is easy to check that for any probability measure $\mu$ on $(M,\mathscr{B})$ and any bounded measurable function $F:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$ \begin{align} \mathbb{E}_\mu[F]=\int_M\mathbb{E}_x[F]\,\mu(dx)\tag{0}\label{zero} \end{align}

Let $\theta:\Omega\rightarrow\Omega$ denote the shift operator $(\theta(\omega))(n)=\omega(n+1)$. Then, for any bounded measurable function $F:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$, the Markov property reads

$$\mathbb{E}_\mu[F(X\circ\theta^m)|\mathscr{F}_m]=\mathbb{E}_{X_m}[F(X)]$$

It is easy to check that $\mu P=\mu$ iff $\mathbb{P}_\mu$ is $\theta$-invariant ($\mathbb{P}_\mu[\theta^{-1}(B)]=\mathbb{P}_\mu[B]$ for all $B\in\mathscr{F}$) or, equivalently, $\mu P=\mu$ iff $X$ is stationary w.r.t. $\mathbb{P}_\mu$. The following result is what the OP seems to be looking for:

Theorem PW: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $f\in L_1(\mu)$ there is $B_f\in\mathscr{B}$ such that $\mu(B_f)=1$, and a function $f^*\in L_1(\mu)$ such that for all $x\in B_f$ $$\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\qquad \text{$\mathbb{P}_x$-a.s.}$$ Moreover, $\int f^* \,d\mu=\int f\,d\mu$. If $\mu$ is $P$-ergodic, then for all $x\in B_f$, $f^*(X_0)=\int f\,d\mu$ $\mathbb{P}_x$-a.s.
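A quick simulation can make the statement of Theorem PW concrete in the ergodic case. This is only a sketch under stated assumptions: the two-state kernel `P`, the function `f` and the helper `time_average` are hypothetical choices for this illustration. For this kernel the invariant law is $\mu=(0.75, 0.25)$, so $\int f\,d\mu=0.25$ for $f(x)=x$.

```python
import random

def time_average(P, f, x0, n, seed=0):
    """Average f(X_0), ..., f(X_{n-1}) along one simulated path started at x0."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        total += f(x)
        # Sample the next state from row P[x] by inverting the cumulative sum.
        u, acc = rng.random(), 0.0
        for y, p in enumerate(P[x]):
            acc += p
            if u < acc:
                x = y
                break
        else:
            # Guard against floating-point rounding in the row sum.
            x = len(P[x]) - 1
    return total / n

P = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical irreducible two-state kernel
f = lambda x: float(x)         # f(0) = 0, f(1) = 1

for x0 in (0, 1):
    print(x0, time_average(P, f, x0, 200_000, seed=x0))
```

Both starting points should give averages close to $0.25$, which is the behaviour the theorem predicts when $\mu$ is $P$-ergodic.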

Ergodic theorems:

Recall that a $P$-invariant measure $\mu$ is $P$-ergodic if for any absorbent set $A$ ($P\mathbb{1}_A\geq\mathbb{1}_A$), $\mu(A)\in\{0,1\}$. In the particular case where $Pf=f\circ T$ for some $\mu$-invariant transformation $T$, $\mu$ is ergodic if $\mu(A)\in\{0,1\}$ for all $A\in\mathscr{B}$ with $T^{-1}(A)=A$.

It can be shown that if $\mu$ is $P$-ergodic, then for any $f\in L_1(\mu)$, $Pf=f$ $\mu$-a.s. iff $f=\mu[f]:=\int f\,d\mu$ $\mu$-a.s.
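On a finite state space, the definition of $P$-ergodicity via absorbent sets can be checked by enumeration. The sketch below assumes a finite kernel given as a list of rows with exact rational entries; the helper names `absorbent_sets` and `is_ergodic` and both example kernels are hypothetical. For $x\in A$ the condition $P\mathbb{1}_A(x)\geq\mathbb{1}_A(x)$ means $P(x,A)=1$, and for $x\notin A$ it holds trivially.

```python
from fractions import Fraction
from itertools import combinations

def absorbent_sets(P):
    """All subsets A with P 1_A >= 1_A, i.e. P(x, A) = 1 for every x in A."""
    states = range(len(P))
    found = []
    for r in range(len(P) + 1):
        for A in combinations(states, r):
            if all(sum(P[x][y] for y in A) == 1 for x in A):
                found.append(set(A))
    return found

def is_ergodic(P, mu):
    """mu is assumed to be P-invariant; check mu(A) in {0, 1} for each absorbent A."""
    return all(sum(mu[x] for x in A) in (0, 1) for A in absorbent_sets(P))

half = Fraction(1, 2)
mu = [half, half]
irreducible = [[half, half], [half, half]]  # only the empty set and M are absorbent
reducible = [[1, 0], [0, 1]]                # {0} is absorbent with mu({0}) = 1/2

print(is_ergodic(irreducible, mu), is_ergodic(reducible, mu))
```

The enumeration is exponential in the number of states, so this is meant for intuition on small examples only.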

We now state two ergodic theorems that act on different types of transformations. The first one is a direct consequence of the ergodic theorem of Dunford-Hopf-Schwartz and von Neumann for positive contractions:

Theorem DHS: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $1\leq p<\infty$ and $f\in L_p(\mu)$ there is $Af\in L_p(\mu)$ such that $$A_nf=\frac{1}{n}\sum^{n-1}_{k=0}P^kf\xrightarrow{n\rightarrow\infty}Af$$ $\mu$-a.s. and in $L_p(\mu)$. Furthermore, $P(Af)=Af=A(Pf)$ $\mu$-a.s., and $\mu[Af]=\mu[f]$. If $\mu$ is ergodic, then $Af=\mu[f]$ $\mu$-a.s.

The $\mu$-a.s. convergence is known as the individual ergodic theorem; convergence in $L_p$ is known as the mean ergodic theorem (von Neumann).
The limit $Af$ can be expressed in probabilistic terms. Let $\mathcal{I}^P_\mu$ be the collection of sets $B\in\mathscr{B}$ such that $P\mathbb{1}_B=\mathbb{1}_B$ $\mu$-a.s. It is possible to prove that $\mathcal{I}^P_\mu$ is a $\sigma$-algebra and that $Af=\mu[f|\mathcal{I}^P_\mu]$ $\mu$-a.s. (the conditional expectation of $f$ given $\mathcal{I}^P_\mu$ under $\mu$).
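The role of the Cesàro averaging in Theorem DHS can be seen on a periodic chain, where $P^kf$ itself does not converge but $A_nf$ does. The following sketch uses a hypothetical two-state flip kernel with invariant (and ergodic) law $\mu=(1/2,1/2)$; the helper names are made up for the illustration.

```python
from fractions import Fraction

def apply_kernel(P, f):
    """(Pf)(x) = sum_y P(x, y) f(y) for a finite-state kernel."""
    return [sum(P[x][y] * f[y] for y in range(len(f))) for x in range(len(P))]

def cesaro_average(P, f, n):
    """A_n f = (1/n) * sum_{k=0}^{n-1} P^k f."""
    total, g = [Fraction(0)] * len(f), list(f)
    for _ in range(n):
        total = [t + v for t, v in zip(total, g)]
        g = apply_kernel(P, g)
    return [t / n for t in total]

P = [[0, 1], [1, 0]]               # deterministic flip: period 2
f = [Fraction(1), Fraction(0)]     # f = indicator of state 0, so mu[f] = 1/2

print(cesaro_average(P, f, 1000))  # both entries equal 1/2 here
print(cesaro_average(P, f, 1001))  # entries 501/1001 and 500/1001
```

Here $P^kf$ alternates between $(1,0)$ and $(0,1)$, while $A_nf$ stays within $1/(2n)$ of $\mu[f]=1/2$.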

On the other hand, we have the well-known ergodic theorem of Birkhoff and von Neumann:

Theorem B: Suppose $(S,\mathscr{S},m)$ is a probability space and $T:(S,\mathscr{S})\rightarrow(S,\mathscr{S})$ preserves $m$ ($m(T^{-1}(B))=m(B)$ for all $B\in\mathscr{S}$). For $1\leq p<\infty$ and $f\in L_p(m)$ there is $f^*\in L_p(m)$ such that $$\frac{1}{n}\sum^{n-1}_{k=0}f\circ T^k\xrightarrow{n\rightarrow\infty} f^*$$ $m$-almost surely and in $L_p(m)$. In particular, $$m[f^*]=m[f]$$ If $m$ is $T$-ergodic, $f^*=m[f]$ $m$-a.s.

As with Theorem DHS, $f^*$ can be expressed as a conditional expectation. Let $\mathcal{I}_m=\{A\in\mathscr{S}: m(T^{-1}(A)\triangle A)=0\}$. This is a $\sigma$-algebra and $f^*=m[f|\mathcal{I}_m]$ $m$-a.s.
The Birkhoff ergodic theorem can be obtained from the stronger DHS Theorem by considering the transition function $P_Tf:=f\circ T$ for all bounded measurable functions $f$ on $(S,\mathscr{S})$. This, however, might be overkill. The technique of maximal inequalities can be used to prove both theorems (the individual parts); the $L_2$ result of von Neumann's theorem can be exploited to prove the mean ergodic results in both theorems.
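As a numerical illustration of Theorem B in the ergodic case, one can take the rotation $T(x)=x+\alpha \bmod 1$ on $[0,1)$ with Lebesgue measure, which is ergodic for irrational $\alpha$. The sketch below is only indicative: the choices of $\alpha$, $f$ and the starting point are hypothetical, and in floating point $\alpha$ is merely an approximation of an irrational number.

```python
import math

def birkhoff_average(f, T, x0, n):
    """(1/n) * sum_{k=0}^{n-1} f(T^k x0)."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

alpha = (math.sqrt(5) - 1) / 2           # float approximation of an irrational angle
T = lambda x: (x + alpha) % 1.0          # rotation of the circle [0, 1)
f = lambda x: 1.0 if x < 0.5 else 0.0    # indicator of [0, 1/2), so m[f] = 1/2

avg = birkhoff_average(f, T, 0.1, 100_000)
print(avg)
```

The printed average should be close to $m[f]=1/2$, the fraction of time the orbit spends in $[0,1/2)$.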

We will only use Birkhoff's ergodic theorem explicitly in the rest of this post.

Proof of the pathwise ergodic theorem:

Applying Birkhoff's ergodic theorem with $(S,\mathscr{S},m,T)=(\Omega,\mathscr{F},\mathbb{P}_\mu,\theta)$ we have that for any $G\in L_1(\mathbb{P}_\mu)$ there is $G^*\in L_1(\mathbb{P}_\mu)$ with $G^*\circ\theta=G^*$, $\mathbb{E}_\mu[G^*]=\mathbb{E}_\mu[G]$ and \begin{align}\frac1n\sum^{n-1}_{k=0}G\circ\theta^k\xrightarrow{n\rightarrow\infty}G^*\tag{1}\label{one}\end{align} $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$.

Lemma: Suppose that $F$ is a bounded, $\mathscr{F}_\infty$-measurable, $\theta$-invariant function, that is, $F=F\circ\theta$, where $\mathscr{F}_\infty:=\sigma(\cup_n\mathscr{F}_n)=\mathscr{F}$. If $\mathbb{P}_\mu$ is $\theta$-invariant, then $F=\mathbb{E}_\mu[F|\mathscr{F}_0]$ $\mathbb{P}_\mu$-a.s.

Proof of Lemma: Suppose $F\circ\theta=F$ and define $h(x)=\mathbb{E}_x[F]$. Then, by the Markov property, $$h(X_k)=\mathbb{E}_{X_k}[F]=\mathbb{E}_\mu[F\circ\theta^k|\mathscr{F}_k]=\mathbb{E}_\mu[F|\mathscr{F}_k]$$ It follows that $(h(X_k):k\in\mathbb{Z}_+)$ is a uniformly integrable martingale with respect to the filtration $(\mathscr{F}_k:k\in\mathbb{Z}_+)$. An application of the martingale convergence theorem implies that $$h(X_k)\xrightarrow{k\rightarrow\infty} \mathbb{E}_\mu[F|\mathscr{F}_\infty]=F$$ $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$. Using the stationarity of $\mathbb{P}_\mu$ and $h(X_0)=\mathbb{E}_\mu[F|\mathscr{F}_0]$ yields \begin{align} \|\mathbb{E}_{\mu}[F|\mathscr{F}_0]-F\|_{L_1(\mathbb{P}_\mu)}= \|(h(X_0) -F)\circ\theta^k\|_{L_1(\mathbb{P}_\mu)}= \|h(X_k)-F\|_{L_1(\mathbb{P}_\mu)}\xrightarrow{k\rightarrow\infty}0 \end{align} Therefore $\mathbb{E}_\mu[F|\mathscr{F}_0]=F$ $\mathbb{P}_\mu$-a.s. $ \blacksquare $

Since the limit $G^*$ in \eqref{one} is $\theta$-invariant, the Lemma above shows that $G^*=\mathbb{E}_\mu[G^*|\mathscr{F}_0]$ $\mathbb{P}_\mu$-a.s. and thus, there is a function $g^*\in L_1(\mu)$ such that $G^*(X)=g^*(X_0)$ $\mathbb{P}_\mu$-a.s.

Now, for $f\in L_1(\mu)$, consider $F_f(\omega):=f(\omega_0)$, so that $F_f\circ\theta^k=f(X_k)$. Then, for some $f^*\in L_1(\mu)$, \begin{align}\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\tag{2}\label{two}\end{align} $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$. Hence, by \eqref{zero}, $$\int\mathbb{P}_x\Big[\Big\{\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\Big\}\Big]\,\mu(dx)=1$$ It follows that there is $B_f\in\mathscr{B}$ such that $\mu(B_f)=1$, and for $x\in B_f$ \begin{align} \frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0) \qquad\text{$\mathbb{P}_x$-a.s.} \tag{3}\label{three} \end{align}
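A small worked example (with a deliberately degenerate kernel chosen only for illustration) shows that the limit $f^*(X_0)$ genuinely depends on the starting point when $\mu$ is not ergodic. Take $M=\{0,1\}$, $P=I$ (the identity kernel) and $\mu=\tfrac12\delta_0+\tfrac12\delta_1$, so that $\mu P=\mu$. Under $\mathbb{P}_x$ the chain never moves, that is, $X_k=X_0=x$ for all $k$, and therefore $$\frac1n\sum^{n-1}_{k=0}f(X_k)=f(X_0)\qquad\text{for every $n\geq1$,}$$ so that $f^*=f$, which is not constant unless $f(0)=f(1)$, while $\int f^*\,d\mu=\int f\,d\mu$ holds trivially. Consistently, $\mu$ is not $P$-ergodic here: $A=\{0\}$ satisfies $P\mathbb{1}_A=\mathbb{1}_A$, yet $\mu(A)=\tfrac12$.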

Ergodic case

When $\mu$ is $P$-ergodic, the limit function $f^*$ in the pathwise ergodic theorem described above is constant ($\mathbb{P}_x$-a.s. for all $x\in B_f$). This seems to be the result in which the OP is mostly interested. It follows directly from the following result:

Theorem E: $\mu$ is $P$ ergodic iff $\theta$ is $\mathbb{P}_\mu$ ergodic.

Proof of Theorem E:

Necessity: Suppose $\mu$ is $P$-ergodic. For any bounded $\mathscr{F}$-measurable function $F$ define $h: x\mapsto \mathbb{E}_x[F]$. The Markov property implies that \begin{align} \mathbb{E}_x[F\circ\theta^k] =\mathbb{E}_x[\mathbb{E}_{X_k}[F]] =\mathbb{E}_x[h(X_k)]=P^kh(x) \end{align} If $F=F\circ\theta$ then $Ph=h$ and so, by \eqref{zero}, $h=\int h\,d\mu=\mathbb{E}_\mu[F]$ $\mu$-a.s. We will show that $F=\mathbb{E}_\mu[F]$ $\mathbb{P}_\mu$-a.s. Indeed, let $A\in\mathscr{F}_n$. Using the Markov property once more yields \begin{align} \mathbb{E}_\mu[F;A]&=\mathbb{E}_\mu[F\circ\theta^n;A]= \mathbb{E}_\mu[\mathbb{1}_A\mathbb{E}_\mu[F\circ\theta^n|\mathscr{F}_n]]\\ &= \mathbb{E}_\mu[\mathbb{E}_{X_n}[F];A]=\mathbb{E}_\mu[h(X_n);A]= \mathbb{E}_\mu[F]\mathbb{P}_\mu[A]; \end{align} where the last equality follows from $h=\mathbb{E}_\mu[F]$ $\mu$-a.s. and the fact that the distribution of $X_n$ is $\mu$. Since $\mathscr{F}=\sigma(\cup_n\mathscr{F}_n)$ and $\cup_n\mathscr{F}_n$ is an algebra, it follows by monotone class arguments that $F=\mathbb{E}_\mu[F]$ $\mathbb{P}_\mu$-a.s.

Sufficiency: Suppose that $\mathbb{P}_\mu$ is $\theta$-ergodic, and let $F$ be a bounded $\mathscr{F}$-measurable function. By Birkhoff's ergodic theorem \begin{align} \frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]\quad \text{$\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$} \end{align}

Consequently, there is $B\in\mathscr{B}$ with $\mu(B)=1$ such that for all $x\in B$, $\frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]$, $\mathbb{P}_x$-a.s. Thus, by dominated convergence,
\begin{align} \frac1n\sum^{n-1}_{k=0}\mathbb{E}_x[F\circ\theta^k] \longrightarrow \mathbb{E}_\mu[F], \qquad \text{$\mu$-a.s.}\tag{4}\label{four} \end{align} Let $A$ be a $P$-absorbent set and $F(\omega)=\mathbb{1}_A(\omega_0)$. Then, for any $n\in\mathbb{Z}_+$, $\mathbb{E}_x[F\circ\theta^n]=\mathbb{E}_x[\mathbb{1}_A(X_n)]=P^n\mathbb{1}_A(x)\geq\mathbb{1}_A(x)$. Since $\int P^n\mathbb{1}_A\,d\mu=\mu[A]=\int\mathbb{1}_A\,d\mu$ by the invariance of $\mu$, it follows that $P^n\mathbb{1}_A=\mathbb{1}_A$, $\mu$-a.s., and from \eqref{four}, $\mu[A]=\mathbb{1}_A$, $\mu$-a.s. Therefore, $\mu[A]\in\{0,1\}$. $ \blacksquare $
