Birkhoff ergodic theorem for ergodic Markov processes

dynamical systemsergodic-theorymarkov-processmeasure-theoryprobability theory

This question might be easy but I am really stuck on it.

Let $M$ be compact metric space and $\mathcal B(M)$ the Borel $\sigma$-algebra of M. Consider the discrete-time Markov process,
$$\mathbf{X} =\left(\Omega,\{\mathcal F_n\}_{n\in\mathbb N}, \{X_n\}_{n\in\mathbb N}, \{P_n\}_{n\in\mathbb N} , \{\mathbb P_x\}_{x\in M}\right), $$
with state space $(M,\mathcal B(M)$ (I am considering that $0\in\mathbb N$) i.e.

  1. $(\Omega,\mathcal F_n),$ is a filtered measurable space,
  2. $X_n:\Omega\to M$ is $\mathcal F_n$ measurable,
  3. $\mathbb P_x [X_0 = x] =1,$ for every $x\in M,$
  4. For every $0 \leq n \leq m \in \mathbb N,$ $f:M\to\mathbb R$ bounded measurable function, and $x\in M$
    $$\mathbb E_x [f(X_{n+m}) \mid \mathcal F_n] = (P_m f)(X_n), \ \mathbb P_x\ \mathrm{a.s.}, $$
    where $P_n$ is a transition function on $(M,\mathcal B(M)),$ i.e. a family of probability maps $P_n : M\times \mathcal B(M) \to [0,1],$ such that
  • $P_0(x,\mathrm{d} y) = \delta_x(\mathrm{d}y),$
  • $P_n(x,\cdot)$ is a Borel probability measure for every $x\in M.$
  • For every $n,m \in \mathbb N$ and $A\in\mathcal B(M),$
    $$P_{n+m}(x, A) = \int_{M} P_n(y,A) P_m(x,\mathrm{d} y). $$

Assume that $\mathbf{X}$ admits an ergodic stationary measure $\mu$ on $M,$ i.e.
$$\int_{M} P_n(x,A) \mu(\mathrm d x) = \mu(A),\ \forall \ A\in\mathcal B(M), $$
and if
$$P_1(x,A) = 1,\ \forall \ x \ \mu\text{-a.s.}\ \in A \Rightarrow \mu(A) = 0\ \text{or }1. $$

Question: I would like to know if under this setup we would have the following
ergodic theorem. For every $f\in L^1(M,\mathcal B(M), \mu),$ we obtain
$$ \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1}f (X_n(\omega)) = \int_{M} f(x)\mu(\mathrm{d} x),\ \forall\ \omega \text{-}\mathbb P\ \text{a.s.,} $$
where $\mathbb P(\mathrm{d} y) := \int_{M}\mathbb P_x (\mathrm{d} y) \mu(\mathrm{d} x).$


Comments regarding my question

Consider $\mathbf{X}$ being an ergodic Markov process (using the above notation). For every $n\in\mathbb N$ let us consider the projection map
\begin{align*}
\pi_n : M^{\mathbb N}&\to M\\
(x_m)_{m\in\mathbb N}&\mapsto x_n.
\end{align*}

If we define (via Komolgorov extension Theorem) the Borel probability measure $P_\mu$ on $M^\mathbb N$ as the unique Borel probability, such that given $A_0,\ldots,A_n \in M,$ then
$$P_{\mu}\left(\{x_n\}_{n\in\mathbb N} \in M^{\mathbb N}; x_i\in A_i, \ \forall \ i\in\{0,1,\ldots,n\}\right) = \int_{A_0}\int_{A_1} \ldots \int_{A_{n-1}} P_1(x_{n},A_n) P_1(x_{n-1},\mathrm{d}x_n) \ldots P_1(x_0,\mathrm{d} x_1) \mu(\mathrm{d}x_0). $$

We have that the shift
\begin{align*}
\theta: (M^{\mathbb N},\mathcal B(M^{\mathbb N}) , P_\pi)&\to (M^{\mathbb N},\mathcal B(M^{\mathbb N}),P_\pi) \\
(x_{n})_{n\in\mathbb N}&\to (x_{n+1})_{n\in\mathbb N},
\end{align*}

is an ergodic dynamical system and we have that for every $f\in L^1(M,\mathcal B(M),\mu)$
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(\pi_i(\omega)) = \int_M f(x) \mu(\mathrm{d} x),\ \forall \ \omega\text{-}P_\pi\ a.s.. $$

How do I translate the information of the canonical process (the one above) to the original Markov process $\mathbf{X}$? For every $\omega \in \Omega$, we have that
$$\left( X_n(\omega))_{n\in\mathbb N}\right) \in M^{\mathbb N}, $$
and
$$\pi_i\left(\left(X_n(\omega)\right)_{n\in\mathbb N}\right) = X_i(\omega).$$

But it is not clear that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f( X_i(\omega) ) = \int_M f(x) \mu(\mathrm{d} x),\ \forall \ \omega\text{-}\mathbb P\ a.s., $$
where $\mathbb P (\mathrm{d} y) = \int_M \mathbb P_x(\mathrm{d} y) \mu(\mathrm{d}x), $ can anyone help me?

Best Answer

Here we establish a general ergodic pathwise ergodic theorem and also consider the ergodic case in which the OP seems to be interested.


Concepts and definitions:

Suppose $(M,\mathscr{B},\mu)$ is a probability space ($M$ is a Polish space with a Borel $\sigma$-algebra for example). Let $\Omega=M^{\mathbb{Z}_+}$ equipped with the product $\sigma$-algebra $\mathscr{F}=\mathscr{B}^{\otimes\mathbb{Z}_+}$. For each $n\in\mathbb{Z}_+$ let $X_n:\Omega\rightarrow M$ be the projection $X_n(\omega)=\omega(n)$, define $\mathscr{F}_n:=\sigma(X_k:0\leq k\leq n)$. It is obvious that $(\mathscr{F}_n:n\in\mathbb{Z}_+)$ is a filtration and that the process $X:\omega\mapsto\omega$ us adapted to this filtration. There is (an applications if Ionescu-Tulcea's theorem for example) a unique probability measure $\mathbb{P}_\mu$ on $(\Omega,\mathscr{F})$ such that for any $A_0,\ldots, A_k\in\mathscr{B}$ and integers $0=n_0< n_1<\ldots <n_k$ $$\mathbb{P}_\mu[X_{n_j}\in A_j]=\int_{A_0}\int_{A_1}\ldots\int_{A_k}P^{n_k-n_{k-1}}(x_{k-1},dx_k)\ldots P^{n_1}(x_0,dx_1)\,\mu(dx_0)$$ where $P^0=I$ (identity) and $P^n=P P^{n-1}$ for $n\geq1$. Under this probability, $X$ is time homogeneous Markov chain with initial probability $\mu$ with transitions kernel $P$. In particular, when $\mu=\delta_{x}$ for some $x\in M$ we use the notation $\mathbb{P}_x=\mathbb{P}_{\delta_x}$. It is easy to check that for any probability measure $\mu$ on $(M,\mathscr{B})$ and any bounded measurable function $F:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$ \begin{align} \mathbb{E}_\mu[F]=\int_M\mathbb{E}_x[F]\,\mu(dx)\tag{0}\label{zero} \end{align}

Let $\theta:\Omega\rightarrow\Omega$ define the shift operator $(\theta(\omega))(n)=\omega(n+1)$. Then, for any bounded measurable function $F:(\Omega,\mathscr{B})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R})$

$$\mathbb{E}_\mu[F(X\circ\theta^m)|\mathscr{F}_m]=\mathbb{E}_{X_m}[F(X)]$$

It is easy to check that $\mu P=\mu$ iff $\theta$ is $\mathbb{P}_\mu$ invariant ($\mathbb{P}_\mu[\theta^{-1}(B)]=\mathbb{P}_\mu[B]$ for all $B\in\mathscr{F}$) or equivalently, $\mu P=\mu$ iff $X$ is stationary w.r.t $\mathbb{P}_\mu$. The following result is what the OP seems to be looking for:

Theorem PW: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $f\in L_1(\mu)$ there is $B_f\in\mathscr{B}$ such that $\mu(B_f)=1$, and a function $f^*\in L_1(\mu)$ such that for all $x\in B_f$ $$\frac1n\sum^{n-1}_{k=1}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\qquad \text{$\mathbb{P}_x$-a.s}$$ Moreover, $\int f^* \,d\mu=\int f\,d\mu$. If $\mu$ is $P$-ergodic, then for all $x\in B_f$, $f^*=\int f\,d\mu$ $\mathbb{P}_x$-a.s


Ergodic theorems:

Recall that a $P$ invariant measure $\mu$ is $P$ ergodic if for for any absorbent set $A$ ($P\mathbb{1}_A\geq\mathbb{1}_A$), $\mu(A)\in\{0,1\}$. In the particular case where $Pf=f\circ T$ for some $\mu$-invariant transformation $T$, $\mu$ is ergodic if $\mu(A)\in\{0,1\}$ for all $A\in\mathscr{B}$ with $T^{-1}(A)=A$.

It can be shown that if $\mu$ is $P$ ergodic, then for any $f\in L_1(\mu)$, $Pf=f$ $\mu$-a.s iff $f=\mu[f]:=\int f\,d\mu$ $\mu$-a.s.

We have state two ergodic theorems that act on different types of transformation. The first one is a direct consequence of the ergodic theorem of Dunford-Hopf-Schwartz and von Neumann for positive contractions:

Theorem DHS: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $1\leq p<\infty$ and $f\in L_p(\mu)$ there is $Af\in L_p(\mu)$ such that $$A_nf=\frac{1}{n}\sum^{n-1}_{k=0}P^kf\xrightarrow{n\rightarrow\infty}Af$$ $\mu$-a.s. and in $L_p(\mu)$. Furthermore, $P(Af)=Af=A(Pf)$ $\mu$-a.s., and $\mu[Af]=\mu[f]$. If $\mu$ is ergodic, then $Af=\mu[f]$ $\mu$-a.s.

  • The $\mu$-a.s. convergence is known as individual ergodic theorem; convergence in $L_p$ is known as the mean ergodic theorem (von Neumann).
  • The limit $Af$ can be expressed in probabilistic terms. Let $\mathcal{I}^P_\mu$ the collection of sets in $\mathscr{B}$ such that $P\mathbb{1}_B=\mathbb{1}|_B$ $\mu$-a.s. It is possible to prove that $\mathcal{I}^p_\mu$ is a $\sigma$-algebra and that $Af=\mu[f|\mathcal{I}^P_\mu]$ $\mu$-a.s.(conditional expectation of $f$ given $\mathcal{I}^p_\mu$ under $\mu$).

On the other hand, we have the well known ergodic theorem of Birkoff and von Neumann

Theorem B: Suppose $(S,\mathscr{S},m)$ is a probability space and $T:(S,\mathscr{S})\rightarrow(S,\mathscr{S})$ is $\mu$ invariant ($m(T^{-1}(B))=m(B)$ for all $B\in\mathscr{S})$. For $1\leq p<\infty$ and $f\in L_p(m)$ there is $f^*\in L_p$ such that $$\frac{1}{n}\sum^{n-1}_{k=0}f\circ T^k\xrightarrow{n\rightarrow\infty} f^*$$ $m$-almsot surely and in $L_p(m)$. In particular, $$m[f^*]=m[f]$$ If $m$ is $T$ ergodic, $f^*=m[f]$ $m$-a.s.

  • As with Theorem DHS, $f^*$ can be expressed as a conditional expectation. Let $I_m=\{A\in\mathscr{S}: \mu(T^{-1}(A)\triangle A)=0$. this is a $\sigma$-alsgebra and $f^*=m(f|\mathcal{I}_\mu)$ $\mu$-a.s.

  • The Birkoff ergodic theorem can be obtained from the stronger DHS Theorem by cosidering the transition function $P_Tf:=f\circ T$ for all bonded measurable functions $f$ in $(S,\mathscr{S})$. This however might be an overkill. The technique of maximal inequalities can be used to prove both theorems (the individual parts); the $L_2$ result of von Neumann's theorem can be exploited to prove the mean ergodic results in borsht theorems.

We will only use explicitly Birkoff's ergodic theorem in the rest of this posting.


Proof of path wise ergodic theorem:

Applying Birkoff's ergodic theorem with $(S,\mathscr{S},m,T)=(\Omega,\mathscr{F},\mathbb{P}_\mu,\theta)$ we have that for any $G\in L_1(\mathbb{P}_\mu)$ there is $G^*\in L_1(\mathbb{P}_\mu)$ with $G^*\circ\theta=G^*$, $\mathbb{E}_\mu[G^*]=\mathbb{E}_\mu[G]$ and \begin{align}\frac1n\sum^{n-1}_{k=0}G\circ\theta^k\xrightarrow{n\rightarrow\infty}G^*\tag{1}\label{one}\end{align} $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$.

Lemma: Suppose that $F\in\mathscr{F}_\infty$ is a bounded $\theta$--invariant function, that is $F=F\circ\theta$. If $\mathbb{P}_\mu$ is $\theta$ invariant, then $F=\mathbf{E}_\mu[F|\mathscr{F}_0]$ $\mathbb{P}_\mu$--a.s.

Proof of Lemma: Suppose $F\circ\theta=F$ and define $h(x)=\mathbb{E}_x[F]$. Then $$h(X_k)=\mathbb{E}_{X_k}[F]=\mathbb{E}_\mu[F\circ\theta^k|\mathscr{F}_k]=\mathbb{E}_\mu[F|\mathscr{F}_k]$$ It follows that $(h(X_k):k\in\mathbb{Z}_+)$ is a uniform integrable martingale with respect to the filtration $(\mathscr{F}_j:k\in\mathbb{Z}_+)$. An application of the martingale convergence theorem implies that $$h(X_k)\xrightarrow{k\rightarrow\infty} \mathbb{E}[F|\mathscr{F}_\infty]=F$$ $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$. Using the stationarity of $\mathbb{P}_\mu$ again yields \begin{align} \|\mathbb{E}_{\mu}[F|\mathscr{F}_0]-F\|_{L_1(\mathbb{P}_\mu)}= \|(h(X_0) -F)\circ\theta^k\|_{L_1(\mathbb{P}_\mu)}= \|h(X_k)-F\|_{L_1(\mathbb{P}_\mu)}\xrightarrow{k\rightarrow\infty}0 \end{align} Therefore $\mathbb{E}[F|\mathcal{F}_0]=F$ $\mathbb{P}_\mu$-a.s. $ \blacksquare $

The Lemma above shows that $F^*=\mathbb{E}[F^*|\mathbf{F}_0]$ $\mathbb{P}_\mu$-a.s. and thus, there is a function $f^*\in L_1(\mu)$ such that $F^*(X)=f^*(X_0)$ $\mathbb{P}_\mu$-a.s.

Now, for $f\in L_1(\mu)$ and define consider $F_f(\omega):=f(\omega_0)$. Then, for some $f^*\in L_1(\mu)$, \begin{align}\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\tag{2}\label{two}\end{align} $\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$ . Hence $$\int\mathbb{P}_x\Big[\{\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\big\}\Big]\,\mu(dx)=1$$ It follows that there is $B_f\in\mathscr{B}$ such that $\mu(B_f)=1$, and for $x\in B_f$ \begin{align} \frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0) \qquad\text{$\mathbb{P}_x$-a.s.} \tag{3}\label{three} \end{align}


Ergodic case

When $\mu$ is $P$ ergodic, the limit function $f^*$ in the pathwise ergodic theorem described above is constant ($\mathbb{P}_x$-a.s. for all $x\in B_f$). This seems to be the result that the OP is mostly interested. This follows directly from the following result:

Theorem E: $\mu$ is $P$ ergodic iff $\theta$ is $\mathbb{P}_\mu$ ergodic.

Proof of Theorem E:

Necessity: Suppose $\mu$ is $P$ ergodic. For any bounded $\mathscr{F}$--measurable function $F$ define $h: x\mapsto \mathbb{E}_x[F]$. The Markov property implies that \begin{align} \mathbb{E}_x[F\circ\theta^k] =\mathbb{E}_x[\mathbb{E}_{X_k}[F]] =\mathbb{E}_x[h(X_k)]=P^kh(x) \end{align} If $F=F\circ\theta$ then $Ph=h$ and so, $h=\int f\,d\mu=\mathbb{E}_\mu[F]$ $\mu$-a.s. We will show that $F=\mathbb{E}_\mu[F]$ $\mathbb{P}_\mu$--a.s. Indeed, let $A\in\mathcal{F}_n$. Using the Markov property once more yields \begin{align} \mathbb{E}_\mu[F;A]&=\mathbb{E}_\mu[F\circ\theta^n;A]= \mathbb{E}_\mu[\mathbb{1}_A\mathbb{E}_\mu[F\circ\theta^n|\mathscr{F}_n]]\\ &= \mathbb{E}_\mu[\mathbb{E}_{X_n}[F];A]=\mathbb{E}_\mu[h(X_n);A]= \mathbb{E}_\mu[F]\mathbb{P}_\mu[A]; \end{align} where the last equality follows from $h=\mathbb{E}_\mu[F]$,,$\mu$--a.s. and the fact that the distribution of $X_n$ is $\mu$. Since $\mathscr{F}=\sigma(\cup_n\mathscr{F}_n)$ and $\cup_n\mathscr{F}_n$ is an algebra, it follows by monotone class arguments that $F=\mathbb{E}_\mu[F]$\ $\mathbb{P}_\mu$--a.s.

Sufficiency: Suppose that $\mathbb{P}_\mu$ is $\theta$--ergodic, and let $F$ be a bounded $\mathcal{F}$--measurable function. By Birkoff's ergodic theorem \begin{align} \frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]\quad \text{$\mathbb{P}_\mu$--a.s. and in $L_1(\mathbb{P}_\mu)$} \end{align}

Consequently, there is $B\in\mathscr{B}$ with $\mu(B)=1$ such that for all $x\in B$, $\frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]$ , $\mathbb{P}_x$--a.s. Thus, by dominated convergence,
\begin{align} \frac1n\sum^{n-1}_{k=0}\mathbb{E}_x[F\circ\theta^k] \longrightarrow \mathbb{E}_\mu[F], \qquad \text{$\mu$--a.s.}\tag{4}\label{four} \end{align} Let $A$ be a $P$--absorbent set and $F(\omega)=\mathbb{1}_A(\omega_0)$. Then, for any $n\in\mathbb{Z}_+$, $\mathbb{E}_x[F\circ\theta^n]=\mathbb{E}_x[\mathbb{1}_A(X_n)]=P^n\mathbb{1}_A(x)\geq\mathbb{1}_A(x)$; hence, $P^n\mathbb{1}_A=\mathbb{1}_A$ , $\mu$--a.s. and from \eqref{four}, $\mu[A]=\mathbb{1}_A$, $\mu$--a.s. Therefore, $\mu[A]\in\{0,1\}$. $ \blacksquare $