Birkhoff’s Ergodic Theorem for Integrable vs Bounded Functions

ds.dynamical-systems · ergodic-theory · fa.functional-analysis · pr.probability

It seems to me that a considerably simpler proof [see below] of Birkhoff's ergodic theorem can be obtained for bounded observables than for more general $L^1$ observables. Therefore, I feel like it would be nicer to prove Birkhoff's ergodic theorem by starting off with the bounded case and then extending to the general case, than by "directly" tackling the general case. But can this extension from $L^\infty$ to $L^1$ be done in any "straightforward" elementary manner?

Let me give one possible version of how to make this question precise:

Let $(X,\mathcal{X},\mu)$ be a probability space. A Markov operator on $L^1(\mu)$ is a linear map $P \colon L^1(\mu) \to L^1(\mu)$ that is monotone ($f \leq g$ implies $P(f) \leq P(g)$), unity-preserving ($P(\mathbf{1}) = \mathbf{1}$) and integral-preserving ($\int_X P(f) \, d\mu = \int_X f \, d\mu$).
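As a concrete finite sketch (my own toy example, not part of the question): on a finite space, any row-stochastic matrix that leaves the probability weights invariant induces a Markov operator. The matrix $K$ and weights below are assumptions chosen purely for illustration.

```python
# Toy Markov operator on the three-point space X = {0, 1, 2}:
# P(f)(x) = sum_y K[x][y] f(y), with K row-stochastic (so P(1) = 1) and
# mu-invariant: sum_x mu[x] K[x][y] = mu[y] (so P preserves integrals).

mu = [0.5, 0.3, 0.2]                   # probability weights
K = [[0.5, 0.3, 0.2],                  # here K[x][y] = mu[y], which makes the
     [0.5, 0.3, 0.2],                  # invariance condition automatic
     [0.5, 0.3, 0.2]]

def P(f):
    """Apply the Markov operator to a function f given as a list of values."""
    return [sum(K[x][y] * f[y] for y in range(3)) for x in range(3)]

def integral(f):
    return sum(mu[x] * f[x] for x in range(3))

f = [1.0, -2.0, 4.0]
print(all(abs(v - 1.0) < 1e-12 for v in P([1.0, 1.0, 1.0])))   # unity-preserving: True
print(abs(integral(P(f)) - integral(f)) < 1e-12)               # integral-preserving: True
```

Monotonicity holds because all entries of $K$ are nonnegative.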

Suppose we have a sequence $(P_n)_{n \geq 1}$ of Markov operators on $L^1(\mu)$ such that (1) for every $f \in L^\infty(\mu)$ the following statements hold:

  • $P_n(f) \overset{\mu\textrm{-a.s.}}{\to} \int_X f \, d\mu\ $ as $n \to \infty$;
  • for all $m,n \geq 1$, $\|nP_n(f) - mP_m(f)\|_{L^\infty(\mu)} \leq |m-n|\,\|f\|_{L^\infty(\mu)}$;

and (2) for every $f \in L^1(\mu)$ the following statements hold:

  • there exists a value $P_\infty[f] \in \overline{\mathbb{R}}$ such that $\,\liminf_{n \to \infty} P_n(f) \overset{\mu\textrm{-a.s.}}{=} P_\infty[f]$;
  • for all $m,n \geq 1$, $\|nP_n(f) - mP_m(f)\|_{L^1(\mu)} \leq |m-n|\,\|f\|_{L^1(\mu)}$.

(Obviously, if $f \in L^\infty(\mu)$ then $P_\infty[f]=\int_X f \, d\mu$. We will also see from the proof of the result in the Note below that the same holds for all $f \in L^1(\mu)$ with $f \geq 0$.)
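For the motivating example of Birkhoff averages $P_n(f) = \frac{1}{n}\sum_{i=0}^{n-1} f \circ T^i$ of a $\mu$-preserving map $T$ (the setting of the proof further below), the two Lipschitz-type conditions can be checked in one line: for $n > m$ the rescaled averages telescope,
$$ nP_n(f) - mP_m(f) = \sum_{i=m}^{n-1} f \circ T^i, $$
so the triangle inequality together with $\|f \circ T^i\| = \|f\|$ (valid in both norms, using the $\mu$-invariance of $T$ for the $L^1$ case) yields $\|nP_n(f) - mP_m(f)\| \leq (n-m)\,\|f\|$ in $L^\infty(\mu)$ and in $L^1(\mu)$ alike.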

Do we necessarily have that for all $f \in L^1(\mu)$, $P_n(f) \overset{\mu\textrm{-a.s.}}{\to} \int_X f \, d\mu\ $ as $n \to \infty$?

(Since $P_n(-f)=-P_n(f)$, this is equivalent to saying that for all $f \in L^1(\mu)$, $P_\infty[f]=\int_X f \, d\mu$.)

Remark. I expect that if the answer is yes, then not all the conditions given above will be necessary to prove it.


Note. As shown below, it is not hard to prove [under considerably weaker conditions than those given above] that for each $f \in L^1(\mu)$, $\ P_n(f) \overset{L^1(\mu)}{\to} \int_X f \, d\mu\ $ as $n \to \infty$.

Proof: Without loss of generality take $f \overset{\mu\textrm{-a.s.}}{\geq} 0$. For each $k>0$ we have $P_n(f \wedge k) \overset{\mu\textrm{-a.s.}}{\to} \int_X f \wedge k \; d\mu\ $ as $n \to \infty$, and $P_n(f) \geq P_n(f \wedge k)$ by monotonicity of Markov operators; letting $k \to \infty$ (monotone convergence) gives
$$ \liminf_{n \to \infty} P_n(f) \overset{\mu\textrm{-a.s.}}{\geq} \int_X f \, d\mu. $$
But due to the integral-preservation of Markov operators, we also have that $\int_X P_n(f) \, d\mu = \int_X f \, d\mu$ for all $n$; and so, since $P_n(f) \overset{\mu\textrm{-a.s.}}{\geq} 0$ (by monotonicity of Markov operators), Fatou's lemma gives that
$$ \int_X \liminf_{n \to \infty} P_n(f) \, d\mu \leq \int_X f \, d\mu. $$
Hence it follows that
$$ \liminf_{n \to \infty} P_n(f) \overset{\mu\textrm{-a.s.}}{=} \int_X f \, d\mu. $$
The result is then an immediate consequence of the following Scheffé-like lemma.

Lemma. Given a sequence $(Y_n)_{n \in \mathbb{N}}$ of integrable random variables that is uniformly bounded below, if $\ Y := \liminf_{n \to \infty} Y_n\ $ is integrable and $\mathbb{E}[Y_n] \to \mathbb{E}[Y]$ as $n \to \infty$, then $Y_n \overset{L^1}{\to} Y$.

Proof of Lemma. Without loss of generality take $Y=0$, replacing $Y_n$ by $Y_n - Y$; the new sequence is bounded below by the integrable function $c - Y$, where $c$ is the original uniform lower bound, and an integrable lower bound is all that is needed below. We have that $Y_n \wedge 0$ converges pointwise to $0$ as $n \to \infty$ (its liminf is $0$ and it is nonpositive), so the dominated convergence theorem gives $\mathbb{E}[Y_n \wedge 0] \to 0$ as $n \to \infty$. Since $|Y_n|=Y_n-2(Y_n \wedge 0)$ and $\mathbb{E}[Y_n] \to 0$ as $n \to \infty$, it follows that $\mathbb{E}[|Y_n|] \to 0$ as $n \to \infty$. So we are done.


Short proof of BET for bounded observables.

I constructed the following proof after doing a bit of Googling of proofs of the ergodic theorem. (In particular, it combines ideas from the paper https://doi.org/10.1214/074921706000000266 of Keane and Petersen – which, to be fair, is already a pretty impressively short proof even for general $L^1$ observables – with proofs I've seen in online lecture notes on ergodic theory; but, of course, I also use boundedness to help get a short proof.)

I will prove the theorem for ergodic transformations, but it is not at all difficult to extend (in the appropriate manner) to more general measure-preserving transformations. I have tried to write out the proof sufficiently explicitly to be smoothly readable in about 2-5 minutes.

Theorem. Let $(X,\mathcal{X},T,\mu)$ be an ergodic measure-preserving dynamical system and let $f \colon X \to \mathbb{R}$ be a bounded measurable function. Then $P_N(f):=\frac{1}{N} \sum_{i=0}^{N-1} f \circ T^i \, \overset{\mu\textrm{-a.s.}}{\to} \, \mathbb{E}_\mu[f]\,$ as $N \to \infty$.

Proof. Since $\overline{\lim}_{N \to \infty} P_N(f)$ is $T$-invariant, it is $\mu$-a.e. equal to a constant $\overline{f}$. We will show $\overline{f} \leq \mathbb{E}_\mu[f]$; then, applying this to $-f$ as well as $f$ gives the result. Fixing arbitrary $\varepsilon>0$, let $g=f+\varepsilon-\overline{f}$. Define the monotone sequences
\begin{align*}
m_N^{[g]} &:= \max\big\{ \, g \ , \ g + (g \circ T) \ , \ \ldots \ , \ g + (g \circ T) + \cdots + (g \circ T^{N-1}) \, \big\}, \\
E_N^{[g]} &:= \{ x \in X : m_N^{[g]}(x)>0 \}.
\end{align*}

For $N \geq 2$,
\begin{align*}
(m_N^{[g]} - g)(x) &= \max\big\{ \, 0 \ , \ g(T(x)) \ , \ \ldots \ , \ g(T(x)) + \cdots + g(T^{N-1}(x)) \, \big\} \\
&= (m_{N-1}^{[g]})^+(T(x)).
\end{align*}

Integrating over $E_N^{[g]}$ gives
\begin{align*}
\int_{E_N^{[g]}} g \, d\mu \ &= \ \int_{E_N^{[g]}} m_N^{[g]} \, d\mu - \int_{E_N^{[g]}} (m_{N-1}^{[g]})^+ \circ T \, d\mu \\
&= \int_X (m_N^{[g]})^+ \, d\mu - \int_{E_N^{[g]}} (m_{N-1}^{[g]})^+ \circ T \, d\mu \\
&\geq \int_X (m_N^{[g]})^+ \, d\mu - \int_X (m_N^{[g]})^+ \circ T \, d\mu \ = \ 0.
\end{align*}

Now
\begin{align*}
\bigcup_{N=1}^\infty \! E_N^{[g]} &= \{x \, : \, \exists n \in \mathbb{N} \text{ s.t. } P_n(g)(x) > 0 \} \\
&= \{x \, : \, \exists n \in \mathbb{N} \text{ s.t. } P_n(f)(x) > \overline{f} – \varepsilon \},
\end{align*}

which is a set of full $\mu$-measure by the definition of $\overline{f}$. Since $g$ is bounded and the sets $E_N^{[g]}$ increase with $N$, the dominated convergence theorem gives $\int_X g \, d\mu = \lim_{N \to \infty} \int_{E_N^{[g]}} g \, d\mu \geq 0$, i.e. $\mathbb{E}_\mu[f] \geq \overline{f}-\varepsilon$. But $\varepsilon$ was arbitrary. QED.
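The key inequality $\int_{E_N^{[g]}} g \, d\mu \geq 0$ can also be sanity-checked numerically. Here is a rough sketch (mine; the rotation $T$, the choice of $g$, the height $N$ and the Riemann grid are all assumptions chosen for illustration):

```python
import math

# Rough numerical check (my illustration) of the maximal inequality
# int_{E_N} g dmu >= 0, with T the irrational rotation x -> x + alpha (mod 1)
# on [0,1) with Lebesgue measure, and g a bounded function of negative mean.

alpha = (math.sqrt(5) - 1) / 2
g = lambda x: math.cos(2 * math.pi * x) - 0.3   # integral of g is -0.3
N, M = 20, 100000                               # tower height, Riemann grid size

def m_N(x):
    """Maximum of the first N partial Birkhoff sums of g at x."""
    best, s, y = -float("inf"), 0.0, x
    for _ in range(N):
        s += g(y)
        best = max(best, s)
        y = (y + alpha) % 1.0
    return best

# Riemann approximation of the integral of g over E_N = {m_N > 0}.
total = sum(g(i / M) for i in range(M) if m_N(i / M) > 0) / M
print(total > -0.01)                            # essentially nonnegative: True
```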

Remark. Contained within the above is a proof of a "maximal ergodic theorem" that does not rely on $g$ being bounded but only on $g$ being $\mu$-integrable; in other words, the above proof goes through without modification for any $f \in L^1(\mu)$ for which it is known that $\overline{f}$ is finite. But I see no trivial way of showing that $\overline{f}$ is finite except when $f$ is bounded.
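For what it's worth, the theorem is easy to watch numerically. Below is a minimal sketch (mine; the rotation $T(x) = x + \alpha \bmod 1$, the observable $f$, and the sample point are all assumptions chosen for illustration) of the Birkhoff averages converging to $\mathbb{E}_\mu[f] = 0$:

```python
import math

# Minimal numerical sketch (mine): Birkhoff averages for the ergodic
# irrational rotation T(x) = x + alpha (mod 1) on [0,1) with Lebesgue
# measure, and the bounded observable f(x) = cos(2*pi*x) with integral 0.

alpha = (math.sqrt(5) - 1) / 2                  # irrational, so T is ergodic
f = lambda x: math.cos(2 * math.pi * x)

def birkhoff_average(x, N):
    """P_N(f)(x) = (1/N) * sum_{i=0}^{N-1} f(T^i(x))."""
    total, y = 0.0, x
    for _ in range(N):
        total += f(y)
        y = (y + alpha) % 1.0
    return total / N

for N in (10, 1000, 100000):
    print(N, birkhoff_average(0.1, N))          # tends to E_mu[f] = 0
```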

Best Answer

There is a simple reduction of the Birkhoff ergodic theorem for $L^1$ functions to the bounded case using Kakutani-Rokhlin towers, which I learned from H. Furstenberg and B. Weiss decades ago. We use the notation of the post, and assume that $T$ is ergodic. Given $f \in L^1(X)$ we may assume it is nonnegative, and then that it takes values in the positive integers (replace $f$ by $\lfloor f \rfloor + 1$; the difference $f - \lfloor f \rfloor - 1$ is bounded, so it is already covered by the bounded case). Consider the subset $$Y=\{(x,k): 1 \le k \le f(x) \}$$ of $X \times {\mathbb N}$, endowed with the product $\sigma$-algebra, the probability measure $\nu$ determined by $$\nu(A \times \{k\})=\frac{\mu(A)}{\int_X f \, d\mu}$$ for measurable sets $A \subset X$ with $\min_A f \ge k$, and the transformation $S$ defined by $S(x,k)=(x,k+1)$ if $k<f(x)$ and $S(x,k)=(T(x),1)$ if $k=f(x)$.
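A finite sketch of the tower (my own toy example, not from the answer; the cycle $T$ and roof heights $f$ are assumptions): with $\mu$ uniform, $\nu$ is the uniform measure on $Y$, so $S$ preserves $\nu$ exactly when $S$ permutes $Y$; and since $T$ is a single cycle, $S$ should trace out one cycle through all of $Y$.

```python
# Kakutani-Rokhlin tower over the 5-cycle T(x) = x+1 mod 5 with an
# integer-valued roof function f; Y is the set of tower points (x, k).

n = 5
T = lambda x: (x + 1) % n
f = {0: 2, 1: 1, 2: 3, 3: 1, 4: 2}              # positive-integer roof heights

Y = [(x, k) for x in range(n) for k in range(1, f[x] + 1)]

def S(point):
    """Climb the column; from the roof, jump to (T(x), 1)."""
    x, k = point
    return (x, k + 1) if k < f[x] else (T(x), 1)

print(sorted(S(p) for p in Y) == sorted(Y))     # S permutes Y -> True

p, orbit = (0, 1), set()
for _ in range(len(Y)):
    orbit.add(p)
    p = S(p)
print(len(orbit) == len(Y) and p == (0, 1))     # one cycle of length sum(f) -> True
```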

Then $S$ is ergodic on $(Y,\nu)$: if a measurable set $E \subset Y$ is invariant under $S$, then $E_1:=\{x \in X \,: (x,1) \in E\}$ is invariant under $T$, so $\mu(E_1) \in \{0,1\}$; and since $E$ is the union of the columns over $E_1$, also $\nu(E) \in \{0,1\}$.

Claim: The ergodic theorem in $(Y,\nu,S)$ for the indicator function $$h(x,k):={\mathbf 1}_{\{k=1\}}$$ (along the sequence of return times to the base $\{k=1\}$ of the tower) implies the ergodic theorem for $f$ in $(X,\mu,T)$.

Proof: consider the Birkhoff sums $$R_n(x):=\sum_{j=0}^{n-1}f(T^jx) \,.$$ For each $x \in X$, we have $$\{\ell \ge 0 : h(S^\ell(x,1))=1 \}= \{R_j(x) : j \ge 0\} \,,$$ whence we conclude that for $\mu$-almost every $x \in X$, $$\lim_{n \to \infty} \frac{n}{R_n(x)}= \lim_{n \to \infty} \frac{ \sum_{\ell=0}^{R_n(x)-1}h(S^\ell(x,1)) }{R_n(x)} =\int_Y h \, d\nu =\frac{\mu(X)}{\int_X f \, d\mu} =\frac{1}{\int_X f \, d\mu}\,.$$ Taking reciprocals, $\frac{1}{n}R_n(x) \to \int_X f \, d\mu$ for $\mu$-almost every $x$, which is the ergodic theorem for $f$.
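To see the identity between base-visit times of $S$ and the Birkhoff sums $R_j(x)$ concretely, here is a finite check (my example; the cycle $T$, roof $f$, and horizon $L$ are assumptions chosen for illustration):

```python
# Verify { l >= 0 : h(S^l(x,1)) = 1 } = { R_j(x) : j >= 0 } on a finite tower.

n = 5
T = lambda x: (x + 1) % n
f = {0: 2, 1: 1, 2: 3, 3: 1, 4: 2}              # positive-integer roof heights

def S(point):
    x, k = point
    return (x, k + 1) if k < f[x] else (T(x), 1)

def R(x, j):
    """Birkhoff sum R_j(x) = f(x) + f(Tx) + ... + f(T^{j-1} x)."""
    total = 0
    for _ in range(j):
        total += f[x]
        x = T(x)
    return total

x0, L = 0, 50                                   # base point and time horizon
p, visits = (x0, 1), []
for l in range(L):
    if p[1] == 1:                               # h(p) = 1 iff p is in the base
        visits.append(l)
    p = S(p)

returns, j = [], 0
while R(x0, j) < L:
    returns.append(R(x0, j))
    j += 1

print(visits == returns)                        # -> True
```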
