probably you already know this example, but in any case...
let $\Omega=\{0,1\}^{\mathbb{N}}$, $\mathcal{F}$ the $\sigma$-algebra generated by the cylinder sets of $\{0,1\}^{\mathbb{N}}$ and
$
\mathbb{P}=\prod_{i\in\mathbb{N}} \mu_i
$
is product measure with $\mu_i:\mathcal{P}(\{0,1\})\to [0,1]$ given by $\mu_i(\{0\})=\frac{1}{2}=\mu_i(\{1\})$ for all $i\in\mathbb{N}$.
We define $\varphi:\Omega\to\Omega$ in the following way:
We think about an element $\omega\in\Omega$ as an infinite sequence of zeros and one, that is, $\omega=(\omega_1,\omega_2,\ldots)$ and then put
$$
\varphi(\omega_1,\omega_2,\ldots)=(\omega_2,\omega_3,\ldots)
$$
this function is known as the left-shift.
You can prove that $\varphi$ is a measure-preserving map. The trick is prove that $\mathbb{P}(\varphi^{-1}\mathscr{C})=\mathbb{P}(\mathscr{C})$ for any cylinder set and then extend this to the whole $\sigma$-algebra using for example the extension measure theorems.
Aside comment: We can prove a more strong fact, this product measure is in fact ergodic for the shift, which implies that the invariant sets has measure zero or one.
To finish the example take $A=\Omega\setminus\{1,1,1,\ldots\}$ which is a measure one set. Note that $\varphi^{-1}A=\Omega$, once for any $\omega\in\Omega$ we have $\mathbb{P}(\{\omega\})=0$ (the proof of this statement is a consequence of the continuity of $\mathbb{P}$), follows that $\mathbb{P}(\varphi^{-1}A\Delta A)=0$.
Here we establish a general ergodic pathwise ergodic theorem and also consider the ergodic case in which the OP seems to be interested.
Concepts and definitions:
Suppose $(M,\mathscr{B},\mu)$ is a probability space ($M$ is a Polish space with a Borel $\sigma$-algebra for example). Let $\Omega=M^{\mathbb{Z}_+}$ equipped with the product $\sigma$-algebra $\mathscr{F}=\mathscr{B}^{\otimes\mathbb{Z}_+}$. For each $n\in\mathbb{Z}_+$ let $X_n:\Omega\rightarrow M$ be the projection $X_n(\omega)=\omega(n)$, define $\mathscr{F}_n:=\sigma(X_k:0\leq k\leq n)$. It is obvious that $(\mathscr{F}_n:n\in\mathbb{Z}_+)$ is a filtration and that the process $X:\omega\mapsto\omega$ us adapted to this filtration. There is (an applications if Ionescu-Tulcea's theorem for example) a unique probability measure $\mathbb{P}_\mu$ on $(\Omega,\mathscr{F})$ such that for any $A_0,\ldots, A_k\in\mathscr{B}$ and integers $0=n_0< n_1<\ldots <n_k$
$$\mathbb{P}_\mu[X_{n_j}\in A_j]=\int_{A_0}\int_{A_1}\ldots\int_{A_k}P^{n_k-n_{k-1}}(x_{k-1},dx_k)\ldots P^{n_1}(x_0,dx_1)\,\mu(dx_0)$$
where $P^0=I$ (identity) and $P^n=P P^{n-1}$ for $n\geq1$.
Under this probability, $X$ is time homogeneous Markov chain with initial probability $\mu$ with transitions kernel $P$. In particular, when $\mu=\delta_{x}$ for some $x\in M$ we use the notation $\mathbb{P}_x=\mathbb{P}_{\delta_x}$. It is easy to check that for any probability measure $\mu$ on $(M,\mathscr{B})$ and any bounded measurable function $F:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$
\begin{align}
\mathbb{E}_\mu[F]=\int_M\mathbb{E}_x[F]\,\mu(dx)\tag{0}\label{zero}
\end{align}
Let $\theta:\Omega\rightarrow\Omega$ define the shift operator $(\theta(\omega))(n)=\omega(n+1)$. Then, for any bounded measurable function $F:(\Omega,\mathscr{B})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R})$
$$\mathbb{E}_\mu[F(X\circ\theta^m)|\mathscr{F}_m]=\mathbb{E}_{X_m}[F(X)]$$
It is easy to check that $\mu P=\mu$ iff $\theta$ is $\mathbb{P}_\mu$ invariant ($\mathbb{P}_\mu[\theta^{-1}(B)]=\mathbb{P}_\mu[B]$ for all $B\in\mathscr{F}$) or equivalently, $\mu P=\mu$ iff $X$ is stationary w.r.t $\mathbb{P}_\mu$. The following result is what the OP seems to be looking for:
Theorem PW: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $f\in L_1(\mu)$ there is $B_f\in\mathscr{B}$ such that $\mu(B_f)=1$, and a function $f^*\in L_1(\mu)$ such that for all $x\in B_f$
$$\frac1n\sum^{n-1}_{k=1}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\qquad \text{$\mathbb{P}_x$-a.s}$$
Moreover, $\int f^* \,d\mu=\int f\,d\mu$. If $\mu$ is $P$-ergodic, then for all $x\in B_f$, $f^*=\int f\,d\mu$ $\mathbb{P}_x$-a.s
Ergodic theorems:
Recall that a $P$ invariant measure $\mu$ is $P$ ergodic if for for any absorbent set $A$ ($P\mathbb{1}_A\geq\mathbb{1}_A$), $\mu(A)\in\{0,1\}$.
In the particular case where $Pf=f\circ T$ for some $\mu$-invariant transformation $T$, $\mu$ is ergodic if $\mu(A)\in\{0,1\}$ for all $A\in\mathscr{B}$ with $T^{-1}(A)=A$.
It can be shown that if $\mu$ is $P$ ergodic, then for any $f\in L_1(\mu)$, $Pf=f$ $\mu$-a.s iff $f=\mu[f]:=\int f\,d\mu$ $\mu$-a.s.
We have state two ergodic theorems that act on different types of transformation. The first one is a direct consequence of the ergodic theorem of Dunford-Hopf-Schwartz and von Neumann for positive contractions:
Theorem DHS: Suppose $\mu P=\mu$, $\mu(M)=1$. For any $1\leq p<\infty$ and $f\in L_p(\mu)$ there is $Af\in L_p(\mu)$ such that
$$A_nf=\frac{1}{n}\sum^{n-1}_{k=0}P^kf\xrightarrow{n\rightarrow\infty}Af$$
$\mu$-a.s. and in $L_p(\mu)$. Furthermore, $P(Af)=Af=A(Pf)$ $\mu$-a.s., and $\mu[Af]=\mu[f]$. If $\mu$ is ergodic, then $Af=\mu[f]$ $\mu$-a.s.
- The $\mu$-a.s. convergence is known as individual ergodic theorem; convergence in $L_p$ is known as the mean ergodic theorem (von Neumann).
- The limit $Af$ can be expressed in probabilistic terms. Let $\mathcal{I}^P_\mu$ the collection of sets in $\mathscr{B}$ such that
$P\mathbb{1}_B=\mathbb{1}|_B$ $\mu$-a.s. It is possible to prove that $\mathcal{I}^p_\mu$ is a $\sigma$-algebra and that $Af=\mu[f|\mathcal{I}^P_\mu]$ $\mu$-a.s.(conditional expectation of $f$ given $\mathcal{I}^p_\mu$ under $\mu$).
On the other hand, we have the well known ergodic theorem of Birkoff and von Neumann
Theorem B: Suppose $(S,\mathscr{S},m)$ is a probability space and $T:(S,\mathscr{S})\rightarrow(S,\mathscr{S})$ is $\mu$ invariant ($m(T^{-1}(B))=m(B)$ for all $B\in\mathscr{S})$. For $1\leq p<\infty$ and $f\in L_p(m)$ there is $f^*\in L_p$ such that
$$\frac{1}{n}\sum^{n-1}_{k=0}f\circ T^k\xrightarrow{n\rightarrow\infty} f^*$$
$m$-almsot surely and in $L_p(m)$. In particular, $$m[f^*]=m[f]$$ If $m$ is $T$ ergodic, $f^*=m[f]$ $m$-a.s.
As with Theorem DHS, $f^*$ can be expressed as a conditional expectation. Let $I_m=\{A\in\mathscr{S}: \mu(T^{-1}(A)\triangle A)=0$. this is a $\sigma$-alsgebra and $f^*=m(f|\mathcal{I}_\mu)$ $\mu$-a.s.
The Birkoff ergodic theorem can be obtained from the stronger DHS Theorem by cosidering the transition function $P_Tf:=f\circ T$ for all bonded measurable functions $f$ in $(S,\mathscr{S})$. This however might be an overkill. The technique of maximal inequalities can be used to prove both theorems (the individual parts); the $L_2$ result of von Neumann's theorem can be exploited to prove the mean ergodic results in borsht theorems.
We will only use explicitly Birkoff's ergodic theorem in the rest of this posting.
Proof of path wise ergodic theorem:
Applying Birkoff's ergodic theorem with $(S,\mathscr{S},m,T)=(\Omega,\mathscr{F},\mathbb{P}_\mu,\theta)$ we have that for any $G\in L_1(\mathbb{P}_\mu)$ there is $G^*\in L_1(\mathbb{P}_\mu)$ with $G^*\circ\theta=G^*$, $\mathbb{E}_\mu[G^*]=\mathbb{E}_\mu[G]$ and
\begin{align}\frac1n\sum^{n-1}_{k=0}G\circ\theta^k\xrightarrow{n\rightarrow\infty}G^*\tag{1}\label{one}\end{align}
$\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$.
Lemma: Suppose that $F\in\mathscr{F}_\infty$ is a bounded $\theta$--invariant function, that is $F=F\circ\theta$. If $\mathbb{P}_\mu$ is $\theta$ invariant, then $F=\mathbf{E}_\mu[F|\mathscr{F}_0]$ $\mathbb{P}_\mu$--a.s.
Proof of Lemma: Suppose
$F\circ\theta=F$ and define $h(x)=\mathbb{E}_x[F]$. Then
$$h(X_k)=\mathbb{E}_{X_k}[F]=\mathbb{E}_\mu[F\circ\theta^k|\mathscr{F}_k]=\mathbb{E}_\mu[F|\mathscr{F}_k]$$
It follows that $(h(X_k):k\in\mathbb{Z}_+)$ is a uniform integrable martingale with respect to the filtration $(\mathscr{F}_j:k\in\mathbb{Z}_+)$. An application of the martingale convergence theorem implies that
$$h(X_k)\xrightarrow{k\rightarrow\infty} \mathbb{E}[F|\mathscr{F}_\infty]=F$$
$\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$. Using the stationarity of $\mathbb{P}_\mu$ again yields
\begin{align}
\|\mathbb{E}_{\mu}[F|\mathscr{F}_0]-F\|_{L_1(\mathbb{P}_\mu)}=
\|(h(X_0) -F)\circ\theta^k\|_{L_1(\mathbb{P}_\mu)}=
\|h(X_k)-F\|_{L_1(\mathbb{P}_\mu)}\xrightarrow{k\rightarrow\infty}0
\end{align}
Therefore $\mathbb{E}[F|\mathcal{F}_0]=F$ $\mathbb{P}_\mu$-a.s. $ \blacksquare $
The Lemma above shows that $F^*=\mathbb{E}[F^*|\mathbf{F}_0]$ $\mathbb{P}_\mu$-a.s. and thus, there is a function $f^*\in L_1(\mu)$ such that $F^*(X)=f^*(X_0)$ $\mathbb{P}_\mu$-a.s.
Now, for $f\in L_1(\mu)$ and define consider $F_f(\omega):=f(\omega_0)$. Then, for some $f^*\in L_1(\mu)$,
\begin{align}\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\tag{2}\label{two}\end{align}
$\mathbb{P}_\mu$-a.s. and in $L_1(\mathbb{P}_\mu)$ . Hence
$$\int\mathbb{P}_x\Big[\{\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0)\big\}\Big]\,\mu(dx)=1$$
It follows that there is $B_f\in\mathscr{B}$ such that
$\mu(B_f)=1$, and for $x\in B_f$
\begin{align}
\frac1n\sum^{n-1}_{k=0}f(X_k)\xrightarrow{n\rightarrow\infty}f^*(X_0) \qquad\text{$\mathbb{P}_x$-a.s.} \tag{3}\label{three}
\end{align}
Ergodic case
When $\mu$ is $P$ ergodic, the limit function $f^*$ in the pathwise ergodic theorem described above is constant ($\mathbb{P}_x$-a.s. for all $x\in B_f$). This seems to be the result that the OP is mostly interested. This follows directly from the following result:
Theorem E: $\mu$ is $P$ ergodic iff $\theta$ is $\mathbb{P}_\mu$ ergodic.
Proof of Theorem E:
Necessity: Suppose $\mu$ is $P$ ergodic. For any bounded $\mathscr{F}$--measurable
function $F$ define $h: x\mapsto \mathbb{E}_x[F]$. The Markov property implies that
\begin{align}
\mathbb{E}_x[F\circ\theta^k] =\mathbb{E}_x[\mathbb{E}_{X_k}[F]] =\mathbb{E}_x[h(X_k)]=P^kh(x)
\end{align}
If $F=F\circ\theta$ then $Ph=h$ and so, $h=\int f\,d\mu=\mathbb{E}_\mu[F]$ $\mu$-a.s.
We will show that $F=\mathbb{E}_\mu[F]$ $\mathbb{P}_\mu$--a.s. Indeed,
let $A\in\mathcal{F}_n$. Using the Markov property once more yields
\begin{align}
\mathbb{E}_\mu[F;A]&=\mathbb{E}_\mu[F\circ\theta^n;A]=
\mathbb{E}_\mu[\mathbb{1}_A\mathbb{E}_\mu[F\circ\theta^n|\mathscr{F}_n]]\\
&=
\mathbb{E}_\mu[\mathbb{E}_{X_n}[F];A]=\mathbb{E}_\mu[h(X_n);A]=
\mathbb{E}_\mu[F]\mathbb{P}_\mu[A];
\end{align}
where the last equality follows from
$h=\mathbb{E}_\mu[F]$,,$\mu$--a.s. and the fact that the distribution of $X_n$ is $\mu$. Since $\mathscr{F}=\sigma(\cup_n\mathscr{F}_n)$ and $\cup_n\mathscr{F}_n$ is an algebra, it follows by monotone class arguments that $F=\mathbb{E}_\mu[F]$\ $\mathbb{P}_\mu$--a.s.
Sufficiency: Suppose that $\mathbb{P}_\mu$ is $\theta$--ergodic, and let $F$ be a bounded $\mathcal{F}$--measurable function. By Birkoff's ergodic theorem
\begin{align}
\frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]\quad
\text{$\mathbb{P}_\mu$--a.s. and in $L_1(\mathbb{P}_\mu)$}
\end{align}
Consequently, there is $B\in\mathscr{B}$ with $\mu(B)=1$ such that
for all $x\in B$,
$\frac1n\sum^{n-1}_{k=0}F\circ\theta^k \longrightarrow \mathbb{E}_\mu[F]$
, $\mathbb{P}_x$--a.s. Thus, by dominated convergence,
\begin{align}
\frac1n\sum^{n-1}_{k=0}\mathbb{E}_x[F\circ\theta^k]
\longrightarrow \mathbb{E}_\mu[F],
\qquad \text{$\mu$--a.s.}\tag{4}\label{four}
\end{align}
Let $A$ be a $P$--absorbent set and
$F(\omega)=\mathbb{1}_A(\omega_0)$. Then, for any
$n\in\mathbb{Z}_+$, $\mathbb{E}_x[F\circ\theta^n]=\mathbb{E}_x[\mathbb{1}_A(X_n)]=P^n\mathbb{1}_A(x)\geq\mathbb{1}_A(x)$;
hence, $P^n\mathbb{1}_A=\mathbb{1}_A$ ,
$\mu$--a.s. and from \eqref{four},
$\mu[A]=\mathbb{1}_A$, $\mu$--a.s. Therefore, $\mu[A]\in\{0,1\}$.
$ \blacksquare $
Best Answer
This result is provided, without proof, as Theorem 5.6(e) of A First Course in Stochastic Processes, 2nd ed., by Karlin and Taylor (1975).
I don't have this book, but Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications by Cramer and Leadbetter (2004) might have some information there. I will edit this post if I find out whether or not this result is mentioned in there.