Probability – Necessary and Sufficient Condition for Markov Chain Sample Average Convergence

markov-chains · probability · stochastic-processes

The ergodic theorem says that for an irreducible, positive-recurrent Markov chain with transition matrix $P$, any initial distribution $\lambda$, and $(x_n)_{n\geq 0} \sim \text{Markov}(\lambda, P)$, the following holds for any bounded function $f:I\rightarrow \mathbb{R}$:
$$P\left(\frac{1}{n}\sum^{n-1}_{k=0}f(x_k)\rightarrow \bar{f} \text{ as } n\rightarrow\infty\right)=1,$$
where $\bar{f}=\sum_{i\in I}\pi_i f_i$ (with $f_i = f(i)$) and $\pi$ is the unique stationary distribution
(ref: ergodic theorem).
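As a quick numerical illustration of the theorem (a made-up example, not part of the theorem itself): for a two-state irreducible chain with the transition matrix below, solving $\pi = \pi P$ gives $\pi = (4/7, 3/7)$, and the time average of an indicator function along one sample path should approach $\pi_0$:

```python
import random

# Hypothetical two-state irreducible chain; rows are P[x] = (P(x->0), P(x->1)).
# Its stationary distribution, solving pi = pi P, is pi = (4/7, 3/7).
P = [[0.7, 0.3], [0.4, 0.6]]

def time_average(f, n, start=0, seed=0):
    """Return (1/n) * sum_{k<n} f(x_k) along one sample path of length n."""
    rng = random.Random(seed)
    x, total = start, 0.0
    for _ in range(n):
        total += f(x)
        # Step the chain: move to state 1 with probability P[x][1].
        x = 1 if rng.random() < P[x][1] else 0
    return total / n

# f = indicator of state 0, so the time average estimates pi_0 = 4/7.
avg = time_average(lambda i: 1.0 if i == 0 else 0.0, 200_000)
```

With a long enough path the estimate lands close to $4/7 \approx 0.571$, matching $\bar f = \sum_i \pi_i f_i = \pi_0$ for this choice of $f$.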

It seems that being irreducible and positive recurrent is a sufficient but not a necessary condition here, since I can think of a reducible Markov chain with a transient state that has the same property (please correct me if I'm wrong):
$$P=\left[\begin{matrix}1&0\\1&0\end{matrix}\right],$$
whose unique stationary distribution is $\pi=[1,0]$.
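For what it's worth, this chain is deterministic, so the claim can be checked by direct iteration: starting from the transient state, the fraction of time spent in state $0$ is $(n-1)/n \to 1 = \pi_0$.

```python
# The chain P = [[1, 0], [1, 0]] from the question: both rows move to
# state 0 with probability 1, so state 0 is absorbing and state 1 is
# transient.  Iterate one path and track the occupation fraction of 0.
P = [[1, 0], [1, 0]]

def fraction_in_state0(start, n):
    x, hits = start, 0
    for _ in range(n):
        hits += (x == 0)
        x = 0  # every row of P sends the chain to state 0
    return hits / n

# Starting from the transient state 1, the fraction is (n - 1) / n.
frac = fraction_in_state0(start=1, n=10_000)
```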

So what is the necessary and sufficient condition for this property to hold?


It seems the only case where this property fails is when the limiting average occupation probabilities do not equal the stationary distribution:
$$\lim_{n\rightarrow\infty}\frac{1}{n}\sum^{n-1}_{k=0}P(x_k=i)\neq \pi_i.$$
The only case I can think of is when there is more than one closed class. So is the necessary and sufficient condition having exactly one closed, positive-recurrent communicating class (basically weakening "irreducible" to "one closed class" in the original sufficient condition)?

Best Answer

True for finite state systems

If the chain has a finite state space $S$ and there is only one closed communicating class, then there is a unique stationary distribution (satisfying $\pi = \pi P$), and for each $i \in S$ the fraction of time spent in state $i$ converges to $\pi_i$ with probability $1$, regardless of the initial state. This is because the Markov chain bounces around until it eventually hits a state in the closed communicating class, after which it behaves as an irreducible chain on that reduced state space.

If there are two or more closed communicating classes, then there are multiple stationary distributions, and the time-average behavior depends on which closed class the chain is eventually absorbed into, which in turn depends on the initial state.
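The non-uniqueness is easy to verify concretely. In the made-up three-state example below (not from the question), states $0$ and $1$ are each absorbing, so there are two closed classes, and every mixture $\pi = (a, 1-a, 0)$ satisfies $\pi = \pi P$:

```python
# Hypothetical chain with two closed classes: states 0 and 1 are
# absorbing, state 2 is transient and splits between them.
P = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.5, 0.0]]

def is_stationary(pi, P, tol=1e-12):
    """Check pi = pi P componentwise."""
    n = len(pi)
    return all(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) < tol
               for j in range(n))

# Several distinct stationary distributions coexist.
ok_a = is_stationary((1.0, 0.0, 0.0), P)
ok_b = is_stationary((0.0, 1.0, 0.0), P)
ok_c = is_stationary((0.25, 0.75, 0.0), P)
```

Starting in state $0$ the chain stays in state $0$ forever, while starting in state $1$ it stays in state $1$ forever, so the time averages genuinely differ with the initial state.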

Counter-example for countably infinite state systems

This is not necessarily true for countably infinite state spaces, even if there exists a stationary distribution $\pi$. Consider the example state space $S = \{0, 1, 2, 3, ...\}$ with transitions: \begin{align} P_{00} &= 1\\ P_{i,i+1} &= 1-2^{-i} \quad \forall i \in \{1, 2, 3, ...\} \\ P_{i,0} &= 2^{-i} \quad \forall i \in \{1, 2, 3, ...\} \end{align} There is a single closed communicating class, consisting of the single state $\{0\}$ (all other states are transient). There is a single stationary distribution $\pi$ that solves $\pi = \pi P$, namely,
$$ \pi = (\pi_0, \pi_1, \pi_2, \pi_3, ...) = (1, 0, 0, 0, ...). $$ Further, it is possible to reach state $0$ from any other state. However, if we start in state $1$, the probability that we never visit state $0$ is $$ \prod_{i=1}^{\infty}(1-2^{-i}) > 0, $$ and so the fraction of time spent in state $0$ does not converge to $\pi_0=1$ with probability $1$.
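To get a rough numerical sense of this escape probability, the infinite product can be truncated; the factors approach $1$ so fast that a few hundred terms are more than enough:

```python
# Truncate prod_{i=1}^{inf} (1 - 2^{-i}) after a few hundred factors;
# the tail factors are so close to 1 that the truncation error is
# negligible in double precision.
prob_never_hit_0 = 1.0
for i in range(1, 200):
    prob_never_hit_0 *= 1.0 - 2.0 ** (-i)
```

The product comes out to roughly $0.289$, so with probability close to $0.29$ a path started in state $1$ drifts off to infinity and spends an asymptotic fraction $0$ of its time in state $0$.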