[Math] Infinite expected return time implies return probabilities tend to zero

markov-chains, probability-theory, stochastic-processes

For a Markov chain $X_n$ I want to prove that if
$$ \mathbb{E} (\min\{n \geq 1:X_n = i\}\mid X_0 = i) = \infty$$
(that is, if $i$ is transient or null recurrent), then
$$ p_{ii}^{(n)} \equiv \mathbb{P}(X_n = i \mid X_0 = i) \to 0 \quad \mathrm{as} \quad n \to \infty .$$
My intuition is that if the above expected value is infinite, then the probability that the first return to $i$ occurs at $n$ must decay relatively slowly with $n$, which means it's 'fairly likely' that the chain won't have returned to $i$ in the first $n$ steps. Hence the return probabilities are small. However, I can't turn this into a proof that they decay to zero. Please be as elementary as possible!


If $i$ is transient then the theorem is easy to prove. Transience is equivalent to the sum of the $p_{ii}^{(n)}$ being finite, which means they must tend to zero. The null recurrent case is proving more stubborn.

Best Answer

Here is one proof that I found in Section 1.8 of James Norris's *Markov Chains*. It's far more sophisticated than I thought would be necessary; if anybody has a more straightforward proof to offer it would be much appreciated.

Suppose $X_n$ is null recurrent. If it is periodic with period $d$, we can consider the aperiodic subchain $X_{dn}$ and prove that its return probabilities tend to zero; since $p_{ii}^{(n)} = 0$ whenever $d \nmid n$, this proves that all return probabilities tend to zero. Hence assume the chain is aperiodic.

Let $T_j$ be the first return time to state $j$. We know that $$ \infty = \mathbb{E}_j(T_j) = \sum_{k = 0}^\infty k \,\mathbb{P}_j(T_j = k) = \sum_{k = 0}^\infty \mathbb{P}_j(T_j > k) \,.$$ Given $\varepsilon > 0$ choose $K$ such that $$ \sum_{k=0}^K \mathbb{P}_j(T_j > k) \geq \frac{1}{\varepsilon} \,.$$ Then, for $n \geq K$, since the events in the first sum below are disjoint, $$ \begin{align}1 &\geq \sum_{k =n-K}^n\mathbb{P}(X_k = j, X_m \neq j \;\text{for}\; m=k+1,\ldots,n) \\ &= \sum_{k =n-K}^n \mathbb{P}(X_k = j)\,\mathbb{P}_j(T_j > n - k) \\ &=\sum_{k =0}^K \mathbb{P}(X_{n-k} = j)\,\mathbb{P}_j(T_j > k)\,,\end{align}$$ where the middle equality is the Markov property. Hence we must have $\mathbb{P}(X_{n-k} = j)\leq \varepsilon$ for some $k \in \{0,1,\ldots,K\}$.
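To see these quantities concretely, here is a small numerical check (names are my own choosing) using the symmetric simple random walk on $\mathbb{Z}$, which is null recurrent at $0$. It relies on the classical identities $p_{00}^{(2m)} = \binom{2m}{m}/4^m$ and $\mathbb{P}_0(T_0 > 2m) = \mathbb{P}(X_{2m} = 0)$; this walk is periodic, but the inequality above uses only the Markov property, so it still holds:

```python
from math import comb

# Symmetric simple random walk on Z (null recurrent at state 0).
def p00(n):
    """n-step return probability p_00^(n)."""
    if n % 2:
        return 0.0
    m = n // 2
    return comb(2 * m, m) / 4 ** m

def tail(k):
    """P_0(T_0 > k); classically this equals p_00^(2m) with
    m = floor(k/2), since T_0 is always even."""
    return p00(2 * (k // 2))

# E_0[T_0] = sum_k P_0(T_0 > k) diverges: the partial sums grow
# without bound (like a constant times sqrt(K)).
S = 0.0
for k in range(10001):
    S += tail(k)
print(S)

# The key inequality: for n >= K,
#   sum_{k=0}^K p_00^(n-k) P_0(T_0 > k) <= 1,
# which forces some p_00^(n-k) in the window to be small once the
# tail sum is large.
K = 400
for n in (500, 777, 2000):
    window = sum(p00(n - k) * tail(k) for k in range(K + 1))
    print(n, window)
```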

Now we make a coupling argument. Let $Y_n$ be a chain independent of $X_n$, with the same transition probabilities and an initial distribution to be chosen later. The chain $W_n = (X_n,Y_n)$ is irreducible by virtue of the irreducibility and aperiodicity of $X_n$. If $W_n$ is transient then, giving $X_n$ and $Y_n$ the same initial distribution, $\mathbb{P}(X_n = j)^2 = \mathbb{P}(W_n = (j,j))$ is summable, so $\mathbb{P}(X_n = j) \to 0$.

So assume that $W_n$ is recurrent. An irreducible recurrent chain visits every state with probability one, so the two chains $X_n$ and $Y_n$ meet almost surely; let $T$ be the first time that they do. Set $Z_n$ to equal $X_n$ for $n < T$ and $Y_n$ thereafter. By the strong Markov property $X_n$ and $Z_n$ have the same distribution, so $$\begin{align}|\mathbb{P}(X_n = j) - \mathbb{P}(Y_n = j)| &= |\mathbb{P}(Z_n = j) - \mathbb{P}(Y_n = j)| \\ &=|\mathbb{P}(X_n = j, n < T) - \mathbb{P}(Y_n = j, n < T)| \\ & \leq \mathbb{P}(n < T) \to 0 \,. \end{align} $$ Hence the distributions of $X_n$ and $Y_n$ converge to each other.

Now we exploit the fact that we are free to choose the initial distribution of $Y_n$. If $X_n$ has initial distribution $\lambda$, run the argument once for each $k = 1,\ldots,K$ with $Y_n$ given initial distribution $\lambda P^k$, so that $\mathbb{P}(Y_n = j) = \mathbb{P}(X_{n+k} = j)$. Since the distributions converge to each other, we can find $N$ such that for $n \geq N$ and all $k=1,\ldots,K$, $$ | \mathbb{P}(X_n = j) - \mathbb{P}(X_{n+k} = j)| \leq \varepsilon \,.$$ Finally, we know from before that for every $n \geq K$ there is some $k \in \{0,1,\ldots,K\}$ with $\mathbb{P}(X_{n-k} = j) \leq \varepsilon$, and hence, for $n \geq N + K$, $$ \mathbb{P}(X_n = j) \leq \mathbb{P}(X_{n-k} = j) + \varepsilon \leq 2 \varepsilon \,.$$ Since $\varepsilon > 0$ was arbitrary, this proves the result.
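None of this is needed for the proof, but the meeting step can be illustrated by simulation. Below is a Monte Carlo sketch (names and parameters are my own) using the lazy symmetric walk on $\mathbb{Z}$ — aperiodic and still null recurrent — running two independent copies until they first coincide. The meeting time $T$ is almost surely finite, but its heavy tail (no finite mean) shows in how slowly the meeting fraction approaches $1$ as the horizon grows:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def lazy_step(x):
    """Lazy symmetric walk on Z: stay put w.p. 1/2, else step +-1."""
    u = random.random()
    if u < 0.5:
        return x
    return x + 1 if u < 0.75 else x - 1

def meet_fraction(horizon, trials=400):
    """Fraction of trials in which two independent copies, started
    at 0 and 1, have met by time `horizon`."""
    met = 0
    for _ in range(trials):
        x, y = 0, 1
        for _ in range(horizon):
            x, y = lazy_step(x), lazy_step(y)
            if x == y:
                met += 1
                break
    return met / trials

f_short, f_long = meet_fraction(100), meet_fraction(10000)
print(f_short, f_long)  # grows with the horizon, but only slowly
```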