Markov chain convergence – coupling proof: Why $\left|\mathbb{P}(Z_n=j) - \mathbb{P}(Y_n=j)\right| \leq \mathbb{P}(Z_n \neq Y_n)$

markov-chains, probability

I am trying to understand a proof of the following "convergence to equilibrium" theorem for Markov chains:

Suppose $P$ is irreducible and aperiodic, with stationary distribution $\pi$. If $(X_n, n\geq0)$ is a Markov chain with transition matrix $P$ and any initial distribution, then for all $j \in I$, $$\mathbb{P}(X_n = j) \to \pi_j \quad \text{as } n \to \infty.$$

However, I don't understand the step labelled \eqref{ref}.


Here is the relevant part of the proof:

Let $\lambda$ be any initial distribution, and let $X = (X_n, n\geq0)$ be Markov$(\lambda,P)$. The aim is to show that $\mathbb{P}(X_n=j) \to \pi_j$ as $n \to \infty$, for any $j$.

Consider another chain $Y=(Y_n,n \geq 0)$ which is Markov$(\pi,P)$, and which is independent of $X$. Since $\pi$ is stationary, $Y_n$ has distribution $\pi$ for all $n$.

Let $T = \inf\{n\geq0 : X_n = Y_n\}$. Assume (for now) that $\mathbb{P}(T<\infty)=1$; that is, the chains $X$ and $Y$ will meet at some point.

Then define another chain $Z$ by
$$
Z_n = \begin{cases}
X_n & \text{if } n < T \\
Y_n & \text{if } n \geq T
\end{cases}
$$

Then $Z$ is also Markov$(\lambda,P)$: it starts with distribution $\lambda$, and each jump is made according to $P$, first by copying $X$ up to time $T$ and then by copying $Y$ from time $T$ on. (Making this rigorous uses the strong Markov property at the random time $T$.)
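As a concrete aside, the coupling construction can be simulated directly. Below is a minimal Python sketch; the two-state transition matrix `P`, the initial distribution `lam`, and the stationary distribution `pi` are illustrative choices, not from the text:

```python
import random

# Hypothetical two-state chain on states {0, 1} (illustrative, not from the text).
P = [[0.7, 0.3],
     [0.4, 0.6]]
lam = [1.0, 0.0]   # initial distribution lambda: X starts in state 0
pi  = [4/7, 3/7]   # stationary distribution of P (solves pi P = pi)

def sample(dist):
    """Draw a state index from a distribution given as a list of probabilities."""
    u, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if u < acc:
            return i
    return len(dist) - 1

def coupled_run(n_steps):
    """Run X ~ Markov(lam, P) and Y ~ Markov(pi, P) independently;
    return the coupling time T and the spliced chain Z."""
    x, y = sample(lam), sample(pi)
    T, Z = None, []
    for n in range(n_steps):
        if T is None and x == y:
            T = n                          # first meeting time
        Z.append(x if T is None else y)    # Z copies X before T, Y from T on
        x, y = sample(P[x]), sample(P[y])  # independent moves
    return T, Z
```

For an irreducible aperiodic chain like this one, the two chains meet quickly, so `T` is almost always found within a modest number of steps.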
We have that
\begin{align}
\left|\mathbb{P}(Z_n=j) - \pi_j\right|
&= \left|\mathbb{P}(Z_n=j) - \mathbb{P}(Y_n=j)\right| \\
&\leq \mathbb{P}(Z_n \neq Y_n) \tag{$\ast$} \label{ref}\\
&= \mathbb{P}(T>n) \\
&\to 0 \quad \text{as } n \to \infty \text{, since } \mathbb{P}(T<\infty)=1
\end{align}

So we have $\mathbb{P}(Z_n=j) \to \pi_j$. But the chains $X$ and $Z$ have the same distribution, so we have $\mathbb{P}(X_n=j) \to \pi_j$ as required.


Why is $\left|\mathbb{P}(Z_n=j) - \mathbb{P}(Y_n=j)\right| \leq \mathbb{P}(Z_n \neq Y_n)$ true?

Best Answer

For any subset $A$ of the state space, we have $$\begin{align}P(Z_n \in A) - P(Y_n \in A) &= [P(Z_n \in A, Z_n = Y_n) + P(Z_n \in A, Z_n \ne Y_n)] \\ &\qquad - [P(Y_n \in A, Z_n = Y_n) + P(Y_n \in A, Z_n \ne Y_n)] \\ &= P(Z_n \in A, Z_n \ne Y_n) - P(Y_n \in A, Z_n \ne Y_n) \\ &\le P(Z_n \ne Y_n). \end{align}$$ Similarly you can show $$P(Y_n \in A) - P(Z_n \in A) \le P(Z_n \ne Y_n).$$ Together this yields $$|P(Z_n \in A) - P(Y_n \in A)| \le P(Z_n \ne Y_n).$$ Your inequality is the special case where $A=\{j\}$.


Note that we have actually proved the even stronger inequality $$\max_A |P(Z_n \in A) - P(Y_n \in A)| \le P(Z_n \ne Y_n).$$ The quantity on the left-hand side is the total variation distance between the distributions of $Z_n$ and $Y_n$.
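This total variation bound can also be verified exactly on a small example by tracking the joint law of the coupled pair $(Z_n, Y_n)$: before coupling the components move independently, after coupling they move together. A Python sketch with a hypothetical two-state chain (matrix and distributions are illustrative, not from the text):

```python
from itertools import product

# Hypothetical two-state chain on states {0, 1}.
P = [[0.7, 0.3],
     [0.4, 0.6]]
lam = [1.0, 0.0]   # initial distribution of Z (same as X)
pi  = [4/7, 3/7]   # stationary distribution; initial distribution of Y
S = range(2)

# joint[(x, y)] = P(Z_n = x, Y_n = y); initially Z_0 ~ lam, Y_0 ~ pi, independent.
joint = {(x, y): lam[x] * pi[y] for x, y in product(S, S)}

def joint_step(joint):
    """One exact step of the law of the coupled pair (Z, Y)."""
    new = {(a, b): 0.0 for a, b in product(S, S)}
    for (x, y), p in joint.items():
        if x == y:                       # after coupling: Z and Y move together
            for a in S:
                new[(a, a)] += p * P[x][a]
        else:                            # before coupling: independent moves
            for a, b in product(S, S):
                new[(a, b)] += p * P[x][a] * P[y][b]
    return new

for _ in range(10):
    joint = joint_step(joint)

p_diff = sum(p for (x, y), p in joint.items() if x != y)   # P(Z_n != Y_n) = P(T > n)
margZ = [sum(joint[(x, y)] for y in S) for x in S]          # law of Z_n
margY = [sum(joint[(x, y)] for x in S) for y in S]          # law of Y_n (this is pi)
tv = 0.5 * sum(abs(margZ[j] - margY[j]) for j in S)         # total variation distance
# tv <= p_diff, as the inequality above guarantees
```

Note that the sum-of-absolute-differences form of `tv` (with the factor $\tfrac12$) equals $\max_A |P(Z_n \in A) - P(Y_n \in A)|$ for a finite state space.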
