Strictly speaking, MCMC means constructing a Markov chain with some desired property (usually either a prespecified equilibrium distribution or a prespecified expectation) and then numerically drawing samples from it. If you are simply handed a Markov chain and sample from it, that is really just called Monte Carlo simulation.
Anyway, at each fixed time $t$, the value of the chain has some distribution, given by $p_0 P^t$, where $p_0$ is the initial distribution (written as a row vector) and $P$ is the transition probability matrix (in the row-stochastic convention, i.e. $\sum_{j=1}^n P_{ij}=1$).
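As a quick numerical sketch of $p_0 P^t$ (the 3-state transition matrix here is made up purely for illustration):

```python
import numpy as np

# Hypothetical 3-state chain; rows sum to 1 (row-stochastic convention).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
p0 = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0

# Distribution at time t is the row vector p0 P^t.
pt = p0 @ np.linalg.matrix_power(P, 50)
print(pt)        # by t = 50 this is essentially the stationary distribution
print(pt.sum())  # still a probability vector: sums to 1
```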
So the long term behavior of the distributions at each fixed $t$ is determined by the behavior of $P^t$ as $t \to \infty$. Any stochastic matrix has all eigenvalues of modulus at most $1$. It also has $1$ as an eigenvalue. The ergodicity assumption additionally tells us that the eigenvalue $1$ has multiplicity $1$ and that all of its other eigenvalues will have modulus strictly less than $1$. (Without irreducibility it is possible that the eigenvalue $1$ has a higher multiplicity; without aperiodicity it is possible to have an eigenvalue like $-1$ or $e^{2\pi i/3}$.) The reversibility assumption tells you (among other things) that $P$ is diagonalizable with real eigenvalues (because reversibility furnishes a similarity transformation which turns $P$ into a symmetric matrix).
As a result of these three facts you can write
$$p_0 P^t = \pi + \sum_{i=2}^n c_i \lambda_i^t q_i$$
where $\pi$ is the stationary distribution, the $\lambda_i$ are the eigenvalues other than $1$, all strictly less than $1$ in absolute value, the $q_i$ are the corresponding left eigenvectors, and the $c_i$ are coefficients depending on $p_0$. This is just what you get from diagonalizing $P^t$ (without going through and computing the $c_i$, since they do not really matter for this presentation).
You can see that as $t \to \infty$, the contributions from the remaining eigenvalues decay geometrically in $t$. Assuming the $c_i$ are all significant (the typical situation for a "random" initial distribution), the slowest decay comes from $\lambda_{\max}$, since it is the closest to $1$ in absolute value. So the larger the spectral gap $\xi$, the faster the convergence. Note that the negative eigenvalues do matter, which is why you wrote $\lambda_{\max}=\max \{ |\lambda_2|,|\lambda_n| \}$.
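Here is a small sketch of that decay rate, using a hypothetical two-state chain whose stationary distribution and eigenvalues are easy to compute by hand:

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
pi = np.array([1/3, 2/3])  # stationary: solves pi P = pi

evals = np.linalg.eigvals(P)
lam_max = sorted(np.abs(evals))[-2]  # second-largest modulus, here 0.4

p = np.array([1.0, 0.0])
for t in range(1, 6):
    p = p @ P
    tv = 0.5 * np.abs(p - pi).sum()  # total-variation distance to pi
    print(t, tv, lam_max ** t)       # tv shrinks in proportion to lam_max^t
```

For this particular $P$ the distance to stationarity is exactly $\tfrac{2}{3}\lambda_{\max}^t$, so the geometric decay is visible immediately.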
The precise relationship between the spectral gap and the dynamical properties of the chain is more subtle than the rest of what I've said here. Loosely speaking when the spectral gap is small, the eigenvector corresponding to $\lambda_{\max}$, which I might call a "metastable eigenvector", will be significant and positive for one set of states and significant and negative for another set of states. It might also be smaller on a third set of states.
The "metastable eigenvector" then corresponds to the slow exchange of probability between the first two sets of states (because the decay of its contribution to the distribution results in a flow from one set to the other, with which set is which depending on the sign of the corresponding $c_i$). You can look up metastability in Markov chains for more about this. To get a feel for it, consider a matrix like
$$P=\begin{bmatrix} 1/3 & 2/3 & 0 & 0 \\
1/4-\epsilon/2 & 3/4-\epsilon/2 & \epsilon & 0 \\
0 & \epsilon & 1/5-\epsilon/2 & 4/5-\epsilon/2 \\
0 & 0 & 1/6 & 5/6 \end{bmatrix}$$
where $0<\epsilon \ll 1$ is a small parameter. For $\epsilon=10^{-6}$ for example, this has relatively rapid equilibration within the sets $\{ 1,2 \}$ and $\{ 3,4 \}$ (driven by eigenvalues of about $1/12$ and $1/30$ respectively) and slow equilibration between the two of them (driven by an eigenvalue of about $1-10^{-6}$).
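You can check these numbers directly, e.g. with NumPy:

```python
import numpy as np

eps = 1e-6
# The example matrix above, with eps = 10^-6.
P = np.array([
    [1/3,         2/3,         0.0,         0.0],
    [1/4 - eps/2, 3/4 - eps/2, eps,         0.0],
    [0.0,         eps,         1/5 - eps/2, 4/5 - eps/2],
    [0.0,         0.0,         1/6,         5/6],
])
assert np.allclose(P.sum(axis=1), 1.0)  # row-stochastic

evals = np.sort(np.linalg.eigvals(P).real)[::-1]
print(evals)
# One eigenvalue is exactly 1; one sits just below 1 (the slow, metastable
# mode); the remaining two are near 1/12 and 1/30 (the fast modes).
```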
In a finite-state Markov chain with $\pi_a p_{ab} = \pi_b p_{ba}$ for every pair of states $a,b,$ it may be false that $X_n \Rightarrow \pi.$
Specifically, the requirement for $X_n \Rightarrow \pi$ is that (1) every recurrent state is aperiodic and (2) the Markov subchain of recurrent states is irreducible.
Your condition that $\pi_a p_{ab} = \pi_b p_{ba}$ only shows that $\pi$ is stationary and that every state is recurrent. That is, in order to have $X_n \Rightarrow \pi$ we'd still need the conditions that the Markov chain is both aperiodic and irreducible. Intuitively, irreducibility says that there is a unique stationary distribution, and aperiodicity says that the Markov chain is "well-mixed".
For a couple simple examples, note that the Markov chains on states $1,2$ given by
- $p_{12} = p_{21} = 1$ satisfies your condition for the unique $\pi_1 = \pi_2 = \frac{1}{2},$ but $X_n \not\Rightarrow \pi$ for any $X_0 \neq \pi.$
- $p_{11} = p_{22} = 1$ satisfies your condition for $\pi_1 = t, \pi_2 = 1-t$ for any $0\leq t\leq 1,$ but again $X_n \not\Rightarrow \pi$ for any $X_0 \neq \pi.$
Here, the first Markov chain wasn't aperiodic and the second wasn't irreducible.
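Both failures are easy to see numerically (a quick NumPy check of the two chains above):

```python
import numpy as np

x0 = np.array([1.0, 0.0])  # start in state 1

# Periodic chain: p12 = p21 = 1. Powers of P oscillate instead of converging.
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
print(x0 @ np.linalg.matrix_power(P_periodic, 10))  # [1, 0]
print(x0 @ np.linalg.matrix_power(P_periodic, 11))  # [0, 1]

# Reducible chain: p11 = p22 = 1. The chain never leaves its initial state.
P_reducible = np.eye(2)
print(x0 @ np.linalg.matrix_power(P_reducible, 10))  # [1, 0], not [1/2, 1/2]
```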
As a sidenote: In physics, I wouldn't be too surprised if you ignore the "aperiodic" condition, since aperiodic chains are likely ubiquitous in nature while periodic chains are more "exotic." However, reducible Markov chains aren't exotic at all, so they should definitely be considered when answering this question of convergence.
I will give some general direction on why we can apply the strong law of large numbers. The theorem for stationary stochastic processes that is analogous to the strong law of large numbers for an i.i.d. sequence is called the Birkhoff ergodic theorem.
It states that if $X_1, X_2, \ldots$ is a stationary, ergodic, real-valued stochastic process with $E(X_i) = \mu$, then $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \rightarrow \mu$ almost surely.
This means that, given a Markov chain, if a fixed specification of the transition probabilities admits a unique invariant distribution, then the chain started from that distribution is a stationary ergodic process, and the strong law of large numbers holds for any initial distribution that is dominated by the invariant distribution.
We can extend this to arbitrary initial distributions under a slightly stronger regularity condition than uniqueness of the invariant distribution, called Harris recurrence; with it, the strong law of large numbers holds for any initial distribution.
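As an illustration of the ergodic-average conclusion (a toy two-state chain, chosen for this sketch; on a finite state space, irreducibility plus aperiodicity already deliver what Harris recurrence delivers in general):

```python
import numpy as np

rng = np.random.default_rng(0)

# Irreducible, aperiodic 2-state chain on states {0, 1}.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])  # stationary: solves pi P = pi

# Time average of f(X_n) with f the identity on {0, 1}.
n_steps = 100_000
x = 0      # arbitrary initial state; the limit does not depend on it
total = 0
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])  # one step of the chain
    total += x
print(total / n_steps)  # close to E_pi[f] = pi[1] = 1/3
```

The sample average of $f(X_n)$ along a single trajectory approaches the stationary expectation, even though the chain was started far from stationarity.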