Probability – Computational Methods for Limiting Distribution of Finite Ergodic Markov Chain

markov-chains, probability, simulation

We wish to show what can be discovered about the limit of a finite, homogeneous, ergodic Markov chain $X_1, X_2, \dots,$ using simple methods of computation and simulation.

Specifically, consider the chain with state space $S = \{0, 1, 2\}$ and transition matrix

$P = \begin{bmatrix}p & q & 0 \\0 & p & q \\q & 0 & p \end{bmatrix},$

where $0 < p < 1$ and $q = 1 - p.$

Imagine the states arranged clockwise around a circle, with 2 adjacent to 0. At a transition, a particle moves one state clockwise with probability $q$ and stays put with probability $p$.
This is an ergodic chain because $P^2$ has all positive elements. [Alternatively, it would be enough to observe that 'round trip' transitions $0 \rightarrow 1 \rightarrow 2 \rightarrow 0$ are possible (so the chain has a single class), and that there are positive elements on the principal diagonal (so the chain is aperiodic).]
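
As a quick numerical check, one can square $P$ and confirm that every entry of $P^2$ is positive. The sketch below uses plain Python (no libraries), with $p = 1/2$ as an arbitrary illustrative value; any $0 < p < 1$ gives the same conclusion.

```python
# Check ergodicity numerically: every entry of P^2 should be positive.
# p = 1/2 is an arbitrary choice; any 0 < p < 1 works the same way.
p = 0.5
q = 1 - p
P = [[p, q, 0.0],
     [0.0, p, q],
     [q, 0.0, p]]

# P2[i][j] = sum over k of P[i][k] * P[k][j]
P2 = [[sum(P[i][k] * P[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]

print(P2)
print(all(entry > 0 for row in P2 for entry in row))  # True
```

Symbolically, the first row of $P^2$ is $(p^2,\ 2pq,\ q^2),$ and the other rows are cyclic shifts of it, so all entries are positive whenever $0 < p < 1.$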

Also, because in our example the transition matrix $P$ is doubly stochastic (columns, as well as rows, sum to unity), we know that the limiting distribution of the chain is uniform on $S.$ That is, $\lim_{n \rightarrow \infty} P(X_n = k | X_1 = j) = 1/3,$ for $j, k \in S.$ We say that the limiting distribution of this ergodic Markov chain is the probability vector $\lambda = (1/3, 1/3, 1/3)$.
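
That the uniform vector is left fixed by $P$ is easy to verify directly; here is a plain-Python sketch (with $p = 1/2$ as an illustrative value):

```python
# Verify that lambda = (1/3, 1/3, 1/3) satisfies lambda P = lambda
# for the doubly stochastic matrix P (p = 1/2 chosen for illustration).
p = 0.5
q = 1 - p
P = [[p, q, 0.0],
     [0.0, p, q],
     [q, 0.0, p]]

lam = [1/3, 1/3, 1/3]
# (lam P)[j] = sum over i of lam[i] * P[i][j]; each column of P sums to 1,
# so each component comes out to 1/3.
lamP = [sum(lam[i] * P[i][j] for i in range(3)) for j in range(3)]
print(lamP)
```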

In our initial answer to this question, we use various algebraic and simulation methods to illustrate this limiting process. Additional Answers on related theoretical and computational topics are welcome.

Best Answer

Successive powers of the transition matrix. For the transition matrix $P$ of an ergodic $K$-state chain, $\lim_{n \rightarrow \infty}P^n = \Lambda$, where all $K$ rows of $\Lambda$ are the same vector $\lambda,$ which gives the limiting distribution of the chain. That is, $$\lim_{n \rightarrow \infty} p_{ij}^{(n)} = \lim_{n \rightarrow \infty} P(X_{n+1} = j | X_1 = i) = \lambda_j.$$

For our chain with $p=1/2,$ one can use a computer to find that all entries of $P^{16}$ are 0.3333 when rounded to four places, indicating that $\lambda = (1/3, 1/3, 1/3).$ The R operator for matrix multiplication is %*%. Thus one can get the square of the transition matrix with P %*% P, and then square a few more times to get $P^{16}$ or $P^{32}$ or whatever power shows convincing convergence. [Note: In practice, if the transition probabilities $p_{ij}$ are obtained from data, one must take care that each row of $P$ sums precisely to 1. Rounding error that is barely noticeable in the rows of $P$ may become catastrophic in $P^{16}.$]
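
The repeated-squaring idea is not tied to R; here is a plain-Python sketch of the same computation ($p = 1/2$, no libraries assumed):

```python
# Sketch: compute P^16 by squaring the transition matrix four times.
# With p = 1/2, every entry of P^16 is within about 1e-5 of 1/3.

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p = 0.5
q = 1 - p
P = [[p, q, 0.0],
     [0.0, p, q],
     [q, 0.0, p]]

M = P
for _ in range(4):        # squaring 4 times gives P^(2^4) = P^16
    M = matmul(M, M)

for row in M:
    print([round(entry, 4) for entry in row])  # every row: [0.3333, 0.3333, 0.3333]
```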

Steady-state distribution of an ergodic chain. An ergodic Markov chain has a unique stationary or steady-state vector $\sigma$ such that $\sigma = \sigma P.$ Also, the stationary and limiting distributions are the same. In terms of linear algebra, the row eigenvector of $P$ corresponding to the eigenvalue of largest modulus is proportional to $\sigma.$ The following R code can be used. [Notes: (i) Because R finds column eigenvectors, it is necessary to take the transpose. (ii) The eigenvector corresponding to the eigenvalue of largest modulus is listed first in the output, hence [,1]. (iii) The desired eigenvector is real, but perhaps not all are real; use as.numeric to avoid awkward notation for complex numbers. (iv) Dividing by the sum makes sure the output vector sums to unity.]

 g = eigen(t(P))$vectors[,1];  as.numeric(g/sum(g))              
 ## 0.3333333 0.3333333 0.3333333 
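
Equivalently, one can solve the linear system $\sigma = \sigma P$ directly rather than via an eigendecomposition. The sketch below is plain Python; the helper solve is a hypothetical minimal Gaussian-elimination routine (not a library call). It writes the system as $(P^{\mathsf{T}} - I)\sigma = 0,$ replaces the redundant last equation by the normalization $\sum_j \sigma_j = 1,$ and solves:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]            # pivot swap
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

p, q = 0.5, 0.5
P = [[p, q, 0.0], [0.0, p, q], [q, 0.0, p]]
n = 3
# A = P^T - I, then replace the last (redundant) equation by sum(sigma) = 1.
A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
A[n - 1] = [1.0] * n
b = [0.0] * (n - 1) + [1.0]
sigma = solve(A, b)
print(sigma)   # each component should be 1/3
```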

Another method works for simple chains in which there is a unique path starting from one state and making a round trip of transitions back to the same state. For our chain starting at state 0, we wait an average of $1/q$ steps before moving to state 1, another $1/q$ steps to get to state 2, and yet another $1/q$ steps before returning to state 0. Thus the average round trip takes $3/q$ steps, and the chain spends an average of $1/3$ of its time in each of the three states. Thus, intuitively, the long-run and stationary distributions are $\lambda = \sigma = (1/3, 1/3, 1/3).$
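
The average wait of $1/q$ steps at each state comes from the geometric distribution: if $W$ is the number of steps until the particle first moves, then

$$P(W = n) = p^{\,n-1}q \quad (n = 1, 2, \dots), \qquad E(W) = \sum_{n=1}^{\infty} n\,p^{\,n-1}q = \frac{q}{(1-p)^2} = \frac{1}{q}.$$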

Simulating the chain. Let $p = 1/2$ and start with $X_1 = 2.$ Then one can simulate the Markov chain through many steps to see for what proportion of the steps the chain is in each of the states. We begin with a simulation through 9999 transitions that emulates the 'story' about clockwise transitions at the beginning of the Question. We see that $X_n$ takes each of the values 0, 1, and 2 very nearly 1/3 of the time.

 m = 9999;  n = 1:m;  x = numeric(m);  x[1] = 2;  p = 1/2;  q = 1 - p
 for (i in 2:m)
   {
    d = rbinom(1, 1, q)       # displacement 0 or 1
    x[i] = (x[i-1] + d) %% 3  # mod 3 arithmetic
   }
 table(x)/m

 ## x
 ##         0         1         2 
 ## 0.3372337 0.3312331 0.3315332
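
For comparison, the same simulation can be sketched in plain Python using only the standard library (the seed is an arbitrary choice, used only for reproducibility):

```python
import random

random.seed(1)                 # arbitrary seed for reproducibility
m = 9999
p, q = 0.5, 0.5
x = [2]                        # start the chain in state 2
for _ in range(m - 1):
    d = 1 if random.random() < q else 0   # move clockwise w.p. q, stay w.p. p
    x.append((x[-1] + d) % 3)             # mod 3 arithmetic around the circle

counts = [x.count(k) / m for k in (0, 1, 2)]
print(counts)                  # each proportion should be near 1/3
```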

From the simulation, we can learn more about how quickly the limit is reached and about the nature of the Markov dependence from one step to the next.

The upper-left panel of the plot below shows how the chain moves among the states over the first 100 transitions. The barplot at upper right shows that, among the 9999 simulated steps, very nearly a third were spent in each state. Running averages of the proportion of steps spent in state 0, over each of the first 1000 steps, can be found in R as s.0 = cumsum(x[1:1000]==0)/(1:1000). When they are plotted as a 'trace', as in the lower-left panel, we see that the running averages converge to 1/3, as expected. In that panel, the boundary between the dark and medium-blue regions is the trace for running averages of steps spent in state 0, converging to 1/3 (horizontal red reference line). The boundary between the medium and light-blue regions is the trace for running averages of steps spent in states 0 or 1, converging to 2/3.

[Figure: four diagnostic panels — state history for the first 100 steps (upper left), barplot of state occupancy (upper right), running-average traces (lower left), and ACF (lower right).]

The plot in the lower-right panel shows values of the autocorrelation function (ACF) up to lag 40. (See https://en.wikipedia.org/wiki/Autocorrelation.) If the $X_i$ were independent observations (instead of Markovian), all ACF values would hover within the band near 0. The ACF value at lag 0 is the sample correlation of the sequence $X_1, X_2, \dots, X_{9999}$ with itself; it is always 1. At lag 1, one considers the sample correlation of $X_1, X_2, \dots, X_{9998}$ with $X_2, X_3, \dots, X_{9999}$, except that the sample mean and standard deviation of all $m = 9999$ observations are used in the standard formula for the sample correlation. Roughly speaking, the ACF plot for this simulation shows that the dependence on the starting state is almost completely lost after 3 or 4 transitions. That is, $P(X_{r+4} = j|X_{r+1} = i) \approx P(X_{r+4} = j).$
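
The ACF computation just described is easy to reproduce. Below is a plain-Python sketch; the helper acf is hypothetical, written to mimic the convention described above (overall mean and variance of all $m$ observations in the denominator), and the seed is arbitrary:

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation, using the overall mean and sum of squares
    of the full series in the denominator (mimicking R's acf())."""
    m = len(x)
    mu = sum(x) / m
    denom = sum((xi - mu) ** 2 for xi in x)
    out = []
    for k in range(max_lag + 1):
        num = sum((x[i] - mu) * (x[i + k] - mu) for i in range(m - k))
        out.append(num / denom)
    return out

random.seed(2)                 # arbitrary seed for reproducibility
p, q = 0.5, 0.5
x = [2]
for _ in range(9998):
    x.append((x[-1] + (1 if random.random() < q else 0)) % 3)

r = acf(x, 6)
print([round(v, 3) for v in r])   # r[0] is exactly 1; later lags decay toward 0
```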

The simulation of Markov chains is widely used as a practical method to compute probability distributions, and then the manner and speed of convergence become an important consideration. Such methods are called Markov chain Monte Carlo (MCMC) methods. This is not the place for concrete examples of useful chains, because such chains (often with continuous state spaces) are much more intricate than the examples discussed here. Roughly speaking, one contrives a Markov chain whose limiting distribution is the answer to an analytically intractable problem, and then simulates the chain to get information about the elusive distribution.

A version with slow convergence. If convergence is very slow, a first warning sign may be that the history plot often 'sticks' in one state for a long time before moving (a long horizontal segment). Also, the ACF plot may show that correlation does not 'wear away' after only a small number of lags. Finally, we may see very poor convergence to the limiting distribution even after a large number of steps. A variation of this problem with $p = .99$ and $q = .01$ shows plots with this bad behavior. This chain also has $\lambda = (1/3, 1/3, 1/3).$
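
The 'sticking' symptom can also be seen numerically. The plain-Python sketch below (helper names and seed are arbitrary choices) compares the longest stretch of consecutive identical states for $p = .5$ and $p = .99$; with $q = .01$, runs average $1/q = 100$ steps, so very long horizontal segments are expected:

```python
import random

def simulate(p, m=9999, seed=3, start=2):
    """Simulate m steps of the circular chain; return the state path."""
    rng = random.Random(seed)
    q = 1 - p
    x = [start]
    for _ in range(m - 1):
        x.append((x[-1] + (1 if rng.random() < q else 0)) % 3)
    return x

def longest_run(x):
    """Length of the longest stretch of consecutive identical states."""
    best = cur = 1
    for a, b in zip(x, x[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

fast = simulate(p=0.50)
slow = simulate(p=0.99)
print("longest run, p = 0.50:", longest_run(fast))   # short runs
print("longest run, p = 0.99:", longest_run(slow))   # very long runs
```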

Except for the change in the value of $p$, the program that produced the graphs below is the same as the program for those above. Unfortunately, it is often difficult to predict in advance which chains converge rapidly and which do not. In practical applications of simulating Markov chains, graphical diagnostics such as those shown here can be very useful.

[Figure: the same four diagnostic panels for the chain with $p = .99$, showing slow mixing and poor convergence.]