How to Explain Markov Chain Monte Carlo (MCMC) to a Layperson

Tags: bayesian, intuition, markov-chain-montecarlo, teaching

Maybe the concept, why it's used, and an example.

Best Answer

First, we need to understand what a Markov chain is. Consider the following weather example from Wikipedia. Suppose that the weather on any given day can be classified into only two states: sunny and rainy. Based on past experience, we know the following:

$P(\text{Next day is Sunny}\,\vert \,\text{Given today is Rainy}) = 0.50$

Since the next day's weather is either sunny or rainy, it follows that:

$P(\text{Next day is Rainy}\,\vert \,\text{Given today is Rainy}) = 0.50$

Similarly, let:

$P(\text{Next day is Rainy}\,\vert \,\text{Given today is Sunny}) = 0.10$

Therefore, it follows that:

$P(\text{Next day is Sunny}\,\vert \,\text{Given today is Sunny}) = 0.90$

The above four numbers can be represented compactly as a transition matrix, which gives the probabilities of the weather moving from one state to another (rows = today's state, columns = tomorrow's state):

$P = \begin{array}{c|cc} & S & R \\ \hline S & 0.9 & 0.1 \\ R & 0.5 & 0.5 \end{array}$

We might ask several questions whose answers follow:


Q1: If the weather is sunny today, then what is the weather likely to be tomorrow?

A1: Since we do not know for sure what is going to happen, the best we can say is that there is a $90\%$ chance it will be sunny and a $10\%$ chance it will be rainy.
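This one-day forecast is just a vector-matrix product. A small Python sketch, if that helps (the state order [Sunny, Rainy] and the variable names are my own choices for illustration):

```python
import numpy as np

# Transition matrix: rows = today's state, columns = tomorrow's state.
# State order: [Sunny, Rainy].
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

today = np.array([1.0, 0.0])   # we know today is sunny
tomorrow = today @ P           # one-day forecast
print(tomorrow)                # [0.9 0.1] -> 90% sunny, 10% rainy
```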


Q2: What about two days from today?

A2: The one-day prediction is $90\%$ sunny, $10\%$ rainy. Therefore, two days from now:

Tomorrow it can be sunny and the day after it can also be sunny. The chance of this happening is: $0.9 \times 0.9$.

Or

Tomorrow it can be rainy and the day after it can be sunny. The chance of this happening is: $0.1 \times 0.5$.

Therefore, the probability that the weather will be sunny in two days is:

$P(\text{Sunny 2 days from now}) = 0.9 \times 0.9 + 0.1 \times 0.5 = 0.81 + 0.05 = 0.86$

Similarly, the probability that it will be rainy is:

$P(\text{Rainy 2 days from now}) = 0.1 \times 0.5 + 0.9 \times 0.1 = 0.05 + 0.09 = 0.14$
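The same two-path arithmetic in plain Python, if you want to check it (the path labels in the comments are mine):

```python
# Two ways to be sunny two days from now, starting from a sunny day:
#   sunny -> sunny -> sunny: 0.9 * 0.9
#   sunny -> rainy -> sunny: 0.1 * 0.5
p_sunny = 0.9 * 0.9 + 0.1 * 0.5
print(p_sunny)   # ~0.86

# Two ways to be rainy two days from now:
#   sunny -> rainy -> rainy: 0.1 * 0.5
#   sunny -> sunny -> rainy: 0.9 * 0.1
p_rainy = 0.1 * 0.5 + 0.9 * 0.1
print(p_rainy)   # ~0.14 (up to floating-point rounding)
```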


In linear algebra terms (transition matrices), these calculations correspond to summing over all the possible transition paths from one step to the next (sunny-to-sunny ($S \to S$), sunny-to-rainy ($S \to R$), rainy-to-sunny ($R \to S$), or rainy-to-rainy ($R \to R$)), each weighted by its probability:

[Figure: tree diagram of the two-step weather transitions (top) and the corresponding matrix multiplications (bottom).]

On the lower part of the image we see how to calculate the probability of a future state ($t+1$ or $t+2$), given the probabilities (the probability mass function, PMF) for every state (sunny or rainy) at time zero (now, or $t_0$), as a simple matrix multiplication.
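In Python, that matrix-multiplication view looks like this (a sketch, assuming the same state order [Sunny, Rainy] as before):

```python
import numpy as np

P = np.array([[0.9, 0.1],    # transition matrix, rows sum to 1
              [0.5, 0.5]])

pmf_t0 = np.array([1.0, 0.0])                   # PMF at time zero: sunny for sure
pmf_t1 = pmf_t0 @ P                             # forecast for t+1
pmf_t2 = pmf_t0 @ np.linalg.matrix_power(P, 2)  # forecast for t+2
print(pmf_t1)   # [0.9 0.1]
print(pmf_t2)   # [0.86 0.14]
```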

If you keep forecasting the weather like this, you will notice that eventually the $n$-th day forecast, where $n$ is large (say $n = 30$), settles to the following 'equilibrium' probabilities:

$P(\text{Sunny}) = 0.833$

and

$P(\text{Rainy}) = 0.167$

In other words, your forecasts for the $n$-th day and the $(n+1)$-th day remain the same. In addition, you can check that the 'equilibrium' probabilities do not depend on the weather today: you get the same forecast whether you start off by assuming that the weather today is sunny or rainy.
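You can verify this convergence numerically. A short sketch that runs the chain forward $30$ steps from each of the two possible starting states:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# 30-step transition probabilities:
P30 = np.linalg.matrix_power(P, 30)

start_sunny = np.array([1.0, 0.0]) @ P30
start_rainy = np.array([0.0, 1.0]) @ P30
print(start_sunny)   # ~[0.833 0.167]
print(start_rainy)   # ~[0.833 0.167] -- the same, regardless of the start
```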

The above example works only if the state transition probabilities satisfy several conditions (such as irreducibility and aperiodicity), which I will not discuss here. But notice the following feature of this 'nice' Markov chain (nice = the transition probabilities satisfy those conditions):

Irrespective of the initial starting state, we will eventually reach an equilibrium probability distribution over states.

Markov Chain Monte Carlo exploits the above feature as follows:

We want to generate random draws from a target distribution. We then identify a way to construct a 'nice' Markov chain such that its equilibrium probability distribution is our target distribution.

If we can construct such a chain, then we arbitrarily start from some point and iterate the Markov chain many times (just as we forecast the weather $n$ times above). Eventually, the draws we generate will appear as if they are coming from our target distribution.

We then approximate the quantities of interest (e.g., the mean) by taking the sample average of the draws, after discarding a few initial 'burn-in' draws. This averaging is the Monte Carlo component.

There are several ways to construct 'nice' Markov chains (e.g., the Gibbs sampler, the Metropolis-Hastings algorithm).
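To make the idea concrete, here is a minimal sketch of a random-walk Metropolis-Hastings sampler. The target (a standard normal), the step size, and the burn-in length are all illustrative choices on my part, not part of the general recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Log-density of the target, up to an additive constant.
    # Here: a standard normal, purely for illustration.
    return -0.5 * x**2

def metropolis_hastings(n_draws, x0=0.0, step=1.0):
    draws = np.empty(n_draws)
    x = x0                                           # arbitrary starting point
    for i in range(n_draws):
        proposal = x + step * rng.standard_normal()  # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x)):
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        draws[i] = x                                 # if rejected, repeat the old value
    return draws

draws = metropolis_hastings(50_000)
kept = draws[1_000:]             # discard initial 'burn-in' draws
print(kept.mean(), kept.std())   # ~0 and ~1, the target's mean and sd
```

Note that the acceptance step uses the target density only up to a constant, which is one reason MCMC is so useful in Bayesian problems, where the posterior's normalizing constant is typically unknown.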