Monte Carlo and Sampling

markov chainsmonte carloprobability theory

Suppose that we have a finite state space $E$ and a distribution $\pi:E \rightarrow (0,1)$ with $\pi(x) >0$. The idea behind Monte Carlo is that we generate a Markov chain $X=(X_n,n\in \mathbb{N})$ with transition matrix $p$ such that $p$ is ergodic (irreducible and aperiodic) and that $\pi$ is the unique invariant distribution of $p$, so that the total variation $||p^n(x,y) -\pi(y)||_{TV} \rightarrow 0$ as $n\rightarrow \infty$.

Question.

However, I'm a little confused about the rigorous justification behind sampling. Suppose that we want to compute the expectation $\mathbb{E}f(Y)$ where $Y$ is a random variables with distribution $\pi$. Then I understand that $\mathbb{E}^x f(X_n) \rightarrow \mathbb{E} f(Y)$ for bounded $f$. But how do we approximate $\mathbb{E}^x f(X_n)$? I would assume that we would like to apply the strong law in "some way" with respect to the Markov chain $X$ so that almost surely, we have
$$
\frac{1}{n} \sum_{k=1}^n f(X_k) \rightarrow \mathbb{E}f(Y), \qquad n\rightarrow \infty
$$

Of course, this is not properly justified, since $X_1,…$ are not pairwise independent and identically distributed so we can't use the strong law.

Best Answer

I will give some general direction on why we can apply the strong law of large numbers. The theorem for stationary stochastic processes that is analogous to the strong law of large numbers for an i. i. d. sequences is called the Birkhoff ergodic theorem.

It has that if $X_1, X_2,...$ is a stationary real-valued stochastic process that is ergodic, and $E(X_i) = \mu$, then $\bar{X} \rightarrow \mu$ almost surely.

This means given a Markov Chain, if for a fixed specification of transition probabilities there is a unique invariant distribution, then the strong law of large numbers holds for any initial distribution that is dominated by the invariant distribution.

We can make this apply to any initial distribution under a slightly stronger regularity condition than uniqueness of the invariant distribution. This is called Harris recurrence and with it the strong law of large numbers holds for any initial distribution.

Related Question