The ideal Monte Carlo algorithm uses independent successive random values. In MCMC, successive values are not independent, which makes the method converge more slowly than ideal Monte Carlo; however, the faster the chain mixes, the faster the dependence between successive iterations decays¹, and the faster the method converges.
¹ I mean here that the successive values quickly become "almost independent" of the initial state, or rather that, given the value $X_n$ at one point, the values $X_{n+k}$ quickly become "almost independent" of $X_n$ as $k$ grows; so, as qkhhly says in the comments, "the chain don't keep stuck in a certain region of the state space".
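One way to make this quantitative: for a stationary chain, a standard result says that the variance of the ergodic average of a function $f$ is inflated by the autocorrelations $\rho_k$ of the chain,

$$\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} f(X_i)\right) \approx \frac{\operatorname{Var}\big(f(X_1)\big)}{N}\left(1 + 2\sum_{k \geq 1}\rho_k\right),$$

so the faster the autocorrelations decay, the closer the chain gets to ideal Monte Carlo.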
Edit: I think the following example can help.
Imagine you want to estimate the mean of the uniform distribution on $\{1, \dots, n\}$ by MCMC. You start with the ordered sequence $(1, \dots, n)$; at each step, you choose $k \geq 2$ elements of the sequence and randomly shuffle them. At each step, the element at position 1 is recorded; the distribution of this value converges to the uniform distribution. The value of $k$ controls the mixing speed: when $k = 2$, mixing is slow; when $k = n$, the successive elements are independent and mixing is as fast as possible.
Here is an R function for this MCMC algorithm:
mcmc <- function(n, k = 2, N = 5000) {
  x <- 1:n                        # start from the ordered sequence
  res <- numeric(N)
  for (i in 1:N) {
    swap <- sample(1:n, k)        # pick k positions at random
    x[swap] <- sample(x[swap], k) # shuffle the values at those positions
    res[i] <- x[1]                # record the element at position 1
  }
  return(res)
}
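As a quick sanity check (an addition of mine, not part of the original code): with $k = n$ every recorded value is an independent uniform draw, so the empirical frequencies should come out roughly equal:

set.seed(1)                       # arbitrary seed, for reproducibility only
table(mcmc(5, k = 5, N = 1000))   # expect roughly 200 occurrences of each of 1, ..., 5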
Let’s apply it for $n = 99$ and plot the running estimate of the mean $\mu = 50$ along the MCMC iterations:
n <- 99
mu <- sum(1:n) / n                # true mean: 50

r1 <- mcmc(n)                     # k = 2: slow mixing
plot(cumsum(r1) / 1:length(r1), type = "l", ylim = c(0, n), ylab = "mean")
abline(h = mu, lty = 2)           # dashed line at the true mean

r2 <- mcmc(n, round(n / 2))       # k = 50
lines(1:length(r2), cumsum(r2) / 1:length(r2), col = "blue")

r3 <- mcmc(n, n)                  # k = 99: independent draws
lines(1:length(r3), cumsum(r3) / 1:length(r3), col = "red")

legend("topleft", c("k = 2", paste("k =", round(n / 2)), paste("k =", n)),
       col = c("black", "blue", "red"), lwd = 1)
You can see here that for $k=2$ (in black), the convergence is slow; for $k=50$ (in blue), it is faster, but still slower than with $k=99$ (in red).
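To see the dependence decay from footnote 1 directly (an extra check, not in the original code), you can plot the autocorrelation of the recorded values with base R's acf(): for $k = 2$ it decays slowly, while for $k = n$ it is essentially zero beyond lag 0:

par(mfrow = c(1, 2))
acf(r1, lag.max = 50, main = "k = 2")   # strong, slowly decaying autocorrelation
acf(r3, lag.max = 50, main = "k = n")   # independent draws: no autocorrelation
par(mfrow = c(1, 1))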
You can also plot a histogram of the distribution of the estimated mean after a fixed number of iterations, e.g. 100 iterations:
K <- 5000                 # number of replications
M1 <- numeric(K)
M2 <- numeric(K)
M3 <- numeric(K)
for (i in 1:K) {
  M1[i] <- mean(mcmc(n, 2, 100))             # k = 2
  M2[i] <- mean(mcmc(n, round(n / 2), 100))  # k = 50
  M3[i] <- mean(mcmc(n, n, 100))             # k = 99
}
dev.new()
par(mfrow = c(3, 1))      # three stacked histograms
hist(M1, xlim = c(0, n), freq = FALSE)
hist(M2, xlim = c(0, n), freq = FALSE)
hist(M3, xlim = c(0, n), freq = FALSE)
You can see that with $k = 2$ (M1), the influence of the initial value is still strong after 100 iterations, so the estimate is badly biased. With $k = 50$ the result looks fine, though with a larger standard deviation than with $k = 99$. Here are the means and standard deviations:
> mean(M1)
[1] 19.046
> mean(M2)
[1] 49.51611
> mean(M3)
[1] 50.09301
> sd(M2)
[1] 5.013053
> sd(M3)
[1] 2.829185
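As a sanity check on the last number: for $k = n$ the 100 recorded values are i.i.d. uniform on $\{1, \dots, n\}$, whose variance is $(n^2 - 1)/12$, so the standard deviation of their mean should be $\sqrt{(n^2 - 1)/12}\,/\sqrt{100} \approx 2.86$, in good agreement with the observed 2.83:

n <- 99
sqrt((n^2 - 1) / 12) / sqrt(100)   # theoretical sd of the mean of 100 iid draws
# [1] 2.857738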
Best Answer
The defining characteristic of a Markov chain is that the conditional distribution of its present value, given the past values, depends only on the most recent value. So every Markov chain is "without memory" to the extent that only the previous value affects the present conditional probability, and all earlier states are "forgotten". (You are right that it is not completely without memory: after all, the conditional distribution of the present value does depend on the previous value.) That is true for MCMC and for any other Markov chain.
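In symbols, with the same notation $X_n$ as in the footnote above, the Markov property reads

$$P(X_{n+1} \in A \mid X_n, X_{n-1}, \dots, X_0) = P(X_{n+1} \in A \mid X_n),$$

so conditioning on the whole history is the same as conditioning on the last state alone.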