Rejection Sampling – Understanding the Proof of Rejection Sampling and Its Implications

monte carlorejection samplingsampling

I am taking a course on Monte Carlo methods and we learned the Rejection Sampling (or Accept-Reject Sampling) method in the last lecture. There are a lot of resources on the web which shows the proof of this method but somehow I am not convinced with them.

So, in the Rejection Sampling, we have a distribution $f(x)$ which is hard to sample from. We choose a easy-to-sample distribution $g(x)$ and find a coefficient $c$ such that $f(x) \leq cg(x)$. Then we sample from $g(x)$ and for each draw, $x_i$, we also sample a $u$ from a standard uniform distribution $U(u|0,1)$.

The sample $x_i$ is accepted if it is $cg(x_i)u \leq f(x_i)$ and rejected otherwise.

The proofs which I came across usually just show that $p(x|Accept) = f(x)$ and stop there.

What I think about this process is that we have a sequence of variables $x_1,Accept_1,x_2,Accept_2,…,x_n,Accept_n$ and a $x_i,Accept_i$ pair corresponds to our i.th sample ($x_i$) and whether it is accepted ($Accept_i$). We know that each $x_i,Accept_i$ pair is independent of each other, such that:

$P(x_1,Accept_1,x_2,Accept_2,…,x_n,Accept_n) = \prod\limits_{i=1}^n P(x_i,Accept_i)$

For a $(x_i,Accept_i)$ pair we know that $P(x_i) = g(x_i)$ and $P(Accept_i|x_i) = \frac{f(x_i)}{cg(x_i)}$. We can readily calculate $p(x_i|Accept_i)$ but I don't understand how it suffices as a proof. We need to show that the algorithm works, so I think a proof should show that the empricial distribution of the accepted samples converge to $f(x)$ as $n\rightarrow\infty$. I mean, with $n$ being the number of all accepted and rejected samples:

$\frac{Number \hspace{1mm} of \hspace{1mm} samples \hspace{1mm} with \hspace{1mm} (A \leq x_i \leq B)}{Number \hspace{1mm} of \hspace{1mm} accepted \hspace{1mm} samples} \rightarrow \int_A^B f(x)dx$ as $n\rightarrow\infty$.

Am I wrong with this thought pattern? Or is there a connection between the common proof of the algorithm and this?

Thanks in advance

Best Answer

You should think of the algorithm as producing draws from a random variable, to show that the algorithm works, it suffices to show that the algorithm draws from the random variable you want it to.

Let $X$ and $Y$ be scalar random variables with pdfs $f_X$ and $f_Y$ respectively, where $Y$ is something we already know how to sample from. We can also know that we can bound $f_X$ by $Mf_Y$ where $M\ge1$.

We now form a new random variable $A$ where $A | y \sim \text{Bernoulli } \left (\frac{f_X(y)}{Mf_Y(y)}\right )$, this takes the value $1$ with probability $\frac{f_X(y)}{Mf_Y(y)} $ and $0$ otherwise. This represents the algorithm 'accepting' a draw from $Y$.

Now we run the algorithm and collect all the draws from $Y$ that are accepted, lets call this random variable $Z = Y|A=1$.

To show that $Z \equiv X$, for any event $E$, we must show that $P(Z \in E) =P(X \in E)$.

So let's try that, first use Bayes' rule:

$P(Z \in E) = P(Y \in E | A =1) = \frac{P(Y \in E \& A=1)}{P(A=1)}$,

and the top part we write as

\begin{align*}P(Y \in E \& A=1) &= \int_E f_{Y, A}(y,1) \, dy \\ &= \int_E f_{A|Y}(1,y)f_Y(y) \, dy =\int_E f_Y(y) \frac{f_X(y)}{Mf_Y(y)} \, dy =\frac{P(X \in E)}{M}.\end{align*}

And then the bottom part is simply

$P(A=1) = \int_{-\infty}^{\infty}f_{Y,A}(y,1) \, dy = \frac{1}{M}$,

by the same reasoning as above, setting $E=(-\infty, +\infty)$.

And these combine to give $P(X \in E)$, which is what we wanted, $Z \equiv X$.

That is how the algorithm works, but at the end of your question you seem to be concerned about a more general idea, that is when does an empirical distribution converge to the distribution sampled from? This is a general phenomenon concerning any sampling whatsoever if I understand you correctly.

In this case, let $X_1, \dots, X_n$ be iid random variables all with distribution $\equiv X$. Then for any event $E$, $\frac{\sum_{i=1}^n1_{X_i \in E}}{n}$ has expectation $P(X \in E)$ by the linearity of expectation.

Furthermore, given suitable assumptions you could use the strong law of large numbers to show that the empirical probability converges almost surely to the true probability.

Best Answer

Related Solutions

Solved – How does Gibbs sampling produce values for a variable using the univariate conditional probability

Solved – Proof of Rejection Sampling

Related Question