Rejection Sampling – Understanding the Proof of Rejection Sampling and Its Implications

monte-carlo, rejection-sampling, sampling

I am taking a course on Monte Carlo methods and we learned the Rejection Sampling (or Accept-Reject Sampling) method in the last lecture. There are a lot of resources on the web which show the proof of this method, but somehow I am not convinced by them.

So, in Rejection Sampling, we have a distribution $f(x)$ which is hard to sample from. We choose an easy-to-sample distribution $g(x)$ and find a coefficient $c$ such that $f(x) \leq cg(x)$. Then we sample from $g(x)$ and, for each draw $x_i$, we also sample a $u$ from the standard uniform distribution $U(0,1)$.

The sample $x_i$ is accepted if $cg(x_i)u \leq f(x_i)$ and rejected otherwise.
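To make the procedure concrete, here is a minimal Python sketch of the accept-reject loop; the specific choice of target $f$ = Beta(2,2) density, proposal $g$ = Uniform(0,1) and bound $c = 1.5$ is just an assumed example, not something from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # target density: Beta(2, 2) on [0, 1], i.e. 6 x (1 - x)
    return 6.0 * x * (1.0 - x)

def g(x):
    # proposal density: Uniform(0, 1)
    return np.ones_like(x, dtype=float)

c = 1.5  # f(x) <= c * g(x) everywhere, since max f = 1.5 at x = 0.5

def rejection_sample(n_proposals):
    """Draw n_proposals points from g and keep the ones passing the accept test."""
    x = rng.uniform(0.0, 1.0, size=n_proposals)   # x_i ~ g
    u = rng.uniform(0.0, 1.0, size=n_proposals)   # u_i ~ U(0, 1)
    accepted = u * c * g(x) <= f(x)               # accept iff c g(x_i) u_i <= f(x_i)
    return x[accepted]

samples = rejection_sample(100_000)
print(len(samples) / 100_000)   # acceptance rate, roughly 1/c = 2/3
```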

The proofs which I came across usually just show that $p(x|Accept) = f(x)$ and stop there.

What I think about this process is that we have a sequence of variables $x_1,Accept_1,x_2,Accept_2,\dots,x_n,Accept_n$, where an $(x_i,Accept_i)$ pair corresponds to our $i$-th sample ($x_i$) and whether it is accepted ($Accept_i$). We know that the $(x_i,Accept_i)$ pairs are independent of each other, such that:

$P(x_1,Accept_1,x_2,Accept_2,…,x_n,Accept_n) = \prod\limits_{i=1}^n P(x_i,Accept_i)$

For an $(x_i,Accept_i)$ pair we know that $P(x_i) = g(x_i)$ and $P(Accept_i|x_i) = \frac{f(x_i)}{cg(x_i)}$. We can readily calculate $p(x_i|Accept_i)$, but I don't understand how that suffices as a proof. We need to show that the algorithm works, so I think a proof should show that the empirical distribution of the accepted samples converges to $f(x)$ as $n\rightarrow\infty$. I mean, with $n$ being the number of all accepted and rejected samples:

$\frac{\text{number of accepted samples with } A \leq x_i \leq B}{\text{number of accepted samples}} \rightarrow \int_A^B f(x)\,dx$ as $n\rightarrow\infty$.
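This limit can be checked numerically. A rough sketch, reusing the same assumed Beta(2,2) / Uniform(0,1) example as above, where the exact value $\int_A^B 6x(1-x)\,dx = (3B^2-2B^3)-(3A^2-2A^3)$ is specific to that choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same assumed example: f(x) = 6x(1-x) (Beta(2, 2)), g = Uniform(0, 1), c = 1.5.
n = 1_000_000
x = rng.uniform(size=n)
u = rng.uniform(size=n)
accepted = x[u * 1.5 <= 6.0 * x * (1.0 - x)]      # keep x_i with c g(x_i) u_i <= f(x_i), g = 1

A, B = 0.25, 0.75
empirical = np.mean((accepted >= A) & (accepted <= B))
exact = (3 * B**2 - 2 * B**3) - (3 * A**2 - 2 * A**3)  # integral of 6x(1-x) over [A, B]
print(empirical, exact)                            # both close to 0.6875
```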

Am I wrong with this thought pattern? Or is there a connection between the common proof of the algorithm and this?

Thanks in advance

Best Answer

You should think of the algorithm as producing draws from a random variable. To show that the algorithm works, it suffices to show that it draws from the random variable you want it to.

Let $X$ and $Y$ be scalar random variables with pdfs $f_X$ and $f_Y$ respectively, where $Y$ is something we already know how to sample from. Suppose also that we can bound $f_X$ by $Mf_Y$, where $M\ge1$.

We now form a new random variable $A$ where $A \mid y \sim \text{Bernoulli}\left(\frac{f_X(y)}{Mf_Y(y)}\right)$; that is, it takes the value $1$ with probability $\frac{f_X(y)}{Mf_Y(y)}$ and $0$ otherwise. This represents the algorithm 'accepting' a draw from $Y$.
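In code this conditional draw is exactly the uniform comparison from the question's description of the algorithm; a minimal sketch (the names `draw_accept_flag`, `f_X`, `f_Y` and `M` are hypothetical, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_accept_flag(y, f_X, f_Y, M):
    """Simulate A | Y = y ~ Bernoulli(f_X(y) / (M * f_Y(y)))."""
    p = f_X(y) / (M * f_Y(y))
    return rng.uniform() <= p   # equivalent to rng.binomial(1, p) == 1
```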

Now we run the algorithm and collect all the draws from $Y$ that are accepted; let's call this random variable $Z = Y \mid A=1$.

To show that $Z \equiv X$, we must show that $P(Z \in E) = P(X \in E)$ for any event $E$.

So let's try that. First use Bayes' rule:

$P(Z \in E) = P(Y \in E \mid A =1) = \frac{P(Y \in E, A=1)}{P(A=1)}$,

and the top part we write as

\begin{align*}P(Y \in E, A=1) &= \int_E f_{Y, A}(y,1) \, dy \\ &= \int_E f_{A\mid Y}(1\mid y)f_Y(y) \, dy =\int_E f_Y(y) \frac{f_X(y)}{Mf_Y(y)} \, dy =\frac{P(X \in E)}{M}.\end{align*}

And then the bottom part is simply

$P(A=1) = \int_{-\infty}^{\infty}f_{Y,A}(y,1) \, dy = \frac{1}{M}$,

by the same reasoning as above, setting $E=(-\infty, +\infty)$.

And these combine to give $P(Z \in E) = \frac{P(X \in E)/M}{1/M} = P(X \in E)$, which is what we wanted: $Z \equiv X$.
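Both facts, $P(A=1)=\frac{1}{M}$ and $Z \equiv X$, can be sanity-checked by simulation. A sketch under the same assumed Beta(2,2) target and Uniform(0,1) proposal with $M = 1.5$, using scipy's Kolmogorov–Smirnov test for the distributional comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

M = 1.5
n = 200_000
y = rng.uniform(size=n)                             # Y ~ Uniform(0, 1), so f_Y = 1
a = rng.uniform(size=n) <= 6.0 * y * (1.0 - y) / M  # A | y ~ Bernoulli(f_X(y) / (M f_Y(y)))
z = y[a]                                            # Z = Y | A = 1

print(a.mean())                                     # roughly 1/M = 2/3
print(stats.kstest(z, stats.beta(2, 2).cdf))        # KS test against the Beta(2, 2) CDF; should not reject
```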

That is how the algorithm works. But at the end of your question you seem to be concerned about a more general idea: when does an empirical distribution converge to the distribution sampled from? This is a general phenomenon concerning any sampling whatsoever, if I understand you correctly.

In this case, let $X_1, \dots, X_n$ be iid random variables all with distribution $\equiv X$. Then for any event $E$, $\frac{\sum_{i=1}^n1_{X_i \in E}}{n}$ has expectation $P(X \in E)$ by the linearity of expectation.

Furthermore, given suitable assumptions you could use the strong law of large numbers to show that the empirical probability converges almost surely to the true probability.
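Spelled out: the indicators $1_{X_i \in E}$ are themselves iid with finite mean $P(X \in E)$, so the strong law applies directly and gives

$\frac{1}{n}\sum\limits_{i=1}^n 1_{X_i \in E} \rightarrow P(X \in E)$ almost surely as $n\rightarrow\infty$.

Applying this to the accepted draws $Z_1, Z_2, \dots$ with $E = [A, B]$ gives exactly the limit asked about in the question.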