Solved – Accept-reject algorithm for Beta(1,$\beta$)

monte carlo, rejection sampling, self-study

Consider the pdf

$$f(x)= \begin{cases} \beta x^{\beta-1} & 0<x<1 \\ 0 & \text{elsewhere} \end{cases}$$

for $\beta > 1$.

Use the accept-reject algorithm to generate an observation from this pdf.

I have posited that this beta distribution is dominated by a uniform one and thus:

$\beta x^{\beta-1} \leq 1$

The theorem states that if $U\leq \frac{f(X)}{Mg(X)}$, where $U\sim \mathrm{Unif}(0,1)$ is our simulated uniform RV and $X$ is a draw from $g$, we can accept $X$ as an observation from $f$.

In the above $M=1$ and $g(x)=1$. Can I go ahead and set up my inequality like that? Thank you.

Best Answer

I will try to explain how rejection sampling works "in general". If I am clear enough, you should then be able to answer your own question. I won't give proofs, just (hopefully) intuitive "facts".

Fact 1 If $X$ is drawn from a distribution with density $g(x)$ and $U$ from a uniform $U(0,1)$, then the point $(X, U\times M\times g(X))$ is uniformly distributed in the subgraph of $M g(x)$. Illustration with the normal density and $M = 5$:

> x <- rnorm(1e4)        # draws from the proposal density g = dnorm
> u <- runif(1e4)        # independent uniforms on (0,1)
> y <- u*5*dnorm(x)      # heights U*M*g(X), here with M = 5
> plot(x, y, pch=".")    # the points (x, y) fill the subgraph of 5*dnorm

plot produced by above R commands

This fact should be intuitive from the definition of the density: the height $g(x)$ of the subgraph above $x$ is the "probability" of falling at $x$.

Fact 2 If you can draw uniformly points $(X,Y)$ in the subgraph of $f(x)$ (a density function), then $X$ is distributed with density $f(x)$.
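
For instance, here is a minimal sketch of Fact 2 with the density $f(x) = 2x$ on $(0,1)$ (a choice made here just for illustration). Its subgraph is the triangle with vertices $(0,0)$, $(1,0)$, $(1,2)$; fold a uniform point of the unit square onto the standard triangle, map it to that triangle, and the abscissas should come out with density $2x$:

> u2 <- runif(1e4); v2 <- runif(1e4)
> flip <- u2 + v2 > 1                     # fold onto the standard triangle u + v <= 1
> u2[flip] <- 1 - u2[flip]; v2[flip] <- 1 - v2[flip]
> px <- u2 + v2                           # (px, py) is uniform in the triangle (0,0), (1,0), (1,2)
> py <- 2*v2
> plot(px, py, pch=".")                   # the subgraph of f(x) = 2x, filled uniformly
> hist(px, freq=FALSE)                    # the histogram should follow the line y = 2x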

Fact 3 If $f(x) \le M g(x)$ and you draw points uniformly in the subgraph of $M g(x)$, the points which fall in the subgraph of $f(x)$ are uniformly distributed in it.

Illustration with the tent function $f(x) = \begin{cases} 0 & \text{if } |x| \ge 1 \\ x+1 & \text{if } -1 < x \le 0 \\ 1-x & \text{if } 0 < x < 1 \end{cases}$

> f <- function(x) ifelse(-1<x & x<=0, x+1, ifelse(0<x & x<1, 1-x, 0))   # the tent density
> plot(x, y, pch="x", col=ifelse(y < f(x), "black", "red"))              # black: points under f

plot produced by above R commands (points under $f$ in black, others in red)

All these facts together imply that if you keep only the points $X$ such that $U \times M \times g(X) \le f(X)$, these points are distributed with density $f$.

> x1 <- x[ y < f(x) ]
> length(x1)
[1] 2068
> hist(x1)

histogram of x1 produced by above R commands

Of course this example is only intended for illustration; the proposed algorithm would be a bad way to sample from the tent distribution.
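
(As an aside, one direct way to sample the tent, for comparison: the sum of two independent uniforms minus 1 has exactly this density.)

> x_direct <- runif(1e4) + runif(1e4) - 1   # triangular on (-1, 1), i.e. the tent
> hist(x_direct)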

Note that one nice feature of rejection sampling, which you can deduce from the above argument, is that if you know $f(x)$ only up to some unknown multiplicative constant, you can still use the algorithm.
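
As a rough sketch of that last point (the unnormalized target $h(x) = e^{-x^4}$ and the bound $M = 3$ are choices made here just for illustration): since $h(x) \le 3\,g(x)$ with $g$ the standard normal density, the same recipe applies even though the normalizing constant of $h$ is unknown.

> h <- function(x) exp(-x^4)               # target density known only up to a constant
> x3 <- rnorm(1e5)                         # proposals from g = dnorm
> u3 <- runif(1e5)
> keep <- u3 * 3 * dnorm(x3) < h(x3)       # accept when the uniform height falls under h
> hist(x3[keep], freq=FALSE, breaks=50)    # distributed as the normalized version of h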