Optimal Height for Acceptance-Rejection Sampling

Expectation · Medium · Free problem

You want to sample from a random variable $X$ with bounded PDF $f(x)$ supported on $(0, 1)$. The acceptance-rejection method works as follows: sample a point $(X, Y)$ uniformly from the rectangle $R_h = (0,1) \times (0, h)$ for some $h > 0$. Accept the sample if $(X, Y)$ falls under the graph of $f(x)$ (i.e., $Y \le f(X)$); otherwise reject and resample.

It can be shown that accepted $X$ values have marginal PDF $f(x)$.

  1. What is the distribution of the number of draws needed to get one valid sample, as a function of $h$?
  2. What value of $h$ minimizes the expected number of draws?
  3. Compute this optimal $h$ for $X \sim \text{Beta}(4, 2)$.

Hints

  1. The acceptance probability is the ratio of the area under $f(x)$ (which is 1 for any PDF) to the area of the bounding rectangle $h$.
  2. The number of draws until first acceptance is geometric with parameter
    /h$, so $E[N] = h$. To minimize draws, minimize $h$ -- but $h$ must cover the entire PDF.
  3. The minimum valid $h$ is $\sup f(x)$. For Beta(4,2), find the mode at $x = (\alpha - 1)/(\alpha + \beta - 2) = 3/4$ and evaluate $f(3/4)$.

Worked Solution

How to Think About It: You are throwing darts uniformly at a rectangle and keeping only the ones that land under the curve. The acceptance probability is the ratio of the area under $f(x)$ to the area of the rectangle. Since $f$ is a PDF on $(0,1)$, the area under it is exactly 1. The rectangle has area $h \cdot 1 = h$. So the acceptance probability is

/h$, and you want to make $h$ as small as possible to waste fewer darts.

Quick Estimate: The Beta(4,2) PDF peaks somewhere around $x = 3/4$ (since the mode of Beta($\alpha, \beta$) is $(\alpha - 1)/(\alpha + \beta - 2) = 3/4$). The PDF value at the mode should be a bit above 2. So we need roughly $h \approx 2.1$, and the expected number of draws should be around 2.1.

Approach: Find the acceptance probability as a function of $h$, determine the optimal $h$, then compute the mode of Beta(4,2).

Formal Solution:

Step 1: Acceptance probability and number of draws.

Each point $(X, Y)$ is uniform on $R_h$. The probability of acceptance is: $P(\text{accept}) = \frac{\text{Area under } f(x)}{\text{Area of } R_h} = \frac{1}{h}$

The number of draws $N$ until the first acceptance is geometric with success probability

/h$: $N \sim \text{Geom}(1/h), \quad E[N] = h$

Step 2: Optimal $h$.

We want to minimize $E[N] = h$. But $h$ must be large enough that the rectangle covers the entire PDF -- otherwise some parts of $f(x)$ would be above the rectangle and could never be sampled. This requires: $h \ge \sup_{x \in (0,1)} f(x)$

The minimum valid $h$ is: $h^{*} = \sup_{x \in (0,1)} f(x)$

Step 3: Compute for Beta(4,2).

The PDF of $\text{Beta}(4, 2)$ is: $f(x) = \frac{x^3(1-x)}{B(4,2)} = \frac{x^3(1-x)}{1/20} = 20x^3(1-x)$

Find the maximum by taking the derivative: $f'(x) = 20(3x^2(1-x) - x^3) = 20x^2(3 - 4x)$

Setting $f'(x) = 0$: the critical points are $x = 0$ and $x = 3/4$. Since $f(0) = 0$, the maximum is at $x = 3/4$: $f(3/4) = 20 \cdot \left(\frac{3}{4}\right)^3 \cdot \left(\frac{1}{4}\right) = 20 \cdot \frac{27}{64} \cdot \frac{1}{4} = \frac{540}{256} = \frac{135}{64}$

Answer: The optimal height is $h^{*} = \dfrac{135}{64} \approx 2.109$, which gives an expected number of draws of $E[N] = \dfrac{135}{64}$ per valid sample.

Intuition

Acceptance-rejection sampling is one of the workhorses of computational statistics. The key insight is beautifully simple: the expected number of samples you waste equals the area of the bounding region. Since the PDF integrates to 1, the acceptance rate is

/h$, and you need the rectangle to be just tall enough to cover the highest point of the PDF. This is why acceptance-rejection works best for distributions that are not too "spiky" -- a uniform distribution needs $h = 1$ (perfect efficiency), while a distribution with a tall narrow peak wastes most of your samples. In practice, choosing a better proposal distribution (not just a rectangle) can dramatically improve efficiency, which leads to the general version of the method where you bound $f$ with $M \cdot g(x)$ for some easy-to-sample $g$.

Open the full interactive solver →