Optimal Height for Acceptance-Rejection Sampling
You want to sample from a random variable $X$ with bounded PDF $f(x)$ supported on $(0, 1)$. The acceptance-rejection method works as follows: sample a point $(X, Y)$ uniformly from the rectangle $R_h = (0,1) \times (0, h)$ for some $h > 0$. Accept the sample if $(X, Y)$ falls under the graph of $f(x)$ (i.e., $Y \le f(X)$); otherwise reject and resample.
It can be shown that accepted $X$ values have marginal PDF $f(x)$.
- What is the distribution of the number of draws needed to get one valid sample, as a function of $h$?
- What value of $h$ minimizes the expected number of draws?
- Compute this optimal $h$ for $X \sim \text{Beta}(4, 2)$.
Hints
- The acceptance probability is the ratio of the area under $f(x)$ (which is 1 for any PDF) to the area of the bounding rectangle $h$.
- The number of draws until first acceptance is geometric with parameter /h$, so $E[N] = h$. To minimize draws, minimize $h$ -- but $h$ must cover the entire PDF.
- The minimum valid $h$ is $\sup f(x)$. For Beta(4,2), find the mode at $x = (\alpha - 1)/(\alpha + \beta - 2) = 3/4$ and evaluate $f(3/4)$.
Worked Solution
How to Think About It: You are throwing darts uniformly at a rectangle and keeping only the ones that land under the curve. The acceptance probability is the ratio of the area under $f(x)$ to the area of the rectangle. Since $f$ is a PDF on $(0,1)$, the area under it is exactly 1. The rectangle has area $h \cdot 1 = h$. So the acceptance probability is
/h$, and you want to make $h$ as small as possible to waste fewer darts.Quick Estimate: The Beta(4,2) PDF peaks somewhere around $x = 3/4$ (since the mode of Beta($\alpha, \beta$) is $(\alpha - 1)/(\alpha + \beta - 2) = 3/4$). The PDF value at the mode should be a bit above 2. So we need roughly $h \approx 2.1$, and the expected number of draws should be around 2.1.
Approach: Find the acceptance probability as a function of $h$, determine the optimal $h$, then compute the mode of Beta(4,2).
Formal Solution:
Step 1: Acceptance probability and number of draws.
Each point $(X, Y)$ is uniform on $R_h$. The probability of acceptance is: $P(\text{accept}) = \frac{\text{Area under } f(x)}{\text{Area of } R_h} = \frac{1}{h}$
The number of draws $N$ until the first acceptance is geometric with success probability
/h$: $N \sim \text{Geom}(1/h), \quad E[N] = h$Step 2: Optimal $h$.
We want to minimize $E[N] = h$. But $h$ must be large enough that the rectangle covers the entire PDF -- otherwise some parts of $f(x)$ would be above the rectangle and could never be sampled. This requires: $h \ge \sup_{x \in (0,1)} f(x)$
The minimum valid $h$ is: $h^{*} = \sup_{x \in (0,1)} f(x)$
Step 3: Compute for Beta(4,2).
The PDF of $\text{Beta}(4, 2)$ is: $f(x) = \frac{x^3(1-x)}{B(4,2)} = \frac{x^3(1-x)}{1/20} = 20x^3(1-x)$
Find the maximum by taking the derivative: $f'(x) = 20(3x^2(1-x) - x^3) = 20x^2(3 - 4x)$
Setting $f'(x) = 0$: the critical points are $x = 0$ and $x = 3/4$. Since $f(0) = 0$, the maximum is at $x = 3/4$: $f(3/4) = 20 \cdot \left(\frac{3}{4}\right)^3 \cdot \left(\frac{1}{4}\right) = 20 \cdot \frac{27}{64} \cdot \frac{1}{4} = \frac{540}{256} = \frac{135}{64}$
Answer: The optimal height is $h^{*} = \dfrac{135}{64} \approx 2.109$, which gives an expected number of draws of $E[N] = \dfrac{135}{64}$ per valid sample.
Intuition
Acceptance-rejection sampling is one of the workhorses of computational statistics. The key insight is beautifully simple: the expected number of samples you waste equals the area of the bounding region. Since the PDF integrates to 1, the acceptance rate is
/h$, and you need the rectangle to be just tall enough to cover the highest point of the PDF. This is why acceptance-rejection works best for distributions that are not too "spiky" -- a uniform distribution needs $h = 1$ (perfect efficiency), while a distribution with a tall narrow peak wastes most of your samples. In practice, choosing a better proposal distribution (not just a rectangle) can dramatically improve efficiency, which leads to the general version of the method where you bound $f$ with $M \cdot g(x)$ for some easy-to-sample $g$.