Beta-Binomial Sample Size for Credible Interval Width

Statistics · Medium · Free problem

You are estimating an unknown conversion probability $p$ with a $\text{Beta}(a, b)$ prior. After observing $H$ heads and $T$ tails, the posterior is $\text{Beta}(a + H, b + T)$. You keep flipping until the 95% equal-tailed credible interval for $p$ has half-width at most $\varepsilon$.

  1. Express the stopping rule in terms of the observed counts $H$ and $T$ (and the prior parameters $a$, $b$).
  2. For $a = b = 1$ (uniform prior) and $\varepsilon = 0.01$, give an accurate approximation for the required sample size $n = H + T$.
  3. Relate your Bayesian stopping rule to the classical fixed-width confidence interval for a Bernoulli mean using normal approximations. When do the two approaches agree?

Hints

  1. For large sample sizes, the $\text{Beta}(\alpha, \beta)$ distribution is well approximated by a Gaussian with mean $\hat{p} = \alpha/(\alpha + \beta)$ and variance $\hat{p}(1-\hat{p})/(\alpha + \beta + 1)$.
  2. The worst case for sample size is $\hat{p} = 0.5$, since that maximizes the factor $\hat{p}(1-\hat{p})$ in the posterior variance. Plug in $\hat{p} = 0.5$ to get an upper bound on $n$.
  3. For part (iii), compare the Bayesian posterior variance $\hat{p}(1-\hat{p})/(n + a + b + 1)$ with the frequentist sampling variance $\hat{p}(1-\hat{p})/n$. The denominators differ by $a + b + 1$, which is essentially the prior's effective sample size $a + b$.

Worked Solution

How to Think About It: You want to keep collecting data until you have nailed down $p$ to within $\pm 0.01$. The posterior $\text{Beta}(a+H, b+T)$ concentrates as you get more data, and the credible interval width shrinks roughly like $1/\sqrt{n}$. The question is: exactly when is it narrow enough? The practical challenge is that the posterior variance depends on the observed $H/n$, so the sample size you need depends on where $p$ actually is -- worst case is $p = 0.5$.

Quick Estimate: For large $n$ with a uniform prior, the posterior $\text{Beta}(H+1, T+1)$ is approximately $N(\hat{p}, \hat{p}(1-\hat{p})/(n+3))$ where $\hat{p} = (H+1)/(n+2)$. The 95% credible interval half-width is $1.96\sqrt{\hat{p}(1-\hat{p})/(n+3)}$. Setting this to $0.01$ and using worst case $\hat{p} = 0.5$: $n + 3 \approx 1.96^2 \times 0.25 / 0.01^2 = 9604$, so $n \approx 9601$. With the exact Beta quantiles in place of the normal approximation, the answer stays within a few observations of this, so $n \approx 9600$ is a good working figure.

Formal Solution:

(i) Stopping rule:

The 95% equal-tailed credible interval is $[q_{0.025}, q_{0.975}]$ where $q_\alpha$ is the $\alpha$-quantile of $\text{Beta}(a+H, b+T)$. The stopping rule is:

$\frac{q_{0.975} - q_{0.025}}{2} \le \varepsilon$

For large $n = H + T$, the Beta posterior is well-approximated by a Gaussian. Let $\alpha' = a + H$, $\beta' = b + T$, and $n' = \alpha' + \beta'$. The posterior mean is $\hat{p} = \alpha'/n'$ and variance is $\hat{p}(1-\hat{p})/(n'+1)$. The normal approximation gives the stopping rule:

$1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n'+1}} \le \varepsilon$

Squaring and rearranging:

$n' \ge \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2} - 1$

Since $\hat{p}$ depends on the data, the stopping criterion must be checked after each observation.
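The normal-approximation check in part (i) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `approx_half_width` and `should_stop` are made up for this example, and it uses the Gaussian approximation rather than exact Beta quantiles.

```python
from math import sqrt
from statistics import NormalDist

# 97.5% standard normal quantile, about 1.96
Z_95 = NormalDist().inv_cdf(0.975)


def approx_half_width(H, T, a=1.0, b=1.0):
    """Normal-approximation half-width of the 95% credible interval
    for the Beta(a + H, b + T) posterior."""
    alpha_post = a + H
    beta_post = b + T
    n_post = alpha_post + beta_post          # n' in the text
    p_hat = alpha_post / n_post              # posterior mean
    return Z_95 * sqrt(p_hat * (1 - p_hat) / (n_post + 1))


def should_stop(H, T, eps, a=1.0, b=1.0):
    """True once the approximate half-width is at most eps."""
    return approx_half_width(H, T, a, b) <= eps


print(should_stop(100, 100, eps=0.01))
print(should_stop(5000, 5000, eps=0.01))
```

In a sequential experiment, `should_stop` would be called after every new observation with the updated counts.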

(ii) Sample size for $a = b = 1$, $\varepsilon = 0.01$:

With $a = b = 1$, we have $n' = n + 2$. The worst case is $\hat{p} = 0.5$, giving:

$n + 2 \ge \frac{1.96^2 \times 0.25}{0.0001} - 1 = \frac{0.9604}{0.0001} - 1 = 9603$

$n \ge 9601$

For other values of $\hat{p}$, less data is needed. If $\hat{p} = 0.1$, you need $n + 2 \ge 1.96^2 \times 0.09 / 0.0001 - 1 \approx 3456.4$, so $n \approx 3455$.

A good approximation valid for all $\hat{p}$:

$n \approx \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2} - (a + b + 1)$

The maximum sample size (worst case, $\hat{p} = 0.5$) is approximately $n \approx 9601$, or about 9600 in round numbers.
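The arithmetic in part (ii) can be checked with a short Python sketch. The helper name `required_n` is hypothetical, and the formula is the normal approximation $n \ge z^2 \hat{p}(1-\hat{p})/\varepsilon^2 - (a + b + 1)$ obtained by rearranging the stopping rule from part (i), not an exact Beta-quantile calculation.

```python
from math import ceil
from statistics import NormalDist


def required_n(p_hat, eps, a=1.0, b=1.0, level=0.95):
    """Smallest n = H + T for which the normal-approximation half-width
    is at most eps, assuming the posterior mean ends up at p_hat."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Stopping rule: n + a + b + 1 >= z^2 * p_hat * (1 - p_hat) / eps^2
    threshold = z**2 * p_hat * (1 - p_hat) / eps**2
    return max(0, ceil(threshold - (a + b + 1)))


for p_hat in (0.5, 0.3, 0.1):
    print(p_hat, required_n(p_hat, eps=0.01))
```

Because the requirement scales with $\hat{p}(1-\hat{p})$, the value at $\hat{p} = 0.5$ is the largest the function returns for a given $\varepsilon$ and prior.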

(iii) Connection to frequentist fixed-width CI:

The classical approach uses the CLT: the MLE $\hat{p} = H/n$ has approximate distribution $N(p, p(1-p)/n)$. A 95% confidence interval has half-width $1.96\sqrt{\hat{p}(1-\hat{p})/n}$. Setting this to $\varepsilon$:

$n \ge \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2}$

Compare with the Bayesian version: $n + a + b \ge 1.96^2 \hat{p}(1-\hat{p})/\varepsilon^2 - 1$. The two agree when $a + b$ is negligible relative to $n$ -- i.e., when the prior is "washed out" by data. For $a = b = 1$ and $n \approx 9600$, the required sample sizes differ by only $a + b + 1 = 3$ observations, which is tiny ($3/9600 < 0.1\%$).

The approaches diverge when the prior is strong (large $a + b$) or the sample size is small. A strong prior shrinks the credible interval, so the Bayesian approach can stop earlier. The frequentist approach ignores prior information entirely.
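To make the comparison concrete, the sketch below computes both sample-size requirements for a few priors. It assumes the normal approximation throughout and, for simplicity, that both the MLE and the posterior mean sit at the same $\hat{p}$, which is only reasonable when the prior is centered near the data. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)


def frequentist_n(p_hat, eps):
    """n from the CLT interval: z^2 * p_hat * (1 - p_hat) / eps^2."""
    return ceil(Z_95**2 * p_hat * (1 - p_hat) / eps**2)


def bayesian_n(p_hat, eps, a, b):
    """n from the Beta posterior: the same quantity minus (a + b + 1)."""
    return ceil(Z_95**2 * p_hat * (1 - p_hat) / eps**2 - (a + b + 1))


n_f = frequentist_n(0.5, 0.01)
for a, b in [(1, 1), (50, 50), (500, 500)]:
    n_b = bayesian_n(0.5, 0.01, a, b)
    # The gap equals a + b + 1 regardless of p_hat or eps
    print(f"a=b={a}: frequentist {n_f}, Bayesian {n_b}, gap {n_f - n_b}")
```

The gap is small next to $n$ for the uniform prior but becomes a noticeable share of the total once $a + b$ reaches the hundreds.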

Answer: The stopping rule is $1.96\sqrt{\hat{p}(1-\hat{p})/(n+a+b+1)} \le \varepsilon$ under the normal approximation, where $\hat{p} = (a+H)/(n+a+b)$ is the posterior mean. For a uniform prior with $\varepsilon = 0.01$, the worst-case sample size is about $n = 9601$. The Bayesian and frequentist approaches agree asymptotically, differing only by $a + b + 1$ observations, essentially the effective prior sample size $a + b$.

Intuition

This problem illustrates how Bayesian and frequentist interval estimation converge when data overwhelms the prior. The 95% credible interval width shrinks like $1/\sqrt{n}$, just like a confidence interval, because the Beta posterior becomes increasingly Gaussian. The prior parameters $a$ and $b$ act like "phantom observations" -- they contribute an effective sample size of $a + b$ to the total, which matters when $n$ is small but becomes irrelevant for large $n$.

In practice, this kind of sequential stopping rule comes up in A/B testing for conversion rates. You want to stop the experiment as soon as you have enough precision, but the required sample size depends on the true conversion rate (which you do not know in advance). The worst case at $p = 0.5$ gives you a safe upper bound, but if the true rate is far from 0.5, you can stop much earlier. This is why adaptive sequential designs are valuable -- they let the data tell you when you have enough.
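A small simulation shows how the stopping time depends on the true rate. This is a sketch under stated assumptions: the data are simulated Bernoulli draws, the stopping check uses the normal approximation to the Beta posterior, and `run_until_precise`, its `seed` and its `max_n` cap are hypothetical choices made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)


def run_until_precise(true_p, eps, a=1.0, b=1.0, seed=0, max_n=100_000):
    """Draw Bernoulli(true_p) observations one at a time and return the
    number drawn when the approximate 95% half-width first reaches eps."""
    rng = random.Random(seed)
    H = T = 0
    while H + T < max_n:
        if rng.random() < true_p:
            H += 1
        else:
            T += 1
        n_post = a + b + H + T
        p_hat = (a + H) / n_post
        if Z_95 * sqrt(p_hat * (1 - p_hat) / (n_post + 1)) <= eps:
            break
    return H + T


for true_p in (0.5, 0.1):
    print(true_p, run_until_precise(true_p, eps=0.01))
```

Under a uniform prior the run at `true_p = 0.5` can never exceed the worst-case bound, while the run at `true_p = 0.1` stops far sooner, at a point that varies with the random draws.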
