Beta-Binomial Sample Size for Credible Interval Width
You are estimating an unknown conversion probability $p$ with a $\text{Beta}(a, b)$ prior. After observing $H$ heads and $T$ tails, the posterior is $\text{Beta}(a + H, b + T)$. You keep flipping until the 95% equal-tailed credible interval for $p$ has half-width at most $\varepsilon$.
- Express the stopping rule in terms of the observed counts $H$ and $T$ (and the prior parameters $a$, $b$).
- For $a = b = 1$ (uniform prior) and $\varepsilon = 0.01$, give an accurate approximation for the required sample size $n = H + T$.
- Relate your Bayesian stopping rule to the classical fixed-width confidence interval for a Bernoulli mean using normal approximations. When do the two approaches agree?
Hints
- For large sample sizes, the $\text{Beta}(\alpha, \beta)$ distribution is well-approximated by a Gaussian with mean $\hat{p} = \alpha/(\alpha + \beta)$ and variance $\hat{p}(1-\hat{p})/(\alpha + \beta + 1)$.
- The worst case for sample size is $\hat{p} = 0.5$, since that maximizes the factor $\hat{p}(1-\hat{p})$ in the posterior variance. Plug in $\hat{p} = 0.5$ to get an upper bound on $n$.
- For part (iii), compare the Bayesian posterior variance $\hat{p}(1-\hat{p})/(n + a + b + 1)$ with the frequentist sampling variance $\hat{p}(1-\hat{p})/n$. The denominators differ by $a + b + 1$, which is essentially the prior's effective sample size $a + b$.
Worked Solution
How to Think About It: You want to keep collecting data until you have nailed down $p$ to within $\pm 0.01$. The posterior $\text{Beta}(a+H, b+T)$ concentrates as you get more data, and the credible interval width shrinks roughly like $1/\sqrt{n}$.
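Quick Estimate: For large $n$ with a uniform prior, the posterior is approximately $N(\hat{p}, \hat{p}(1-\hat{p})/(n+3))$ where $\hat{p} = (H+1)/(n+2)$. The 95% credible interval half-width is approximately $1.96\sqrt{\hat{p}(1-\hat{p})/(n+3)}$. Requiring this to be at most $0.01$ at the worst case $\hat{p} = 0.5$ gives $n \approx 9600$.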
Formal Solution:
(i) Stopping rule:
The 95% equal-tailed credible interval is $[q_{0.025}, q_{0.975}]$ where $q_\alpha$ is the $\alpha$-quantile of $\text{Beta}(a+H, b+T)$. The stopping rule is:
$\frac{q_{0.975} - q_{0.025}}{2} \le \varepsilon$
For large $n = H + T$, the Beta posterior is well-approximated by a Gaussian. Let $\alpha' = a + H$, $\beta' = b + T$, and $n' = \alpha' + \beta'$. The posterior mean is $\hat{p} = \alpha'/n'$ and variance is $\hat{p}(1-\hat{p})/(n'+1)$. The normal approximation gives the stopping rule:
$1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n'+1}} \le \varepsilon$
Squaring and rearranging:
$n' \ge \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2} - 1$
Since $\hat{p}$ depends on the data, the stopping criterion must be checked after each observation.
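The normal-approximation check in part (i) can be sketched in a few lines of Python. This is a minimal illustration, not an exact rule: the function name `should_stop` and its default arguments are hypothetical, and the exact version would compare Beta quantiles instead of using the Gaussian half-width.

```python
import math

def should_stop(H, T, a=1.0, b=1.0, eps=0.01, z=1.96):
    """Normal-approximation stopping check for a Beta(a + H, b + T) posterior.

    Returns True when z * sqrt(p_hat * (1 - p_hat) / (n' + 1)) <= eps,
    where n' = a + b + H + T and p_hat is the posterior mean.
    """
    alpha_post = a + H
    beta_post = b + T
    n_post = alpha_post + beta_post      # n' in the text
    p_hat = alpha_post / n_post          # posterior mean
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / (n_post + 1.0))
    return half_width <= eps

# The check is re-evaluated after every observation, since p_hat moves with the data.
print(should_stop(50, 50))       # very little data
print(should_stop(4000, 4000))   # below the worst-case requirement
print(should_stop(5000, 5000))   # above the worst-case requirement
```

Because the rule depends on the running counts, it would be called once per new flip rather than once at a pre-planned sample size.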
(ii) Sample size for $a = b = 1$, $\varepsilon = 0.01$:
With $a = b = 1$, we have $n' = n + 2$, so $n' + 1 = n + 3$. The worst case is $\hat{p} = 0.5$, giving:
$n + 3 \ge \frac{1.96^2 \times 0.25}{0.0001} = \frac{0.9604}{0.0001} = 9604$
$n \ge 9601$
For other values of $\hat{p}$, less data is needed. If $\hat{p} = 0.1$, you need $n + 3 \ge 1.96^2 \times 0.09 / 0.0001 \approx 3457.4$, so $n \approx 3455$.
A good approximation valid for all $\hat{p}$:
$n \approx \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2} - (a + b + 1)$
The maximum sample size (worst case) is approximately $n \approx 9601$.
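As a quick numerical check, the approximation can be evaluated for a few values of $\hat{p}$. The sketch below follows the part (i) inequality $n' + 1 \ge 1.96^2 \hat{p}(1-\hat{p})/\varepsilon^2$ with $n' = n + a + b$; the function name `approx_sample_size` is hypothetical, and the result is returned as a float that would be rounded up in practice.

```python
def approx_sample_size(p_hat, eps, a=1.0, b=1.0, z=1.96):
    """Approximate n = H + T at which the normal-approximation rule is first met.

    Solves n + a + b + 1 >= z^2 * p_hat * (1 - p_hat) / eps^2 for n.
    Returns a float; round up to get a whole number of observations.
    """
    return z * z * p_hat * (1.0 - p_hat) / (eps * eps) - (a + b + 1.0)

# Uniform prior (a = b = 1), eps = 0.01, for several plausible rates.
for p in (0.5, 0.3, 0.1, 0.05):
    print(f"p_hat = {p:.2f}: n is about {approx_sample_size(p, 0.01):.0f}")
```

The requirement falls off quickly as $\hat{p}$ moves away from 0.5, which is why the worst-case figure is only an upper bound.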
(iii) Connection to frequentist fixed-width CI:
The classical approach uses the CLT: the MLE $\hat{p} = H/n$ has approximate distribution $N(p, p(1-p)/n)$. A 95% confidence interval has half-width $1.96\sqrt{\hat{p}(1-\hat{p})/n}$. Requiring this to be at most $\varepsilon$ gives:
$n \ge \frac{1.96^2 \hat{p}(1-\hat{p})}{\varepsilon^2}$
Compare with the Bayesian version: $n + a + b \ge 1.96^2 \hat{p}(1-\hat{p})/\varepsilon^2 - 1$, where $\hat{p}$ is now the posterior mean rather than the MLE. The two agree when $a + b$ is negligible relative to $n$, i.e., when the prior is "washed out" by data. For $a = b = 1$ and $n \approx 9600$, the difference is tiny ($a + b + 1 = 3$ observations out of roughly 9600).
The approaches diverge when the prior is strong (large $a + b$) or the sample size is small. A strong prior shrinks the credible interval, so the Bayesian approach can stop earlier. The frequentist approach ignores prior information entirely.
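To see how the gap grows with prior strength, the two normal-approximation sample sizes can be compared side by side. This is a rough sketch with hypothetical function names. It assumes the same $\hat{p}$ is plugged into both formulas, which ignores the fact that a strong prior also pulls the posterior mean away from the MLE.

```python
def frequentist_n(p_hat, eps, z=1.96):
    """Classical fixed-width requirement: n >= z^2 * p_hat * (1 - p_hat) / eps^2."""
    return z * z * p_hat * (1.0 - p_hat) / (eps * eps)

def bayesian_n(p_hat, eps, a, b, z=1.96):
    """Bayesian requirement under a Beta(a, b) prior (normal approximation).

    Assumes the same p_hat as the frequentist formula, so the only
    difference is the offset a + b + 1.
    """
    return frequentist_n(p_hat, eps, z) - (a + b + 1.0)

p_hat, eps = 0.5, 0.01
for a, b in ((1, 1), (50, 50), (500, 500)):
    n_f = frequentist_n(p_hat, eps)
    n_b = bayesian_n(p_hat, eps, a, b)
    gap = (n_f - n_b) / n_f
    print(f"a = b = {a}: frequentist {n_f:.0f}, Bayesian {n_b:.0f}, relative gap {gap:.2%}")
```

With a uniform prior the gap is a handful of observations. With $a = b = 500$ the prior alone stands in for roughly a tenth of the required data.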
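Answer: The stopping rule is $1.96\sqrt{\hat{p}(1-\hat{p})/(n + a + b + 1)} \le \varepsilon$, with $\hat{p} = (a + H)/(n + a + b)$ and $n = H + T$, checked after each observation. For $a = b = 1$ and $\varepsilon = 0.01$, the worst case $\hat{p} = 0.5$ requires $n \approx 9600$. The Bayesian and frequentist rules agree when $a + b \ll n$.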
Intuition
This problem illustrates how Bayesian and frequentist interval estimation converge when data overwhelms the prior. The 95% credible interval width shrinks like $1/\sqrt{n}$, and the prior only shifts the required sample size by about $a + b$ observations.
In practice, this kind of sequential stopping rule comes up in A/B testing for conversion rates. You want to stop the experiment as soon as you have enough precision, but the required sample size depends on the true conversion rate (which you do not know in advance). The worst case at $p = 0.5$ gives you a safe upper bound, but if the true rate is far from 0.5, you can stop much earlier. This is why adaptive sequential designs are valuable -- they let the data tell you when you have enough.
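A small simulation makes the adaptive behaviour concrete. The sketch below flips a simulated coin and applies the normal-approximation rule after every observation. The function name `run_until_precise`, the seed, and the true rates are illustrative assumptions, and a production A/B test would typically use exact Beta quantiles rather than the Gaussian half-width.

```python
import math
import random

def run_until_precise(p_true, eps=0.01, a=1.0, b=1.0, z=1.96, seed=0):
    """Simulate Bernoulli(p_true) draws until the approximate 95% credible
    interval half-width for the Beta(a + H, b + T) posterior is at most eps.

    Returns the number of observations n = H + T at the stopping time.
    """
    rng = random.Random(seed)            # local generator for reproducibility
    H = T = 0
    while True:
        if rng.random() < p_true:
            H += 1
        else:
            T += 1
        n_post = a + b + H + T           # n' in the text
        p_hat = (a + H) / n_post         # posterior mean
        half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / (n_post + 1.0))
        if half_width <= eps:
            return H + T

for p_true in (0.5, 0.1):
    print(f"true p = {p_true}: stopped after {run_until_precise(p_true)} observations")
```

The stopping time for a true rate of 0.1 should land well below the worst-case bound, in line with the roughly 3455 observations estimated in part (ii), though the exact figure varies from run to run.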