Confidence Interval with Zero Successes
A coin is tossed $N$ times and you observe zero heads. You want to build a confidence interval for the true probability of heads $p$.
- Using a frequentist approach, derive the exact (Clopper-Pearson) 95% confidence interval for $p$. Show how the "Rule of 3" approximation $p_U \approx 3/N$ arises.
- Using a Bayesian approach with a $\text{Beta}(1,1)$ prior, derive the 95% credible interval for $p$.
- Compare the two intervals. For what values of $N$ do they give meaningfully different answers?
Hints
- Think about what value of $p$ would make observing zero heads a borderline event at the 5% level.
- For the frequentist bound, invert the equation $(1-p)^N = 0.05$. For the Rule of 3, use the approximation $\ln(1-p) \approx -p$ for small $p$.
- For the Bayesian approach, recall that a $\text{Beta}(1,1)$ prior with $0$ successes and $N$ failures gives a $\text{Beta}(1, N+1)$ posterior, whose quantile function has a clean closed form.
Worked Solution
How to Think About It: You ran $N$ trials and saw nothing. That does not mean $p = 0$ -- it means $p$ is small enough that seeing zero heads in $N$ flips is not surprising. The question is: how small? A practitioner's first instinct should be the Rule of 3 -- if you see zero events in $N$ trials, the 95% upper bound on the rate is roughly $3/N$. This is the single most useful quick formula in rare-event estimation. The Bayesian version gives almost the same answer once $N$ is moderate.
Quick Estimate: Suppose $N = 100$. The Rule of 3 gives $p_U \approx 3/100 = 0.03$. The exact Clopper-Pearson bound is $p_U = 1 - 0.05^{1/100} \approx 0.0295$, so the approximation is off by less than 2%.
Approach: We derive the exact frequentist interval by inverting the binomial probability, then show the Bayesian credible interval from the Beta posterior.
Formal Solution:
*Frequentist (Clopper-Pearson):* With $X = 0$ heads in $N$ flips, the likelihood is $L(p) = (1-p)^N$. The lower bound is $p_L = 0$: zero is the smallest possible count, so no value of $p$ can be rejected for being too small. All of the error probability $\alpha = 0.05$ therefore sits in the upper tail, and the upper bound $p_U$ is the value of $p$ at which $P(X = 0 \mid p) = \alpha$, i.e.,
$(1 - p_U)^N = 0.05$
$p_U = 1 - 0.05^{1/N}$
For the Rule of 3 approximation, take logs of $(1 - p_U)^N = 0.05$: $\ln(1 - p_U) = \ln(0.05)/N \approx -3/N$. When $p_U$ is small, $\ln(1 - p_U) \approx -p_U$, so $p_U \approx 3/N$. More precisely, $-\ln(0.05) \approx 2.996$, which is why the magic number is 3.
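As a quick numerical check of the derivation, here is a minimal Python sketch that evaluates the exact bound $1 - 0.05^{1/N}$ next to $3/N$. The function names `exact_upper_bound` and `rule_of_three` are illustrative choices, not part of the original problem.

```python
def exact_upper_bound(n, alpha=0.05):
    """Exact one-sided upper bound on p when 0 heads are seen in n flips.

    Solves (1 - p)**n = alpha for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n):
    """Small-p approximation to the 95% upper bound: 3 / n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return 3.0 / n


# Compare the two bounds for a few sample sizes.
for n in (10, 30, 100, 1000):
    exact = exact_upper_bound(n)
    approx = rule_of_three(n)
    print(f"N={n:5d}  exact={exact:.5f}  3/N={approx:.5f}  ratio={approx / exact:.3f}")
```

Because $1 - e^{-x} < x$ for $x > 0$, the value $3/N$ always sits slightly above the exact bound, so the rule errs on the conservative side.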
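*Bayesian:* With prior $p \sim \text{Beta}(1,1)$ and data $X = 0$ in $N$ flips, the posterior is $p \mid X = 0 \sim \text{Beta}(1, N+1)$. The posterior mean is $1/(N+2)$. The $\text{Beta}(1, N+1)$ CDF is $F(q) = 1 - (1-q)^{N+1}$, so the 95% upper credible bound $q_{0.95}$ is the value of $q$ that solves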
$1 - (1 - q)^{N+1} = 0.95$
$q_{0.95} = 1 - 0.05^{1/(N+1)}$
This is almost identical to the frequentist bound but with $N+1$ in place of $N$.
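The closed-form quantile can be sanity-checked by plugging it back into the $\text{Beta}(1, N+1)$ CDF. The sketch below does this in plain Python; `beta_1_b_cdf` and `bayes_upper_bound` are hypothetical helper names introduced only for this illustration.

```python
def beta_1_b_cdf(q, b):
    """CDF of a Beta(1, b) distribution: F(q) = 1 - (1 - q)**b on [0, 1]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return 1.0 - (1.0 - q) ** b


def bayes_upper_bound(n, level=0.95):
    """Upper credible bound for p: flat Beta(1, 1) prior, 0 heads in n flips.

    Inverts the Beta(1, n + 1) posterior CDF at the requested level.
    """
    return 1.0 - (1.0 - level) ** (1.0 / (n + 1))


n = 100
q = bayes_upper_bound(n)
# The second number printed should be the requested level, 0.95,
# up to floating-point rounding.
print(q, beta_1_b_cdf(q, n + 1))
```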
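*Comparison:* The frequentist bound is $1 - 0.05^{1/N}$ and the Bayesian bound is $1 - 0.05^{1/(N+1)}$, so the Bayesian bound equals the frequentist bound for $N + 1$ trials: the flat prior acts like one extra tails flip. Once the bounds are small, the relative gap between them is about $1/(N+1)$. For $N = 10$ the bounds are roughly $0.259$ and $0.238$; for $N = 100$ they are roughly $0.0295$ and $0.0292$. The difference is therefore meaningful only for small $N$ (roughly $N \lesssim 10$).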
Answer: The 95% frequentist CI is $[0, \; 1 - 0.05^{1/N}]$, well-approximated by $[0, \; 3/N]$. The Bayesian 95% credible interval with a flat prior is $[0, \; 1 - 0.05^{1/(N+1)}]$. Both intervals shrink as $O(1/N)$ and are nearly identical for moderate $N$.
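To see how quickly the two bounds converge, a short Python sketch can tabulate them side by side. The names `freq_upper` and `bayes_upper` and the chosen values of $N$ are assumptions made for illustration.

```python
def freq_upper(n, alpha=0.05):
    """Frequentist upper bound with 0 heads in n flips: 1 - alpha**(1/n)."""
    return 1.0 - alpha ** (1.0 / n)


def bayes_upper(n, alpha=0.05):
    """Bayesian upper bound with a flat prior: 1 - alpha**(1/(n + 1))."""
    return 1.0 - alpha ** (1.0 / (n + 1))


# rel_gap is the fraction by which the Bayesian bound is tighter.
for n in (1, 5, 10, 50, 100, 1000):
    f, b = freq_upper(n), bayes_upper(n)
    print(f"N={n:5d}  freq={f:.4f}  bayes={b:.4f}  rel_gap={(f - b) / f:.3f}")
```

The relative gap is largest for a handful of flips and falls to about 1% by $N = 100$, consistent with the $N$ versus $N+1$ structure of the two formulas.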
Intuition
The Rule of 3 is one of the most practical tools in rare-event statistics. Whenever you run $N$ trials and see zero occurrences, you can say with about 95% confidence that the true rate is at most $3/N$. The derivation is just one line of algebra, but the intuition is even simpler: if the true rate were much above $3/N$, seeing zero in $N$ trials would be quite unlikely (below 5%). This shows up constantly in operational risk, quality control, and clinical trials -- anywhere you need to bound the probability of something you have never observed.
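The intuition can be checked directly by asking how likely zero events are when the true rate is $k/N$ for a few values of $k$. This is a minimal sketch; `prob_zero_events` is a hypothetical helper and $N = 200$ is an arbitrary choice.

```python
def prob_zero_events(n, rate):
    """P(no events in n independent trials), each with probability `rate`."""
    return (1.0 - rate) ** n


n = 200
for k in (1, 2, 3, 4, 5):
    # For large n this is close to exp(-k), which crosses 5% near k = 3.
    print(f"rate={k}/N  P(zero events)={prob_zero_events(n, k / n):.4f}")
```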
The near-agreement between frequentist and Bayesian answers here is not a coincidence. With a non-informative prior and a decent sample size, Bayesian credible intervals and frequentist confidence intervals often coincide numerically. The conceptual difference -- "the parameter is fixed and the interval is random" vs. "the parameter has a posterior distribution" -- matters philosophically but rarely matters in practice for well-behaved problems like this one.