Confidence Interval with Zero Successes

Statistics · Medium · Free problem

A coin is tossed $N$ times and you observe zero heads. You want to build a confidence interval for the true probability of heads $p$.

  1. Using a frequentist approach, derive the exact (Clopper-Pearson) 95% confidence interval for $p$. Show how the "Rule of 3" approximation $p_U \approx 3/N$ arises.
  2. Using a Bayesian approach with a $\text{Beta}(1,1)$ prior, derive the 95% credible interval for $p$.
  3. Compare the two intervals. For what values of $N$ do they give meaningfully different answers?

Hints

  1. Think about what value of $p$ would make observing zero heads a borderline event at the 5% level.
  2. For the frequentist bound, invert the equation $(1-p)^N = 0.05$. For the Rule of 3, use the approximation $\ln(1-p) \approx -p$ for small $p$.
  3. For the Bayesian approach, recall that a $\text{Beta}(1,1)$ prior with $0$ successes and $N$ failures gives a $\text{Beta}(1, N+1)$ posterior, whose quantile function has a clean closed form.

Worked Solution

How to Think About It: You ran $N$ trials and saw nothing. That does not mean $p = 0$ -- it means $p$ is small enough that seeing zero heads in $N$ flips is not surprising. The question is: how small? A practitioner's first instinct should be the Rule of 3 -- if you see zero events in $N$ trials, the 95% upper bound on the rate is roughly $3/N$. This is the single most useful quick formula in rare-event estimation. The Bayesian version gives almost the same answer once $N$ is moderate.

Quick Estimate: Suppose $N = 100$. The Rule of 3 gives $p_U \approx 3/100 = 0.03$. The exact Clopper-Pearson bound is $1 - 0.05^{1/100} \approx 0.0295$. The Bayesian bound with a flat prior is $1 - 0.05^{1/101} \approx 0.0292$. All three are within a whisker of each other. For $N = 10$, the Rule of 3 gives $0.30$, the exact bound gives $1 - 0.05^{0.1} \approx 0.259$, and the gap is larger -- the approximation is cruder for small $N$.
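The quick-estimate numbers can be reproduced with a few lines of Python. This is a minimal sketch using only the closed forms from this problem; the function names `rule_of_three`, `exact_upper`, and `bayes_upper` are illustrative, not from any library.

```python
def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on p after zero successes in n trials."""
    return 3.0 / n


def exact_upper(n: int, alpha: float = 0.05) -> float:
    """Exact upper bound: solves (1 - p)^n = alpha for p."""
    return 1.0 - alpha ** (1.0 / n)


def bayes_upper(n: int, alpha: float = 0.05) -> float:
    """Upper credible bound under a flat Beta(1,1) prior: n + 1 replaces n."""
    return 1.0 - alpha ** (1.0 / (n + 1))


for n in (10, 100):
    print(
        f"N={n:4d}  rule of 3={rule_of_three(n):.4f}  "
        f"exact={exact_upper(n):.4f}  bayes={bayes_upper(n):.4f}"
    )
```

Passing a different `alpha` gives the corresponding bound at other confidence levels, though the constant 3 in the shortcut is specific to 95%.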

Approach: We derive the exact frequentist interval by inverting the binomial probability, then show the Bayesian credible interval from the Beta posterior.

Formal Solution:

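*Frequentist (Clopper-Pearson):* With $X = 0$ heads in $N$ flips, the likelihood is $L(p) = (1-p)^N$. The lower bound is $p_L = 0$: zero is the smallest possible count, so no value of $p$ can be rejected for predicting too few heads. Because only the upper tail is in play, we spend the full $\alpha = 0.05$ there (a one-sided bound; the strict two-sided Clopper-Pearson convention would use $\alpha/2 = 0.025$ instead). The upper bound $p_U$ is the value of $p$ such that $P(X = 0 \mid p) = \alpha$, i.e.,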

$(1 - p_U)^N = 0.05$

$p_U = 1 - 0.05^{1/N}$

For the Rule of 3 approximation, take logs of both sides: $\ln(1 - p_U) = \ln(0.05)/N \approx -3/N$. When $p_U$ is small, $\ln(1 - p_U) \approx -p_U$, so $p_U \approx 3/N$. More precisely, $-\ln(0.05) \approx 2.996$, which is why the magic number is 3.
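The size of the approximation error follows from carrying the expansion one term further. The shorthand $c = -\ln(0.05)$ is introduced here only for this sketch:

$p_U = 1 - e^{-c/N} = \frac{c}{N} - \frac{c^2}{2N^2} + O(N^{-3}), \quad c = -\ln(0.05) \approx 2.996$

For $N = 100$ the first two terms give $0.02996 - 0.00045 \approx 0.0295$, matching the exact value. Since the correction term is negative and $3 > c$, the Rule of 3 always sits slightly above the exact bound, so it errs on the conservative side.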

*Bayesian:* With prior $p \sim \text{Beta}(1,1)$ and data $X = 0$ in $N$ flips, the posterior is $p \mid X = 0 \sim \text{Beta}(1, N+1)$. The posterior mean is $1/(N+2)$. The 95% highest-density credible interval is $[0, q_{0.95}]$ where $q_{0.95}$ solves

$1 - (1 - q)^{N+1} = 0.95$

$q_{0.95} = 1 - 0.05^{1/(N+1)}$

This is almost identical to the frequentist bound but with $N+1$ in place of $N$.
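As a sanity check on the closed form, one can integrate the $\text{Beta}(1, N+1)$ posterior density numerically and confirm that 95% of the mass lies below $q_{0.95}$. The sketch below uses a plain midpoint rule from the standard library; the helper names and the choice $N = 20$ are arbitrary illustrations.

```python
def posterior_pdf(p: float, n: int) -> float:
    """Density of Beta(1, n + 1): (n + 1) * (1 - p)^n on [0, 1]."""
    return (n + 1) * (1.0 - p) ** n


def posterior_mass_below(q: float, n: int, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the posterior probability that p <= q."""
    h = q / steps
    return h * sum(posterior_pdf((i + 0.5) * h, n) for i in range(steps))


def bayes_upper(n: int, alpha: float = 0.05) -> float:
    """Closed-form upper credible bound from the text."""
    return 1.0 - alpha ** (1.0 / (n + 1))


n = 20  # illustrative sample size
q = bayes_upper(n)
mass = posterior_mass_below(q, n)
print(f"q_0.95 = {q:.4f}, posterior mass below q = {mass:.6f}")
```

Because the density is strictly decreasing on $[0, 1]$, the interval starting at 0 is also the shortest interval containing that mass, which is why the highest-density interval is one-sided here.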

*Comparison:* The frequentist bound is $1 - 0.05^{1/N}$ and the Bayesian bound is $1 - 0.05^{1/(N+1)}$. The ratio of exponents is $N/(N+1) \to 1$ as $N \to \infty$, so the intervals converge rapidly. For $N \geq 30$, the relative difference is roughly 3% or less, which is negligible for most purposes. For very small $N$ (say $N = 3$), the Bayesian interval is noticeably tighter ($0.527$ versus $0.632$) because the prior carries more weight.
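To see where the two bounds stop differing meaningfully, the following sketch tabulates the relative gap across a range of $N$. The function names and the particular grid of $N$ values are assumptions made for illustration.

```python
def exact_upper(n: int, alpha: float = 0.05) -> float:
    """Frequentist upper bound 1 - alpha^(1/n)."""
    return 1.0 - alpha ** (1.0 / n)


def bayes_upper(n: int, alpha: float = 0.05) -> float:
    """Flat-prior Bayesian upper bound 1 - alpha^(1/(n+1))."""
    return 1.0 - alpha ** (1.0 / (n + 1))


def relative_gap(n: int) -> float:
    """Fraction by which the Bayesian bound is tighter than the frequentist one."""
    freq = exact_upper(n)
    return (freq - bayes_upper(n)) / freq


for n in (3, 10, 30, 100, 1000):
    print(
        f"N={n:5d}  freq={exact_upper(n):.4f}  "
        f"bayes={bayes_upper(n):.4f}  gap={relative_gap(n):.2%}"
    )
```

The gap shrinks roughly like $1/(N+1)$, consistent with the Bayesian bound behaving as if one extra failure had been observed.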

Answer: The 95% frequentist CI is $[0, \; 1 - 0.05^{1/N}]$, well-approximated by $[0, \; 3/N]$. The Bayesian 95% credible interval with a flat prior is $[0, \; 1 - 0.05^{1/(N+1)}]$. Both intervals shrink as $O(1/N)$ and are nearly identical for moderate $N$.

Intuition

The Rule of 3 is one of the most practical tools in rare-event statistics. Whenever you run $N$ trials and see zero occurrences, you can say with about 95% confidence that the true rate is at most $3/N$. The derivation is just one line of algebra, but the intuition is even simpler: if the true rate were much above $3/N$, seeing zero in $N$ trials would be quite unlikely (below 5%). This shows up constantly in operational risk, quality control, and clinical trials -- anywhere you need to bound the probability of something you have never observed.
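The "borderline at 5%" intuition can be checked directly: if the true rate were exactly $3/N$, the chance of seeing zero events should be just under 5%, and it should drop quickly for larger rates. A minimal sketch, with `prob_zero_heads` as a hypothetical helper name and $6/N$ chosen arbitrarily as a comparison rate:

```python
import math


def prob_zero_heads(p: float, n: int) -> float:
    """P(X = 0) for n independent trials with success probability p."""
    return (1.0 - p) ** n


for n in (10, 100, 1000):
    at_bound = prob_zero_heads(3.0 / n, n)   # rate exactly at the Rule-of-3 bound
    at_double = prob_zero_heads(6.0 / n, n)  # rate at twice the bound
    print(f"N={n:5d}  P(0 | 3/N)={at_bound:.4f}  P(0 | 6/N)={at_double:.4f}")

# As N grows, (1 - 3/N)^N approaches exp(-3).
print(f"exp(-3) = {math.exp(-3):.4f}")
```

The probability at $p = 3/N$ stays below 0.05 for every $N$ tried here and approaches $e^{-3}$ from below, which matches the shortcut being slightly conservative.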

The near-agreement between frequentist and Bayesian answers here is not a coincidence. With a non-informative prior and a decent sample size, Bayesian credible intervals and frequentist confidence intervals often coincide numerically. The conceptual difference -- "the parameter is fixed and the interval is random" vs. "the parameter has a posterior distribution" -- matters philosophically but rarely matters in practice for well-behaved problems like this one.
