Bayesian Credible Interval for Coin Bias

Statistics · Medium · Free problem

A coin has an unknown probability $p$ of landing heads. You start with a $\text{Beta}(a, b)$ prior on $p$, then observe $H$ heads and $T$ tails.

  1. Derive the posterior distribution for $p$.
  2. Explain how to construct a 95% equal-tailed credible interval from the posterior.
  3. Specialize to $a = b = 1$ (uniform prior), $H = 530$, $T = 470$. Report the posterior, compute the 95% credible interval numerically, and compare it with the interval you get from a normal approximation to the posterior.

Hints

  1. The Beta distribution is conjugate to the Binomial likelihood. What does the posterior look like after updating?
  2. The equal-tailed credible interval uses the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior Beta distribution.
  3. With $a = b = 1$, the posterior is $\text{Beta}(531, 471)$. For the normal approximation, match its mean and variance: $\hat{p} = 531/1002$ and $\sigma = \sqrt{\hat{p}(1-\hat{p})/1003}$.

Worked Solution

How to Think About It: This is one of the cleanest Bayesian inference setups you will ever see. The Beta-Binomial conjugate pair means the posterior is available in closed form -- no MCMC, no approximation needed. The real question is whether you understand what a credible interval is (it is a probability statement about the parameter, not about long-run coverage) and whether you can sanity-check your answer with a quick normal approximation. With 1,000 flips and a flat prior, the prior barely matters, and the posterior should look nearly Gaussian. If your exact and approximate intervals differ by more than a hair, something is wrong.

Quick Estimate: With $H = 530$, $T = 470$, the posterior mean is roughly $530/1000 = 0.53$. The posterior standard deviation is roughly $\sqrt{0.53 \times 0.47 / 1000} \approx 0.0158$. A 95% interval is about $0.53 \pm 2 \times 0.016 = (0.498, 0.562)$. So the data lean toward heads, but only slightly -- the interval just barely includes 0.5, so a fair coin is not ruled out.

Approach: Use Beta-Binomial conjugacy for the exact posterior, then extract quantiles for the credible interval. Compare with a Gaussian approximation.

Formal Solution:

Part 1 -- Posterior derivation.

The likelihood of observing $H$ heads and $T$ tails given bias $p$ is:

$L(p) \propto p^H (1-p)^T$

The prior is $p \sim \text{Beta}(a, b)$, with density $\pi(p) \propto p^{a-1}(1-p)^{b-1}$. By Bayes' theorem:

$\pi(p \mid \text{data}) \propto p^{a + H - 1}(1-p)^{b + T - 1}$

This is the kernel of a Beta distribution, so:

$p \mid \text{data} \sim \text{Beta}(a + H,\; b + T)$

Conjugacy gives us the posterior for free: just add the head count to $a$ and the tail count to $b$.
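The update rule above can be sketched in a few lines of Python. The function name `beta_binomial_update` is a hypothetical helper introduced only for illustration; the second half checks that updating in two batches gives the same posterior as updating once, which follows from the parameters simply accumulating counts.

```python
def beta_binomial_update(a, b, heads, tails):
    """Return the (a, b) parameters of the Beta posterior.

    Hypothetical helper: a Beta(a, b) prior plus `heads` heads and
    `tails` tails gives a Beta(a + heads, b + tails) posterior.
    """
    return a + heads, b + tails


# One-shot update with a uniform Beta(1, 1) prior.
a_post, b_post = beta_binomial_update(1, 1, 530, 470)
print(a_post, b_post)  # 531 471

# Updating in two batches lands on the same posterior: the first
# posterior simply becomes the prior for the second batch.
a_mid, b_mid = beta_binomial_update(1, 1, 300, 250)
a_seq, b_seq = beta_binomial_update(a_mid, b_mid, 230, 220)
print((a_seq, b_seq) == (a_post, b_post))  # True
```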

Part 2 -- Constructing the 95% equal-tailed credible interval.

An equal-tailed $100(1-\alpha)\%$ credible interval $[L, U]$ satisfies:

$P(p < L \mid \text{data}) = \frac{\alpha}{2}, \qquad P(p > U \mid \text{data}) = \frac{\alpha}{2}$

Equivalently, $L$ and $U$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the $\text{Beta}(a+H, b+T)$ distribution. For $\alpha = 0.05$, these are the 0.025 and 0.975 quantiles. In practice you look these up numerically (e.g., scipy.stats.beta.ppf or qbeta in R).
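If a library quantile function is not at hand, the same construction can be sketched with the Python standard library alone: integrate the Beta density with Simpson's rule to get the CDF, then invert it by bisection. This is a minimal illustration, not a replacement for a library routine. The function names are hypothetical, and the code assumes both posterior parameters exceed 1, so that the density vanishes at 0 and 1.

```python
import math


def beta_cdf(x, a, b, steps=4000):
    """Approximate P(p <= x) for Beta(a, b) with composite Simpson's rule.

    Assumes a > 1 and b > 1 (density is zero at both endpoints);
    `steps` must be even.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def beta_quantile(q, a, b, tol=1e-10):
    """Invert the CDF by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def equal_tailed_interval(a, b, alpha=0.05):
    """Return the alpha/2 and 1 - alpha/2 quantiles of Beta(a, b)."""
    return beta_quantile(alpha / 2, a, b), beta_quantile(1 - alpha / 2, a, b)


lower, upper = equal_tailed_interval(531, 471)
print(f"95% equal-tailed interval: [{lower:.4f}, {upper:.4f}]")
```

Bisection is slow but hard to get wrong, which makes it a reasonable cross-check on whatever a library quantile function returns.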

Part 3 -- Numerical computation for $a = b = 1$, $H = 530$, $T = 470$.

The posterior is $\text{Beta}(531, 471)$. Its moments:

  • Posterior mean: $\displaystyle\hat{p} = \frac{531}{531 + 471} = \frac{531}{1002} \approx 0.5299$
  • Posterior variance: $\displaystyle\sigma^2 = \frac{531 \times 471}{1002^2 \times 1003} \approx 2.484 \times 10^{-4}$
  • Posterior std: $\sigma \approx 0.01576$

Exact credible interval: The 0.025 and 0.975 quantiles of $\text{Beta}(531, 471)$ are approximately:

$[L, U] \approx [0.4990, 0.5608]$

Normal approximation: Approximate the posterior as $p \mid \text{data} \approx N(0.52994, 0.01576^2)$. Then:

$[L, U] \approx 0.52994 \pm 1.96 \times 0.01576 = [0.4991, 0.5608]$

Comparison: The two intervals agree to three decimal places, and each endpoint differs by less than $10^{-4}$. This is expected: with $n = 1000$ observations, the Beta posterior is extremely well approximated by a Gaussian (a consequence of the Bernstein-von Mises theorem). The slight difference comes from the Beta's small residual skewness -- $\text{skew} = \frac{2(b' - a')\sqrt{a' + b' + 1}}{(a' + b' + 2)\sqrt{a' b'}}$ where $a' = 531, b' = 471$, which evaluates to about $-0.0076$. The negative sign means the exact quantiles sit slightly below the symmetric normal ones, but the shift is negligible.
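The moment-matched normal interval can be reproduced with the standard library. This is a sketch with hypothetical function names: it uses the standard mean and variance formulas for a Beta distribution, and `statistics.NormalDist` for the normal quantile rather than a hard-coded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    n = a + b
    mean = a / n
    var = a * b / (n ** 2 * (n + 1))
    return mean, var


def normal_approx_interval(a, b, alpha=0.05):
    """Equal-tailed interval of the normal matched to Beta(a, b)'s mean and variance."""
    mean, var = beta_mean_var(a, b)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sqrt(var)
    return mean - half_width, mean + half_width


mean, var = beta_mean_var(531, 471)
lo_norm, hi_norm = normal_approx_interval(531, 471)
print(f"mean = {mean:.5f}, variance = {var:.4e}, std = {sqrt(var):.5f}")
print(f"normal-approximation interval: [{lo_norm:.4f}, {hi_norm:.4f}]")
```

Because the approximation is symmetric about the mean by construction, any asymmetry in the exact Beta quantiles shows up as a small offset between the two intervals.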

Note that the uniform prior $\text{Beta}(1,1)$ has almost no effect. The data dominates: adding 1 pseudo-head and 1 pseudo-tail to 1,000 observations shifts the mean by less than $0.001$.

Answer: The posterior is $\text{Beta}(a + H, b + T)$. For $a = b = 1$, $H = 530$, $T = 470$: posterior is $\text{Beta}(531, 471)$ with mean $\approx 0.530$ and the 95% equal-tailed credible interval is approximately $[0.499, 0.561]$. The normal approximation gives essentially the same interval, confirming that with 1,000 observations the posterior is nearly Gaussian.

Intuition

This problem is the bread and butter of Bayesian inference: conjugate updating followed by interval estimation. The key lesson is that with enough data, the prior washes out and the posterior concentrates around the MLE. With 1,000 coin flips, whether you started with a uniform prior, a $\text{Beta}(2,2)$, or even a mildly informative $\text{Beta}(10,10)$, the posterior mean barely moves. The credible interval width scales like $1/\sqrt{n}$, just like a frequentist confidence interval -- and for large $n$ the two intervals are numerically indistinguishable. This is the Bernstein-von Mises theorem in action.
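Both claims in this paragraph are easy to check numerically. The sketch below uses hypothetical helper names. It compares the posterior mean under the three priors mentioned, then quadruples the data at the same head/tail ratio to see the posterior standard deviation roughly halve.

```python
from math import sqrt


def posterior_mean(a, b, heads, tails):
    """Mean of the Beta(a + heads, b + tails) posterior."""
    return (a + heads) / (a + b + heads + tails)


def posterior_std(a, b, heads, tails):
    """Standard deviation of the Beta(a + heads, b + tails) posterior."""
    a_post, b_post = a + heads, b + tails
    n = a_post + b_post
    return sqrt(a_post * b_post / (n ** 2 * (n + 1)))


heads, tails = 530, 470
mle = heads / (heads + tails)

# Prior sensitivity: how far each prior pulls the posterior mean from the MLE.
for a, b in [(1, 1), (2, 2), (10, 10)]:
    shift = posterior_mean(a, b, heads, tails) - mle
    print(f"Beta({a},{b}) prior: shift from MLE = {shift:+.5f}")

# Width scaling: four times the data should roughly halve the std.
ratio = posterior_std(1, 1, heads, tails) / posterior_std(1, 1, 4 * heads, 4 * tails)
print(f"std ratio for n vs 4n: {ratio:.3f}")
```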

In practice, the Bayesian credible interval has a cleaner interpretation: there is a 95% probability that $p$ lies in this interval, given the data and prior. The frequentist confidence interval says something different (about long-run coverage). For trading applications -- say you are making a market on a binary event -- the posterior is your pricing distribution. The credible interval tells you how wide your uncertainty band is, which directly maps to how wide you should quote. When the interval is tight (as it is here with 1,000 observations), you can quote tight. When it is wide, you need more edge to justify a position.
