
Beta-Bernoulli Posterior and Predictive Distribution

Statistics · Medium · Free problem

Suppose you have a coin with unknown bias $p$, and you place a $\text{Beta}(\alpha, \beta)$ prior on $p$. You then observe $s$ successes and $f$ failures in $s + f$ independent flips.

(i) What is the posterior distribution of $p$?
(ii) Derive the posterior predictive distribution for the number of successes in the next $m$ independent trials.
(iii) Compute $P(\text{next three flips are all successes} \mid \text{data})$.

Hints

The Beta distribution is conjugate to the Bernoulli likelihood. How do the prior parameters $\alpha$ and $\beta$ update after observing successes and failures?
For the posterior predictive, integrate out $p$ from the joint distribution. The integral $\int_0^1 p^{a-1}(1-p)^{b-1} dp = B(a,b)$ will appear.
For part (iii), set $m = 3$, $k = 3$ in the Beta-Binomial formula. The ratio of Beta functions simplifies to a product of rising factorials.

Worked Solution

How to Think About It: This is the canonical conjugate Bayesian model. The Beta distribution is conjugate to the Bernoulli/Binomial likelihood, meaning the posterior is also Beta -- you just update the parameters by adding the counts. The posterior predictive integrates out the unknown $p$, which turns Binomial draws into Beta-Binomial draws. The key intuition: uncertainty about $p$ makes the predictive distribution over-dispersed compared to a plain Binomial with $p$ known. Before computing anything, note that the posterior mean is $(\alpha + s)/(\alpha + \beta + s + f)$, which is a weighted average of the prior mean and the observed success rate.

Quick Estimate: Suppose $\alpha = \beta = 1$ (uniform prior), $s = 7$, $f = 3$. Posterior is $\text{Beta}(8, 4)$ with mean $8/12 = 2/3$. For part (iii), the plug-in estimate for 3 heads in a row is $(2/3)^3 \approx 0.296$. The Bayesian answer will be slightly different because it accounts for uncertainty in $p$.

Approach: Use conjugacy for the posterior, then integrate out $p$ for the predictive.

Formal Solution:

Part (i): Posterior distribution

The likelihood of observing $s$ successes and $f$ failures is: $L(p) = p^s (1-p)^f$

The prior is $\text{Beta}(\alpha, \beta)$ with density $p^{\alpha - 1}(1-p)^{\beta - 1} / B(\alpha, \beta)$.

By Bayes' rule, the posterior is proportional to: $\pi(p \mid \text{data}) \propto p^{\alpha - 1 + s} (1-p)^{\beta - 1 + f}$

This is the kernel of a Beta distribution: $p \mid \text{data} \sim \text{Beta}(\alpha + s, \beta + f)$
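As a minimal sketch of the conjugate update, the snippet below adds the observed counts to the prior parameters and reports the posterior mean. The function names `beta_posterior` and `beta_mean` are illustrative choices, not part of the problem, and the example values are the ones from the Quick Estimate (uniform prior, $s = 7$, $f = 3$). Integer parameters are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def beta_posterior(alpha, beta, s, f):
    """Return (alpha', beta') for a Beta(alpha, beta) prior after s successes and f failures."""
    return alpha + s, beta + f


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, a / (a + b), as an exact fraction (integer a, b assumed)."""
    return Fraction(a, a + b)


# Hypothetical example values taken from the Quick Estimate above.
a_post, b_post = beta_posterior(1, 1, 7, 3)
print(a_post, b_post)              # 8 4
print(beta_mean(a_post, b_post))   # 2/3
```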

Part (ii): Posterior predictive for $m$ future trials

Let $Y$ be the number of successes in $m$ future trials. Conditional on $p$, $Y \sim \text{Binomial}(m, p)$. The posterior predictive is: $P(Y = k \mid \text{data}) = \int_0^1 \binom{m}{k} p^k (1-p)^{m-k} \cdot \frac{p^{\alpha' - 1}(1-p)^{\beta' - 1}}{B(\alpha', \beta')} \, dp$

where $\alpha' = \alpha + s$ and $\beta' = \beta + f$.

Pulling the constant outside and combining exponents: $P(Y = k \mid \text{data}) = \binom{m}{k} \frac{1}{B(\alpha', \beta')} \int_0^1 p^{\alpha' + k - 1}(1-p)^{\beta' + m - k - 1} \, dp$

The integral is $B(\alpha' + k, \beta' + m - k)$, so: $P(Y = k \mid \text{data}) = \binom{m}{k} \frac{B(\alpha' + k, \beta' + m - k)}{B(\alpha', \beta')}$

This is the Beta-Binomial distribution: $Y \mid \text{data} \sim \text{Beta-Binomial}(m, \alpha + s, \beta + f)$.
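The formula above can be evaluated numerically with log-Gamma functions, which avoids overflow for large parameters. The following is a sketch using only the Python standard library; `log_beta` and `beta_binomial_pmf` are names introduced here for illustration, and the parameters $\alpha' = 8$, $\beta' = 4$, $m = 3$ are the Quick Estimate values, not anything prescribed by the problem.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) computed as lgamma(a) + lgamma(b) - lgamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, m, a, b):
    """P(Y = k) for Y ~ Beta-Binomial(m, a, b): C(m, k) * B(a + k, b + m - k) / B(a, b)."""
    return comb(m, k) * exp(log_beta(a + k, b + m - k) - log_beta(a, b))


# Posterior Beta(8, 4) from a uniform prior with s = 7, f = 3; predict m = 3 flips.
pmf = [beta_binomial_pmf(k, 3, 8, 4) for k in range(4)]
print(pmf)       # probabilities for k = 0, 1, 2, 3
print(sum(pmf))  # should be 1 up to floating-point error
```

Checking that the probabilities sum to one is a cheap guard against an off-by-one in the exponents.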

Part (iii): All three future flips are successes

Set $m = 3$, $k = 3$: $P(Y = 3 \mid \text{data}) = \binom{3}{3} \frac{B(\alpha' + 3, \beta')}{B(\alpha', \beta')}$

Using $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$: $P(Y = 3 \mid \text{data}) = \frac{\Gamma(\alpha' + 3) \, \Gamma(\alpha' + \beta')}{\Gamma(\alpha') \, \Gamma(\alpha' + \beta' + 3)}$

Expanding with $\Gamma(x+1) = x \cdot \Gamma(x)$: $P(Y = 3 \mid \text{data}) = \frac{\alpha'(\alpha' + 1)(\alpha' + 2)}{(\alpha' + \beta')(\alpha' + \beta' + 1)(\alpha' + \beta' + 2)}$
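Spelled out, the expansion applies the Gamma recurrence three times to each of the two larger arguments:

$\Gamma(\alpha' + 3) = (\alpha' + 2)(\alpha' + 1)\,\alpha'\,\Gamma(\alpha')$

$\Gamma(\alpha' + \beta' + 3) = (\alpha' + \beta' + 2)(\alpha' + \beta' + 1)(\alpha' + \beta')\,\Gamma(\alpha' + \beta')$

The factors $\Gamma(\alpha')$ and $\Gamma(\alpha' + \beta')$ then cancel between numerator and denominator, leaving only the two three-term products.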

Substituting $\alpha' = \alpha + s$, $\beta' = \beta + f$, and letting $n = \alpha + \beta + s + f$:

$\boxed{P(\text{next 3 all successes}) = \frac{(\alpha + s)(\alpha + s + 1)(\alpha + s + 2)}{n(n+1)(n+2)}}$

Sanity check: With a uniform prior ($\alpha = \beta = 1$), $s = 7$, $f = 3$: $\alpha' = 8$, $n = 12$. The answer is $\frac{8 \cdot 9 \cdot 10}{12 \cdot 13 \cdot 14} = \frac{720}{2184} = \frac{30}{91} \approx 0.330$. The plug-in estimate was $(8/12)^3 \approx 0.296$. The Bayesian answer is higher because it equals the posterior expectation $E[p^3 \mid \text{data}]$, and $p^3$ is convex on $[0, 1]$, so Jensen's inequality gives $E[p^3 \mid \text{data}] \ge (E[p \mid \text{data}])^3$. Intuitively, posterior mass at large $p$ raises the probability of three consecutive successes by more than the mass at small $p$ lowers it.
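The product form also has a sequential reading: each factor is the posterior mean after counting the previous predicted success, so the answer is a chain of one-step-ahead predictions. A short sketch with exact fractions follows; the function name `prob_all_successes` and its parameter `r` (the number of future flips) are introduced here for illustration, and integer parameters are assumed.

```python
from fractions import Fraction


def prob_all_successes(alpha, beta, s, f, r=3):
    """P(next r flips are all successes | data) as a product of r one-step predictive probabilities."""
    a = alpha + s                 # posterior first parameter alpha'
    n = alpha + beta + s + f      # alpha' + beta'
    prob = Fraction(1)
    for j in range(r):
        # After j predicted successes, the next success has probability (a + j) / (n + j).
        prob *= Fraction(a + j, n + j)
    return prob


p = prob_all_successes(1, 1, 7, 3)
plug_in = Fraction(8, 12) ** 3
print(p, float(p))              # 30/91, about 0.330
print(plug_in, float(plug_in))  # 8/27, about 0.296
```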

Answer: (i) $p \mid \text{data} \sim \text{Beta}(\alpha + s, \beta + f)$. (ii) $Y \sim \text{Beta-Binomial}(m, \alpha + s, \beta + f)$. (iii) $P = \frac{(\alpha+s)(\alpha+s+1)(\alpha+s+2)}{n(n+1)(n+2)}$ where $n = \alpha + \beta + s + f$.

Intuition

The Beta-Bernoulli model is a standard starting point for Bayesian reasoning in quant finance. The posterior $\text{Beta}(\alpha + s, \beta + f)$ has a clean interpretation: $\alpha$ and $\beta$ act as "pseudo-counts" -- the prior behaves as if you had already seen $\alpha$ successes and $\beta$ failures before collecting any data (measured from the uniform $\text{Beta}(1, 1)$ prior instead, that is $\alpha - 1$ successes and $\beta - 1$ failures). The data just add more counts. The posterior mean $(\alpha + s)/(\alpha + \beta + s + f)$ is a weighted average of the prior mean and the sample mean, with weights proportional to the effective sample sizes ($\alpha + \beta$ for the prior, $s + f$ for the data).
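The weighted-average claim can be written out explicitly. Writing $n_0 = \alpha + \beta$ and $N = s + f$ (both are shorthand introduced here, and the sample mean assumes $N > 0$):

$\frac{\alpha + s}{\alpha + \beta + s + f} = \frac{n_0}{n_0 + N} \cdot \frac{\alpha}{\alpha + \beta} + \frac{N}{n_0 + N} \cdot \frac{s}{s + f}$

For the running example ($\alpha = \beta = 1$, $s = 7$, $f = 3$) this reads $\frac{2}{12} \cdot \frac{1}{2} + \frac{10}{12} \cdot \frac{7}{10} = \frac{1}{12} + \frac{7}{12} = \frac{2}{3}$, matching the posterior mean of $\text{Beta}(8, 4)$.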

The posterior predictive being Beta-Binomial rather than plain Binomial captures a crucial practical effect: parameter uncertainty inflates variance. If you use the plug-in estimate $\hat{p}$ and treat it as known, you underestimate the tail probabilities. This matters in market making (where you price contracts on future outcomes), risk management (where tail events drive capital), and any setting where you are making predictions with limited data. The Bayesian approach naturally hedges against overconfidence in your point estimate.
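To see the variance inflation concretely, the sketch below compares the Beta-Binomial predictive with a plug-in Binomial that treats $\hat{p}$ as known. The choice of $m = 10$ future trials and the posterior $\text{Beta}(8, 4)$ are illustrative assumptions, and all function names are introduced here rather than taken from the problem.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-Gamma functions."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, m, a, b):
    """P(Y = k) under Beta-Binomial(m, a, b)."""
    return comb(m, k) * exp(log_beta(a + k, b + m - k) - log_beta(a, b))


def binomial_pmf(k, m, p):
    """P(Y = k) under Binomial(m, p)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def variance(pmf):
    """Variance of a distribution on 0..len(pmf)-1 given as a list of probabilities."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return sum((k - mean) ** 2 * q for k, q in enumerate(pmf))


# Illustrative values: posterior Beta(8, 4), m = 10 future trials.
m, a, b = 10, 8, 4
p_hat = a / (a + b)  # plug-in estimate: the posterior mean

bb = [beta_binomial_pmf(k, m, a, b) for k in range(m + 1)]
bi = [binomial_pmf(k, m, p_hat) for k in range(m + 1)]

print("variance:", variance(bb), "vs plug-in", variance(bi))
print("P(all m successes):", bb[m], "vs plug-in", bi[m])
```

Both distributions share the same mean $m \hat{p}$, so any difference in spread or tail probability comes from the uncertainty in $p$ alone.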
