Bayesian Posterior for a Bernoulli Parameter

Statistics · Easy · Free problem

Let $X \sim \text{Bernoulli}(\theta)$, where the parameter $\theta$ has a uniform prior on $[0, 1]$ (equivalently, $\theta \sim \text{Beta}(1, 1)$).

You observe $X = 1$.

  1. Derive the posterior distribution $P(\theta \mid X = 1)$.
  2. What is the posterior mean $E[\theta \mid X = 1]$?
  3. How does this generalize if you observe $k$ successes in $n$ independent Bernoulli trials, starting from a general $\text{Beta}(\alpha, \beta)$ prior?

Hints

  1. The uniform distribution on $[0,1]$ is the same as $\text{Beta}(1,1)$. The Beta family is conjugate to the Bernoulli likelihood -- what does that tell you about the form of the posterior?
  2. Apply Bayes' theorem: $f(\theta \mid X=1) \propto P(X=1 \mid \theta) \cdot f(\theta) = \theta \cdot 1 = \theta$. What distribution has a density proportional to $\theta$ on $[0,1]$?
  3. Normalize: $\int_0^1 \theta \, d\theta = 1/2$, so the posterior density is $2\theta$, which is $\text{Beta}(2,1)$.

Worked Solution

How to Think About It: This is the simplest possible Bayesian updating problem, and it's worth internalizing completely because every more complex Bayesian calculation follows the same pattern. You have uncertainty about a parameter $\theta$ (expressed as a prior), you observe data that depends on $\theta$ (the likelihood), and you combine them via Bayes' theorem to get a posterior. The Beta-Bernoulli pair is the canonical example because the math works out perfectly -- the Beta prior is conjugate to the Bernoulli likelihood, so the posterior is also Beta.

Quick Estimate: Before any formal work: $\theta$ starts uniform on $[0, 1]$ with mean $1/2$. After seeing a success, we should shift our belief upward. Higher values of $\theta$ are more likely to have produced $X = 1$, so the posterior should put more weight on large $\theta$. The posterior mean should be above $1/2$ -- intuitively around $2/3$ (one success out of one trial, plus the "pseudo-count" from the uniform prior).

Approach: Apply Bayes' theorem with a continuous prior.

Formal Solution:

Part 1: Posterior derivation.

Prior: $f(\theta) = 1$ for $\theta \in [0, 1]$ (uniform = $\text{Beta}(1,1)$).

Likelihood: $P(X = 1 \mid \theta) = \theta$.

By Bayes' theorem:

$f(\theta \mid X = 1) = \frac{P(X=1 \mid \theta) \cdot f(\theta)}{P(X=1)}$

The normalizing constant:

$P(X = 1) = \int_0^1 \theta \cdot 1 \, d\theta = \frac{1}{2}$

So:

$f(\theta \mid X = 1) = \frac{\theta}{1/2} = 2\theta, \quad \theta \in [0, 1]$

This is a $\text{Beta}(2, 1)$ distribution. We can verify: the $\text{Beta}(a, b)$ density is $\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}$. For $a = 2, b = 1$: $\frac{\theta^1 \cdot 1}{B(2,1)} = \frac{\theta}{1/2} = 2\theta$. Checks out.
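
The same derivation can be checked numerically. The sketch below is a grid approximation, not part of the original solution: it discretizes $[0, 1]$, multiplies likelihood by prior, and normalizes. The grid size `N` and all variable names are illustrative assumptions.

```python
# Grid approximation of the posterior for a uniform prior and one observed success.
# N and the variable names are illustrative choices, not part of the problem.
N = 100_000
h = 1.0 / N
thetas = [(i + 0.5) * h for i in range(N)]    # midpoint of each grid cell

prior = [1.0 for _ in thetas]                 # uniform density f(theta) = 1
likelihood = [t for t in thetas]              # P(X = 1 | theta) = theta

unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnormalized) * h              # midpoint-rule estimate of P(X = 1)
posterior = [u / evidence for u in unnormalized]

# Largest gap between the numerical posterior and the closed form 2 * theta.
max_err = max(abs(post - 2 * t) for post, t in zip(posterior, thetas))
print(evidence, max_err)
```

Under these assumptions `evidence` should land very close to $1/2$ and `max_err` very close to zero, matching the closed-form density $2\theta$.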

Part 2: Posterior mean.

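A $\text{Beta}(\alpha', \beta')$ distribution has mean $\frac{\alpha'}{\alpha' + \beta'}$. With the posterior parameters $\alpha' = 2$ and $\beta' = 1$:

$E[\theta \mid X = 1] = \frac{\alpha'}{\alpha' + \beta'} = \frac{2}{2 + 1} = \frac{2}{3}$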

This confirms our quick estimate. The posterior mean $2/3$ is above the prior mean $1/2$, pulled upward by the observed success.
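
As a cross-check that does not rely on the Beta mean formula, the same value follows from integrating against the posterior density $2\theta$ derived in Part 1:

$E[\theta \mid X = 1] = \int_0^1 \theta \cdot 2\theta \, d\theta = \left[ \frac{2\theta^3}{3} \right]_0^1 = \frac{2}{3}$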

Part 3: General formula.

With a $\text{Beta}(\alpha, \beta)$ prior and $k$ successes in $n$ trials:

$\theta \mid \text{data} \sim \text{Beta}(\alpha + k, \, \beta + n - k)$
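
A short derivation sketch, under the usual assumption that the trials are conditionally independent given $\theta$: multiply the likelihood by the prior density and drop every factor that does not depend on $\theta$.

$f(\theta \mid \text{data}) \propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$

The right-hand side is the kernel of a $\text{Beta}(\alpha + k, \, \beta + n - k)$ density, so the normalizing constant must be $1/B(\alpha + k, \, \beta + n - k)$.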

The posterior mean is:

$E[\theta \mid \text{data}] = \frac{\alpha + k}{\alpha + \beta + n}$

This can be rewritten as a weighted average of the prior mean and the MLE:

$E[\theta \mid \text{data}] = \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{prior mean}} + \frac{n}{\alpha + \beta + n} \cdot \underbrace{\frac{k}{n}}_{\text{MLE}}$

As $n \to \infty$, the weight on the prior goes to zero and the posterior concentrates around the MLE. In the posterior-mean formula the prior parameters $\alpha$ and $\beta$ act as "pseudo-observations": $\alpha$ prior successes and $\beta$ prior failures, for $\alpha + \beta$ prior trials in total.
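
The update rule and the weighted-average identity can be sketched in a few lines of Python. The function names below are hypothetical helpers, and the sketch assumes integer $\alpha$, $\beta$ and $n \ge 1$ so that `fractions.Fraction` gives exact arithmetic.

```python
from fractions import Fraction

def beta_bernoulli_update(alpha, beta, k, n):
    """Posterior Beta parameters after k successes in n Bernoulli trials."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return alpha + k, beta + (n - k)

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta), exact for integer parameters."""
    return Fraction(alpha, alpha + beta)

def weighted_average_form(alpha, beta, k, n):
    """Posterior mean written as (prior weight * prior mean) + (data weight * MLE)."""
    w_prior = Fraction(alpha + beta, alpha + beta + n)
    w_data = Fraction(n, alpha + beta + n)
    return w_prior * Fraction(alpha, alpha + beta) + w_data * Fraction(k, n)

# The problem's case: uniform prior, one success in one trial.
post_a, post_b = beta_bernoulli_update(1, 1, 1, 1)
print(post_a, post_b, beta_mean(post_a, post_b), weighted_average_form(1, 1, 1, 1))
```

For the problem's inputs both routes give the posterior $\text{Beta}(2, 1)$ and the mean $2/3$; the two mean computations agree for any valid inputs because they are the same expression rearranged.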

For our specific problem: $\alpha = 1, \beta = 1, k = 1, n = 1$, giving $\text{Beta}(2, 1)$ with mean $2/3$.

Answer: The posterior is $\theta \mid X = 1 \sim \text{Beta}(2, 1)$ with density $f(\theta \mid X=1) = 2\theta$ on $[0,1]$. The posterior mean is $2/3$. In general, the posterior after $k$ successes in $n$ trials with a $\text{Beta}(\alpha, \beta)$ prior is $\text{Beta}(\alpha + k, \beta + n - k)$.

Intuition

The Beta-Bernoulli conjugacy is the building block of Bayesian inference. The reason it matters so much in practice is the "pseudo-observation" interpretation. Under one common convention, which matches the posterior mode, your prior $\text{Beta}(\alpha, \beta)$ is equivalent to having already seen $\alpha - 1$ successes and $\beta - 1$ failures before any real data arrives. A uniform prior ($\alpha = \beta = 1$) then means zero pseudo-observations: every value of $\theta$ is equally plausible. After seeing one success, you update to $\text{Beta}(2, 1)$, as if you'd seen one success and zero failures. For the posterior mean the counts are $\alpha$ and $\beta$ instead, so the uniform prior behaves like one prior success and one prior failure. That is why the posterior mean $2/3$ is a shrinkage estimate -- pulled toward $1/2$ relative to the MLE of $1/1 = 1$ by the prior.

This exact framework shows up in market making. If you're pricing a contract on whether an event occurs (e.g., "will the next trade be a buy?"), and you start with a $\text{Beta}(1,1)$ prior, each observed outcome updates your posterior. Your fair price is the posterior mean, and your spread reflects your posterior uncertainty (which shrinks as $1/\sqrt{n}$ with more observations). The Beta-Binomial model is the simplest version of the Bayesian market-making framework that firms like Jane Street and SIG use in their training programs.
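
A minimal sketch of that sequential updating, assuming a made-up sequence of outcomes and using the posterior standard deviation as a stand-in for "posterior uncertainty". The data, the helper name `beta_mean_sd`, and the choice of standard deviation as the uncertainty measure are all illustrative assumptions, not a description of any firm's method.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

outcomes = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical data: 1 = event occurred
a, b = 1.0, 1.0                       # Beta(1, 1) uniform prior
history = []
for x in outcomes:
    a += x                            # a success increments alpha
    b += 1 - x                        # a failure increments beta
    history.append(beta_mean_sd(a, b))

for step, (mean, sd) in enumerate(history, start=1):
    print(f"n={step}: posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

The first update reproduces the $\text{Beta}(2, 1)$ posterior from the problem, and over the full sequence the posterior standard deviation ends lower than it started, illustrating the tightening described above.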
