Expected Rolls with a Fair Die
You have a fair six-sided die. Answer the following three questions:
- What is the expected number of rolls to get the first 6?
- What is the expected number of rolls until every face (1 through 6) has appeared at least once?
- What is the expected number of rolls until every face has appeared at least twice? (You may leave your answer as a sum or provide a numerical approximation.)
Hints
- Each part is about the waiting time until a collection criterion is met. What distribution describes the time until a single success?
- For part 2, decompose the process into phases: once you have $k$ distinct faces, the chance of seeing a new one is $(6 - k)/6$. Each phase is a geometric waiting time.
- For part 3, a simple phase decomposition does not work because progress has two dimensions (new faces vs. duplicates). Set up a Markov chain with state $(a, b)$ tracking how many faces have been seen exactly once versus at least twice.
Worked Solution
How to Think About It: All three parts are about waiting times -- how long until some collection criterion is met. Part 1 is pure geometric. Part 2 is the classic coupon collector problem. Part 3 is the generalized coupon collector with multiplicity 2. The key mental model is the same throughout: break the waiting time into phases where the probability of "making progress" is constant within each phase, then add up the expected phase lengths.
Quick Estimate: Part 1 is obviously 6. Part 2: you collect the first face instantly (1 roll), the second face takes $6/5$ rolls on average, the third $6/4$, etc. The harmonic sum
Approach: Geometric waiting times for part 1, coupon collector decomposition for part 2, Markov chain for part 3.
Formal Solution:
Part 1: First 6
Each roll lands on 6 with probability
$E[T_1] = \frac{1}{1/6} = 6$
Part 2: Every face at least once (Coupon Collector)
After collecting $k$ distinct faces, the probability of seeing a new face on the next roll is $(6 - k)/6$. The waiting time in phase $k$ (going from $k$ to $k+1$ distinct faces) is geometric with parameter $(6-k)/6$, so it takes $6/(6-k)$ rolls in expectation.
Summing over all phases:
$E[T_2] = \sum_{k=0}^{5} \frac{6}{6-k} = 6\left(\frac{1}{6} + \frac{1}{5} + \frac{1}{4} + \frac{1}{3} + \frac{1}{2} + 1\right) = 6 \cdot H_6$
where $H_6 = 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 = 49/20$. So:
$E[T_2] = 6 \times \frac{49}{20} = \frac{147}{10} = 14.7$
Part 3: Every face at least twice (Generalized Coupon Collector)
This is harder because the phases no longer have a clean decomposition -- when you collect a new face for the first time versus promoting a face from "seen once" to "seen twice" both represent different kinds of progress, and the transition probabilities depend on the full state.
Define the state as $(a, b)$, where $a$ is the number of faces seen exactly once and $b$ is the number of faces seen at least twice. The number of unseen faces is $c = 6 - a - b$. We are done when $b = 6$.
From state $(a, b)$, each roll produces: - An unseen face (prob $c/6$): move to $(a+1, b)$ - A face seen exactly once (prob $a/6$): move to $(a-1, b+1)$ - A face seen $\geq 2$ times (prob $b/6$): stay at $(a, b)$ -- wasted roll
Let $E(a,b)$ be the expected number of rolls to reach $(0, 6)$. Then:
$E(a,b) = \frac{6}{6-b} + \frac{c}{6-b}\, E(a+1, b) + \frac{a}{6-b}\, E(a-1, b+1)$
with base case $E(0, 6) = 0$. Solving this system recursively starting from $E(0, 0)$ gives:
$E[T_3] = E(0,0) \approx 24.13$
To build intuition for this number, note that $E[T_3] - E[T_2] \approx 24.13 - 14.70 = 9.43$. The "second collection" takes fewer rolls than the first because many faces already have duplicates by the time the first collection completes.
Answer:
- First 6: $E[T_1] = 6$
- Every face once: $E[T_2] = 14.7$
- Every face twice: $E[T_3] \approx 24.13$
Intuition
The coupon collector problem is one of the most frequently tested probability results in quant interviews because it captures a pattern that shows up everywhere: diminishing returns from random sampling. The first few coupons come quickly, but the last one takes forever -- the expected time for the final coupon alone is $n$ rolls (equal to the time for the very first one). This is why the harmonic series appears: you are summing geometric waiting times with shrinking success probabilities.
The generalized version (collecting each coupon $m$ times) is a natural extension and tests whether you can set up a proper Markov chain when the simple phase decomposition breaks down. The key insight is that once simple phases fail, you need to track the full state -- here, the pair $(a, b)$ counting faces by their multiplicity. This "state space thinking" is a core skill for any stochastic modeling problem in quant finance, from inventory management to order book dynamics.