Coupon Collector With Unequal Types: Playing Cards

Expectation · Hard · Free problem

Draw cards one at a time, without replacement, from a well-shuffled standard 52-card deck (4 suits, 13 cards per suit). Stop as soon as you have seen at least one card from every suit.

  1. Derive the exact expected number of draws $E[N]$.
  1. Give a tight closed-form expression using inclusion-exclusion.
  1. Compare your answer to the naive with-replacement (multinomial) heuristic, which treats each draw as an independent uniform choice among 4 equally likely suits. How large is the error, and why does it go in the direction it does?

Hints

  1. Write $E[N] = \sum_{k \ge 0} P(N > k)$ and use inclusion-exclusion to compute $P(N > k)$, the probability that the first $k$ cards miss at least one suit.
  2. You will need the identity $\sum_{k=0}^{m} \binom{m}{k}/\binom{n}{k} = (n+1)/(n-m+1)$. This collapses the inner sums into simple fractions.
  3. Substitute $n = 52$ and $m = 52 - 13j$ for $j = 1,2,3,4$ to get $E[N] = 53(2/7 - 2/9 + 1/10 - 1/53)$. For the comparison, the with-replacement coupon collector gives $4 H_4 = 25/3$.

Worked Solution

How to Think About It: This is the coupon collector problem, but sampling without replacement from a finite deck instead of the usual i.i.d. setting. The key difference: every time you draw a card from a suit you have already seen, you are depleting the deck of "useless" cards. That makes it progressively easier to find the remaining suits compared to the with-replacement version, so we expect the without-replacement answer to be strictly smaller. A quick way to ballpark: the with-replacement coupon collector with 4 types needs $4 H_4 = 4(1 + 1/2 + 1/3 + 1/4) = 25/3 \approx 8.33$ draws on average. Without replacement should be noticeably less -- maybe around 7.5 to 8.

Quick Estimate: The bottleneck is the last suit. Once you have seen 3 of 4 suits, the missing suit has 13 cards left in a deck of roughly $52 - 5 = 47$ cards (since you typically need about 5 draws to get 3 suits). So the expected additional draws for the last suit is roughly $47/13 \approx 3.6$. Adding the roughly 4 draws to see the first 3 suits gives $\approx 7.6$. This is close to the exact answer of $\approx 7.665$.

Approach: Use the tail-sum formula $E[N] = \sum_{k=0}^{51} P(N > k)$ and compute $P(N > k)$ -- the probability that the first $k$ cards miss at least one suit -- via inclusion-exclusion.

Formal Solution:

We need $P(N > k)$, the probability that the first $k$ cards drawn (without replacement from 52) do not contain all 4 suits. By inclusion-exclusion over which suits are missing:

$P(N > k) = \sum_{j=1}^{4} (-1)^{j+1} \binom{4}{j} \frac{\binom{52 - 13j}{k}}{\binom{52}{k}}$

where $\binom{m}{k} = 0$ when $k > m$. The term $\binom{52 - 13j}{k}/\binom{52}{k}$ is the probability that all $k$ cards come from the $52 - 13j$ cards outside $j$ specific suits.

Summing over $k$:

$E[N] = \sum_{k=0}^{51} P(N > k) = \sum_{j=1}^{4} (-1)^{j+1} \binom{4}{j} \sum_{k=0}^{52-13j} \frac{\binom{52-13j}{k}}{\binom{52}{k}}$

Now apply the identity: for integers $0 \le m \le n$,

$\sum_{k=0}^{m} \frac{\binom{m}{k}}{\binom{n}{k}} = \frac{n+1}{n - m + 1}$

With $n = 52$ and $m = 52 - 13j$, each inner sum equals $53/(13j + 1)$. Substituting:

$E[N] = \binom{4}{1}\frac{53}{14} - \binom{4}{2}\frac{53}{27} + \binom{4}{3}\frac{53}{40} - \binom{4}{4}\frac{53}{53}$

$= 53\left(\frac{4}{14} - \frac{6}{27} + \frac{4}{40} - \frac{1}{53}\right) = 53\left(\frac{2}{7} - \frac{2}{9} + \frac{1}{10} - \frac{1}{53}\right)$

Putting everything over the common denominator $33390 = \text{lcm}(7,9,10,53)$:

$E[N] = 53 \cdot \frac{9540 - 7420 + 3339 - 630}{33390} = 53 \cdot \frac{4829}{33390} = \frac{4829}{630} \approx 7.665$

Comparison with the with-replacement heuristic:

The standard coupon collector (with replacement, 4 equally likely types) gives $E_{\text{wr}} = 4 H_4 = 25/3 \approx 8.333$. The exact without-replacement answer $4829/630 \approx 7.665$ is lower by $421/630 \approx 0.668$, an overestimate of about $8.7\%$.

This makes intuitive sense: in the with-replacement model, drawing a duplicate card does nothing -- it is a wasted draw. In the without-replacement model, every duplicate draw removes a "useless" card from the deck, increasing the fraction of remaining cards that belong to unseen suits. This depletion effect accelerates the collection process, so the without-replacement collector finishes faster.

Answer:

$E[N] = 53\left(\frac{2}{7} - \frac{2}{9} + \frac{1}{10} - \frac{1}{53}\right) = \frac{4829}{630} \approx 7.665$

The with-replacement heuristic (

5/3 \approx 8.333$) overestimates by about $8.7\%$ because it ignores the deck-depletion effect that makes without-replacement collection faster.

Intuition

The coupon collector problem is one of the most natural probability questions in combinatorics, and this variant highlights an important subtlety: sampling without replacement from a finite population is fundamentally different from the i.i.d. version. The key mechanism is deck depletion -- every card you draw, even a "useless" duplicate, shrinks the pool of remaining cards and increases the hit rate for unseen suits. This is why the without-replacement collector always finishes faster than its with-replacement counterpart. The gap grows with the number of types and shrinks as the deck size grows relative to the number of types (in the limit of an infinite deck, the two models coincide).

In quant work, this distinction shows up whenever you are sampling from a finite population without replacement -- quality control, survey design, or modeling the probability of seeing all members of some category in a finite dataset. The inclusion-exclusion machinery combined with the binomial-coefficient-ratio identity is a powerful tool that converts a seemingly complex without-replacement problem into a clean closed-form expression. The common mistake is to use the with-replacement coupon collector formula as if it were exact; it is a decent first approximation but systematically overshoots, and the error can be material when the deck is not much larger than the number of types.

Open the full interactive solver →