Birthday Problem Generalization via Poisson Approximation

Probability · Hard · Free problem

Throw $n$ balls uniformly and independently into $m$ bins. A "collision" occurs whenever two or more balls land in the same bin.

  1. Use a Poisson approximation to derive an explicit approximation for $P(\text{no collisions})$.
  2. Find $n_{0.5}$, the smallest $n$ such that $P(\text{at least one collision}) \ge 1/2$.
  3. Give the large-$m$ asymptotics of $n_{0.5}$.

Hints

  1. Count colliding pairs: for each of the $\binom{n}{2}$ pairs, the probability they share a bin is
    /m$. Sum these to get the expected number of collisions.
  2. Approximate the number of colliding pairs as $\text{Poisson}(\lambda)$ with $\lambda = n(n-1)/(2m)$; then $P(\text{no collisions}) \approx e^{-\lambda}$.
  3. Set $e^{-\lambda} = 1/2$ and solve the resulting quadratic in $n$ to get $n_{0.5}$; for large $m$ the dominant term gives $n_{0.5} \sim \sqrt{2m \ln 2}$.

Worked Solution

How to Think About It: This is the birthday problem in disguise -- $m$ days in the year, $n$ people. The classic version has $m = 365$, and most people know the answer is around $\sqrt{2 \cdot 365 \ln 2} \approx 23$. The Poisson approach makes the generalization clean: count the number of colliding pairs, approximate that as a Poisson random variable, and use the Poisson no-collision probability. The rate of the Poisson is just the expected number of colliding pairs, which is $\binom{n}{2}/m$.

Quick Estimate: For $m$ large and $n \sim \sqrt{m}$, the expected number of colliding pairs is about $\binom{n}{2}/m \approx n^2/(2m)$. For $P(\text{at least one collision}) = 1/2$ we need this rate $\approx \ln 2$, giving $n_{0.5} \approx \sqrt{2m \ln 2} \approx 1.177 \sqrt{m}$. For $m = 365$: $n_{0.5} \approx 1.177 \times 19.1 \approx 22.5$, rounding to 23. Checks out.

Approach: Chen-Stein / Poisson approximation for the number of colliding pairs.

Formal Solution:

Part 1 -- Poisson approximation for $P(\text{no collisions})$:

For each pair $(i,j)$ with $i < j$, let $A_{ij}$ be the indicator that balls $i$ and $j$ land in the same bin. Then: $P(A_{ij} = 1) = \frac{1}{m}.$

The total number of colliding pairs is $W = \sum_{i<j} A_{ij}$. The expected number of collisions is: $\lambda = E[W] = \binom{n}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m}.$

For large $m$ and $n = o(m^{2/3})$, the indicators $A_{ij}$ are nearly independent and $W$ is approximately $\text{Poisson}(\lambda)$. A collision occurs iff $W \ge 1$, so: $P(\text{no collisions}) \approx P(W = 0) = e^{-\lambda} = \exp\!\left(-\frac{n(n-1)}{2m}\right).$

Alternatively, by exact counting: $P(\text{no collisions}) = \frac{m(m-1)\cdots(m-n+1)}{m^n} = \prod_{k=0}^{n-1}\left(1 - \frac{k}{m}\right) \approx \exp\!\left(-\sum_{k=0}^{n-1}\frac{k}{m}\right) = \exp\!\left(-\frac{n(n-1)}{2m}\right),$ confirming the Poisson approximation.

Part 2 -- Solving for $n_{0.5}$:

We want $P(\text{at least one collision}) = 1/2$, i.e., $P(\text{no collisions}) = 1/2$: $\exp\!\left(-\frac{n(n-1)}{2m}\right) = \frac{1}{2}.$

Taking logs: $\frac{n(n-1)}{2m} = \ln 2.$

Solving the quadratic $n^2 - n - 2m\ln 2 = 0$: $n_{0.5} = \frac{1 + \sqrt{1 + 8m\ln 2}}{2}.$

In practice, round up to the nearest integer.

Part 3 -- Large-$m$ asymptotics:

For large $m$, the $\sqrt{8m\ln 2}$ term dominates: $n_{0.5} \sim \frac{\sqrt{8m\ln 2}}{2} = \sqrt{2m\ln 2}.$

Numerically: $\sqrt{2\ln 2} \approx 1.1774$, so $n_{0.5} \approx 1.177\sqrt{m}$.

For $m = 365$: $n_{0.5} \approx 1.177 \times 19.1 \approx 22.5$, giving $n_{0.5} = 23$. The well-known birthday problem answer.

Answer: $P(\text{no collisions}) \approx e^{-n(n-1)/(2m)}$; the threshold is $n_{0.5} = \frac{1 + \sqrt{1 + 8m\ln 2}}{2} \approx \sqrt{2m\ln 2}$ for large $m$.

Intuition

The birthday problem feels surprising because our intuition about probability is calibrated to linear scales, not square-root scales. The threshold $n_{0.5} \sim \sqrt{2m \ln 2}$ grows like $\sqrt{m}$, not like $m$. With 365 days, you only need 23 people -- not 183 -- to have a 50% chance of a match. This $\sqrt{m}$ scaling is a ubiquitous fingerprint of collision-type problems: it shows up in hash table load factors, the birthday attack in cryptography (which is why 64-bit hashes are breakable in

^{32}$ trials, not ^{64}$), and the coupon collector's problem.

The Poisson approximation is the right tool here because collisions are rare events that happen independently across pairs. The approximation is tight when $n = o(m^{2/3})$ -- once $n$ grows faster than $m^{2/3}$, dependency among triples of balls starts to matter and the pure Poisson model breaks down. For most practical purposes (and certainly for interview settings), the approximation is excellent and gives exact leading-order asymptotics.

Open the full interactive solver →