Birthday Problem Generalization via Poisson Approximation
Throw $n$ balls uniformly and independently into $m$ bins. A "collision" occurs whenever two or more balls land in the same bin.
- Use a Poisson approximation to derive an explicit approximation for $P(\text{no collisions})$.
- Find $n_{0.5}$, the smallest $n$ such that $P(\text{at least one collision}) \ge 1/2$.
- Give the large-$m$ asymptotics of $n_{0.5}$.
Hints
- Count colliding pairs: for each of the $\binom{n}{2}$ pairs, the probability they share a bin is /m$. Sum these to get the expected number of collisions.^{32}$ trials, not
- Approximate the number of colliding pairs as $\text{Poisson}(\lambda)$ with $\lambda = n(n-1)/(2m)$; then $P(\text{no collisions}) \approx e^{-\lambda}$.
- Set $e^{-\lambda} = 1/2$ and solve the resulting quadratic in $n$ to get $n_{0.5}$; for large $m$ the dominant term gives $n_{0.5} \sim \sqrt{2m \ln 2}$.
Worked Solution
How to Think About It: This is the birthday problem in disguise -- $m$ days in the year, $n$ people. The classic version has $m = 365$, and most people know the answer is around $\sqrt{2 \cdot 365 \ln 2} \approx 23$. The Poisson approach makes the generalization clean: count the number of colliding pairs, approximate that as a Poisson random variable, and use the Poisson no-collision probability. The rate of the Poisson is just the expected number of colliding pairs, which is $\binom{n}{2}/m$.
Quick Estimate: For $m$ large and $n \sim \sqrt{m}$, the expected number of colliding pairs is about $\binom{n}{2}/m \approx n^2/(2m)$. For $P(\text{at least one collision}) = 1/2$ we need this rate $\approx \ln 2$, giving $n_{0.5} \approx \sqrt{2m \ln 2} \approx 1.177 \sqrt{m}$. For $m = 365$: $n_{0.5} \approx 1.177 \times 19.1 \approx 22.5$, rounding to 23. Checks out.
Approach: Chen-Stein / Poisson approximation for the number of colliding pairs.
Formal Solution:
Part 1 -- Poisson approximation for $P(\text{no collisions})$:
For each pair $(i,j)$ with $i < j$, let $A_{ij}$ be the indicator that balls $i$ and $j$ land in the same bin. Then: $P(A_{ij} = 1) = \frac{1}{m}.$
The total number of colliding pairs is $W = \sum_{i<j} A_{ij}$. The expected number of collisions is: $\lambda = E[W] = \binom{n}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m}.$
For large $m$ and $n = o(m^{2/3})$, the indicators $A_{ij}$ are nearly independent and $W$ is approximately $\text{Poisson}(\lambda)$. A collision occurs iff $W \ge 1$, so: $P(\text{no collisions}) \approx P(W = 0) = e^{-\lambda} = \exp\!\left(-\frac{n(n-1)}{2m}\right).$
Alternatively, by exact counting: $P(\text{no collisions}) = \frac{m(m-1)\cdots(m-n+1)}{m^n} = \prod_{k=0}^{n-1}\left(1 - \frac{k}{m}\right) \approx \exp\!\left(-\sum_{k=0}^{n-1}\frac{k}{m}\right) = \exp\!\left(-\frac{n(n-1)}{2m}\right),$ confirming the Poisson approximation.
Part 2 -- Solving for $n_{0.5}$:
We want $P(\text{at least one collision}) = 1/2$, i.e., $P(\text{no collisions}) = 1/2$: $\exp\!\left(-\frac{n(n-1)}{2m}\right) = \frac{1}{2}.$
Taking logs: $\frac{n(n-1)}{2m} = \ln 2.$
Solving the quadratic $n^2 - n - 2m\ln 2 = 0$: $n_{0.5} = \frac{1 + \sqrt{1 + 8m\ln 2}}{2}.$
In practice, round up to the nearest integer.
Part 3 -- Large-$m$ asymptotics:
For large $m$, the $\sqrt{8m\ln 2}$ term dominates: $n_{0.5} \sim \frac{\sqrt{8m\ln 2}}{2} = \sqrt{2m\ln 2}.$
Numerically: $\sqrt{2\ln 2} \approx 1.1774$, so $n_{0.5} \approx 1.177\sqrt{m}$.
For $m = 365$: $n_{0.5} \approx 1.177 \times 19.1 \approx 22.5$, giving $n_{0.5} = 23$. The well-known birthday problem answer.
Answer: $P(\text{no collisions}) \approx e^{-n(n-1)/(2m)}$; the threshold is $n_{0.5} = \frac{1 + \sqrt{1 + 8m\ln 2}}{2} \approx \sqrt{2m\ln 2}$ for large $m$.
Intuition
The birthday problem feels surprising because our intuition about probability is calibrated to linear scales, not square-root scales. The threshold $n_{0.5} \sim \sqrt{2m \ln 2}$ grows like $\sqrt{m}$, not like $m$. With 365 days, you only need 23 people -- not 183 -- to have a 50% chance of a match. This $\sqrt{m}$ scaling is a ubiquitous fingerprint of collision-type problems: it shows up in hash table load factors, the birthday attack in cryptography (which is why 64-bit hashes are breakable in