Expected Number of Birthday Quadruples
There are $n = 25$ people in a probability class. Each person's birthday is uniformly distributed over 365 days, independently of everyone else.
Compute the expected number of 4-person groups (subsets of size $k = 4$) in which all four people share the same birthday. Round your answer to the nearest hundred-thousandth.
Hints
- Define an indicator $I_S$ for each subset $S$ of 4 people, equal to 1 if all four share a birthday. The expected count is just a sum of these indicators.
- By linearity of expectation, $E[T] = \binom{n}{k} \cdot P(\text{4 specific people all match})$. Focus on computing that single probability.
- Fix the birthday of one person in the group. Each of the remaining $k - 1$ people independently matches with probability /365$, giving $P = 1/365^{k-1}$.
Worked Solution
How to Think About It: The classic birthday problem asks whether any two people share a birthday. This is a generalization: instead of pairs, we are counting 4-person subsets that are all born on the same day. The trick is indicator variables -- one per subset -- because trying to track the full distribution of matching groups is a nightmare, while computing the probability that one particular group of 4 all shares a birthday is trivial. Linearity of expectation does the rest. You can see immediately that the answer is going to be tiny: the probability that 4 specific people all share a birthday is about
/365^3 \approx 2 \times 10^{-8}$, and there are $\binom{25}{4} = 12650$ such subsets, so the expectation is somewhere in the0^{-4}$ range.Quick Estimate: Each group of 4 matches with probability
/365^3$. There are $\binom{25}{4} = 12650$ groups. So the expected count is roughly $12650 \times \frac{1}{365^3} \approx \frac{12650}{4.86 \times 10^7} \approx 2.6 \times 10^{-4}.$ That is the ballpark before doing any formal derivation.Approach: Define one indicator per subset, use linearity of expectation, and compute the probability that all members of a given subset match.
Formal Solution:
For each subset $S$ of 4 people from the class, define the indicator $I_S = \mathbf{1}[\text{all 4 people in } S \text{ share the same birthday}].$
The total number of 4-person birthday matches is $T = \sum_{|S|=4} I_S,$ where the sum runs over all $\binom{25}{4}$ subsets of size 4.
By linearity of expectation: $E[T] = \binom{25}{4} \cdot E[I_S].$
Now compute $E[I_S] = P(\text{4 specific people all share a birthday})$. Condition on person 1's birthday -- it can be anything. For each of persons 2, 3, and 4, the probability of matching person 1's birthday is
/365$. Since birthdays are independent: $E[I_S] = \left(\frac{1}{365}\right)^3 = \frac{1}{365^3}.$The general formula for arbitrary $n$ and $k$ is: $E[T] = \frac{1}{365^{k-1}} \binom{n}{k}.$
Plugging in $n = 25$, $k = 4$: $E[T] = \frac{1}{365^3} \binom{25}{4} = \frac{12650}{48{,}627{,}125} \approx 0.00026.$
Answer: $E[T] = \frac{\binom{25}{4}}{365^3} = \frac{12650}{48{,}627{,}125} \approx 0.00026.$
Intuition
The indicator variable technique is the workhorse of birthday-type problems. The key insight is that even though different subsets are not independent (they share members), you never need independence to apply linearity of expectation. You only need to know the expected value of each indicator, which is just a single-subset probability -- a much simpler calculation.
This structure appears constantly in quant work: counting rare coincidences (hash collisions, trade matching errors, default clustering), estimating expected numbers of anomalies in large datasets. The answer scales as $n^k / (365^{k-1} \cdot k!)$, so adding one more person or raising $k$ by one dramatically changes the expectation. In practice, this is why birthday attacks in cryptography are dangerous for small hash spaces, and why rare-event clustering in portfolios can be surprisingly common once you account for all the ways it can happen.