Sample Size for Detecting a Biased Coin

Statistics · Medium · Free problem

A coin has an unknown probability of heads $p$. You suspect it might be slightly biased toward tails, with $p = 0.49$, but it could also be fair ($p = 0.5$). You want to design a hypothesis test to distinguish between these two cases.

How many flips do you need to be reasonably confident you can tell the difference? Specifically:

  1. Set up the hypothesis test $H_0: p = 0.5$ vs. $H_1: p = 0.49$. Using a one-sided test at significance level $\alpha = 0.05$ and power $1 - \beta = 0.80$, find the required sample size $n$.
  2. How does the required sample size change if you want power $0.95$ instead?
  3. Explain intuitively why the number is so large, and how it relates to the effect size $\delta = |0.50 - 0.49| = 0.01$.

Hints

  1. Think about how the standard error of $\hat{p}$ shrinks with $n$. How small does it need to be to detect a 0.01 shift?
  2. Use the power analysis formula: $n = \left\lceil \left(\frac{(z_\alpha + z_\beta) \sigma}{\delta}\right)^2 \right\rceil$ with $\sigma = \sqrt{p(1-p)} \approx 0.5$ and $\delta = 0.01$. Because $n$ is a required minimum, round UP.
  3. For 80% power: $z_\alpha = 1.645$, $z_\beta = 0.842$, giving $n = \lceil (2.487/0.02)^2 \rceil = \lceil 15462.92 \rceil = 15463$. For 95% power, replace $z_\beta$ with $1.645$, giving $\lceil 27060.25 \rceil = 27061$.

Worked Solution

How to Think About It: You are trying to detect a 1% shift in a coin's bias. That is a tiny signal buried in a lot of noise -- each flip has standard deviation $\sigma \approx 0.5$, so the effect size relative to the noise is $\delta/\sigma = 0.01/0.5 = 0.02$. Before touching any formulas, your gut should say: this is going to take a LOT of flips. The standard error of $\hat{p}$ falls like $1/\sqrt{n}$, so you need $n$ large enough that $1/\sqrt{n}$ becomes comparable to 0.01. That means $\sqrt{n} \sim 100$, or $n \sim 10{,}000$. We are in the tens-of-thousands range.
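To make the gut estimate concrete, here is a minimal sketch that tabulates the standard error of $\hat{p}$ against the 0.01 gap. It assumes the standard error is $\sqrt{p(1-p)/n} = 0.5/\sqrt{n}$ for a near-fair coin; the factor of 0.5 does not change the order of magnitude. The name `standard_error` is illustrative only.

```python
import math

def standard_error(n: int, p: float = 0.5) -> float:
    """Standard error of the sample proportion after n flips."""
    return math.sqrt(p * (1 - p) / n)

GAP = 0.01  # distance between the fair coin and the suspected bias

for n in (100, 2_500, 10_000, 40_000):
    se = standard_error(n)
    # gap/SE says how many standard errors separate p = 0.5 from p = 0.49
    print(f"n={n:>6}  SE={se:.4f}  gap/SE={GAP / se:.2f}")
```

The gap only reaches a couple of standard errors once $n$ is in the tens of thousands, which is what the rough argument predicts.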

Quick Estimate: The rule of thumb for sample size is $n \approx (z_\alpha + z_\beta)^2 \sigma^2 / \delta^2$. For 80% power: $z_\alpha = z_{0.05} = 1.645$, $z_\beta = z_{0.20} = 0.842$. So $n \approx (1.645 + 0.842)^2 \times 0.25 / 0.0001 = (2.487)^2 \times 2500 \approx 6.185 \times 2500 \approx 15{,}463$. For 95% power, $z_\beta$ becomes $z_{0.05} = 1.645$, so $n \approx (1.645 + 1.645)^2 \times 2500 = (3.29)^2 \times 2500 \approx 10.824 \times 2500 \approx 27{,}060$. These are big numbers, but they match our gut estimate. Note that since $n$ is a *required minimum* sample size, we will round any fractional result UP to the next integer.

Approach: Use the standard normal approximation to the binomial and the classical power analysis formula for a one-sided Z-test.

Formal Solution:

We test $H_0: p = 0.5$ vs. $H_1: p = 0.49$. Under $H_0$, the sample proportion $\hat{p}$ over $n$ flips has:

$\hat{p} \sim N\left(0.5, \frac{0.25}{n}\right)$

The test statistic is:

$Z = \frac{\hat{p} - 0.5}{\sqrt{0.25/n}} = \frac{\hat{p} - 0.5}{0.5/\sqrt{n}}$

We reject $H_0$ (one-sided, left tail) if $Z < -z_\alpha = -1.645$.

For the test to have power $1 - \beta$ at $p = 0.49$, we need:

$P\left(Z < -z_\alpha \mid p = 0.49\right) = 1 - \beta$

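Under $H_1$, $\hat{p}$ has mean $0.49$ and (approximately) the same variance $0.25/n$, so $Z$ is approximately normal with mean $(0.49 - 0.5)/(0.5/\sqrt{n})$ and unit variance. Writing $\Phi$ for the standard normal CDF, the rejection probability is $\Phi\left(-z_\alpha - \frac{0.49 - 0.5}{0.5/\sqrt{n}}\right)$, and it equals $1 - \beta$ exactly when the argument equals $z_\beta$:

$-z_\alpha - \frac{0.49 - 0.5}{0.5/\sqrt{n}} = z_\beta$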

Rearranging:

$z_\alpha + z_\beta = \frac{0.01 \sqrt{n}}{0.5}$

$\sqrt{n} = \frac{0.5(z_\alpha + z_\beta)}{0.01}$

$n = \left\lceil \left(\frac{z_\alpha + z_\beta}{0.02}\right)^2 \right\rceil$

The ceiling appears because $n$ is the *minimum* number of flips needed to reach the target power; a fractional answer must be rounded UP, since rounding down would leave the test slightly underpowered.
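The formula above can be sketched in a few lines of Python. This is an illustration, not part of the original solution: the function names are hypothetical, and the second function uses the standard library's `statistics.NormalDist` to compute unrounded quantiles instead of the table values $1.645$ and $0.842$.

```python
import math
from statistics import NormalDist

def sample_size_from_z(z_alpha, z_beta, sigma=0.5, delta=0.01):
    """n = ceil(((z_alpha + z_beta) * sigma / delta)^2), rounded UP."""
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def sample_size(alpha, power, sigma=0.5, delta=0.01):
    """Same formula, with z-values taken from the standard normal quantile function."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)
    return sample_size_from_z(z_alpha, z_beta, sigma, delta)

print(sample_size_from_z(1.645, 0.842))  # table values, 80% power
print(sample_size_from_z(1.645, 1.645))  # table values, 95% power
print(sample_size(0.05, 0.80), sample_size(0.05, 0.95))  # unrounded quantiles
```

With unrounded quantiles the result comes out a few flips lower than the table-based figure, because $1.645$ and $0.842$ are both rounded up slightly. At this scale the difference is immaterial.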

Part 1 (80% power): $z_\alpha = 1.645$, $z_\beta = 0.842$.

$n = \left\lceil \left(\frac{2.487}{0.02}\right)^2 \right\rceil = \lceil (124.35)^2 \rceil = \lceil 15462.92 \rceil = 15{,}463$

You need at least 15,463 flips.

Part 2 (95% power): $z_\beta = 1.645$.

$n = \left\lceil \left(\frac{3.290}{0.02}\right)^2 \right\rceil = \lceil (164.5)^2 \rceil = \lceil 27060.25 \rceil = 27{,}061$

You need at least 27,061 flips.
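As a sanity check, the two sample sizes can be plugged back into the power expression. The sketch below assumes the same normal approximation as the derivation (equal variance $0.25/n$ under both hypotheses) and the table value $z_\alpha = 1.645$; `approx_power` is an illustrative name.

```python
import math
from statistics import NormalDist

def approx_power(n, p0=0.5, p1=0.49, z_alpha=1.645):
    """Normal-approximation power of the left-tailed test with n flips."""
    sigma = math.sqrt(p0 * (1 - p0))           # per-flip s.d., reused under H1 as in the text
    shift = (p0 - p1) * math.sqrt(n) / sigma   # how far H1 moves the mean of Z, in s.d. units
    return NormalDist().cdf(shift - z_alpha)   # P(Z < -z_alpha | p = p1)

for n in (15_463, 27_061):
    print(f"n={n}: power ~ {approx_power(n):.4f}")
```

Both sample sizes should land at or just above their targets of 0.80 and 0.95, since each $n$ was rounded up.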

Part 3 (Intuition): The required sample size scales as $1/\delta^2$. Halving the effect size quadruples the sample size. When $\delta = 0.01$ and $\sigma = 0.5$, the signal-to-noise ratio per flip is only 0.02 -- you need to average over thousands of flips to accumulate enough evidence. This is the fundamental bottleneck: the sampling noise of a fair coin ($\sigma = 0.5$) is 50 times larger than the effect you are trying to detect.
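The inverse-square scaling is easy to see numerically. This sketch evaluates the unrounded formula at a few effect sizes, using the 80%-power table values; the function name and the chosen grid of $\delta$ values are illustrative.

```python
def raw_sample_size(delta, sigma=0.5, z_sum=1.645 + 0.842):
    """Unrounded n = ((z_alpha + z_beta) * sigma / delta)^2."""
    return (z_sum * sigma / delta) ** 2

base = raw_sample_size(0.01)
for delta in (0.02, 0.01, 0.005, 0.0025):
    n = raw_sample_size(delta)
    # Each halving of delta should multiply n by 4
    print(f"delta={delta:<7} n={n:>10.0f}  ratio to delta=0.01: {n / base:.2f}")
```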

Answer: For a one-sided test at $\alpha = 0.05$, rounding the required minimum sample size UP to the next integer: $n = 15{,}463$ flips for 80% power, and $n = 27{,}061$ for 95% power. The general formula is $n = \left\lceil \left(\frac{(z_\alpha + z_\beta)\sigma}{\delta}\right)^2 \right\rceil$ where $\delta = 0.01$ and $\sigma = 0.5$.

Intuition

This problem illustrates a universal law of statistical detection: the required sample size scales as $1/\delta^2$, where $\delta$ is the effect size. A 1% bias in a coin sounds small, but it is not just small -- it is small relative to the enormous per-flip noise ($\sigma = 0.5$). The signal-to-noise ratio per observation is 0.02, so you need to average over roughly $(1/0.02)^2 = 2{,}500$ flips just to get a SNR of 1, and several times more to achieve adequate power.
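One way to quantify "several times more": in the derivation, the target power is reached when the accumulated signal-to-noise ratio $\delta\sqrt{n}/\sigma$ equals $z_\alpha + z_\beta$. The sketch below, with illustrative function names, compares that requirement to the SNR-of-1 baseline.

```python
import math

def snr(n, delta=0.01, sigma=0.5):
    """Signal-to-noise ratio of the sample proportion after n flips."""
    return delta * math.sqrt(n) / sigma

def flips_for_snr(target, delta=0.01, sigma=0.5):
    """Flips needed for the averaged signal to reach a target SNR."""
    return (target * sigma / delta) ** 2

print(snr(2_500))                       # SNR after 2,500 flips
target = 1.645 + 0.842                  # z_alpha + z_beta for 80% power
print(flips_for_snr(target) / flips_for_snr(1.0))  # multiple of the SNR-1 baseline
```

For 80% power the multiple is $(z_\alpha + z_\beta)^2 \approx 6.2$, which is how 2,500 flips turns into roughly 15,500.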

This same scaling governs every detection problem in quant finance. Trying to tell if a strategy has a Sharpe ratio of 0.5 vs. 0.0? Same math, same $1/\delta^2$ scaling, same depressingly large sample requirements. It is why backtests need years of data, why A/B tests on click rates need millions of impressions, and why a trader who claims to detect a 1bp edge after 100 trades should be treated with deep skepticism. The practical lesson: before you run any experiment, do the power calculation first. If the answer is "you need 10 years of data," redesign the experiment or find a bigger effect to target.
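As a rough illustration of the Sharpe-ratio analogy, the sketch below reuses the same formula under strong simplifying assumptions that are not in the original text: returns are i.i.d., the Sharpe ratio is annualized, and the estimation error of the volatility is ignored. Under those assumptions the annualized Sharpe ratio plays the role of $\delta/\sigma$ with time measured in years. The function name `years_to_detect_sharpe` is hypothetical.

```python
def years_to_detect_sharpe(sharpe, z_alpha=1.645, z_beta=0.842):
    """Years of data for a one-sided test (alpha = 0.05, 80% power) of Sharpe > 0.

    Assumes i.i.d. returns and a known volatility, so the test statistic
    grows like sharpe * sqrt(years). This is a back-of-envelope sketch only.
    """
    return ((z_alpha + z_beta) / sharpe) ** 2

for s in (0.5, 1.0, 2.0):
    print(f"Sharpe {s}: about {years_to_detect_sharpe(s):.1f} years")
```

Under these assumptions a Sharpe ratio of 0.5 needs on the order of two decades of data, which is the kind of answer that should prompt a redesign of the experiment.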
