Hypothesis Test for M&M Color Identification
A person claims they can determine the color of M&M candies by taste alone. In a controlled experiment, they correctly identify the color in 40 out of 100 trials. Suppose there are 5 equally common colors, so random guessing gives a success rate of $p_0 = 1/5 = 0.2$.
- Formulate a hypothesis test to evaluate whether the person performs significantly better than random guessing.
- Compute the test statistic and p-value using a normal approximation.
- State your conclusion at significance level $\alpha = 0.05$.
Hints
- Under random guessing with 5 colors, the success probability is $p_0 = 0.2$. Set up a one-sided test: $H_0: p = 0.2$ vs $H_a: p > 0.2$.
- Use the normal approximation to the binomial. Under $H_0$, the mean is $np_0 = 20$ and the standard deviation is $\sqrt{np_0(1-p_0)} = 4$. Compute the z-score.
- The z-score is $(40 - 20)/4 = 5.0$. Anything above $z = 1.645$ rejects at $\alpha = 0.05$. Five sigma is far beyond any reasonable threshold.
Worked Solution
How to Think About It: This is a textbook one-sided binomial test. The person got 40 out of 100 correct. Under random guessing, you would expect 20 correct. Is 40 far enough from 20 to be convincing? Your gut should immediately say yes -- 40 is way above 20. The question is just how to formalize this. The normal approximation to the binomial gives you a z-score, and if it is large enough, you reject the null.
Quick Estimate: Under the null, $X \sim \text{Binomial}(100, 0.2)$ has mean 20 and standard deviation $\sqrt{100 \times 0.2 \times 0.8} = \sqrt{16} = 4$. The observed value of 40 is $(40 - 20)/4 = 5$ standard deviations above the mean. Five sigma is astronomically significant -- the p-value is essentially zero. No need for tables; this is a slam dunk rejection.
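The quick estimate can be sanity-checked by simulation. The sketch below is an illustration only, using Python's standard library; the function name `simulate_null_counts`, the number of simulated experiments, and the seed are arbitrary choices, not part of the problem. It draws many 100-trial experiments under pure guessing and reports how the number of correct answers is distributed.

```python
import random
import statistics

def simulate_null_counts(n_trials=100, p0=0.2, n_experiments=20_000, seed=0):
    """Simulate the number of correct guesses per experiment under pure guessing."""
    rng = random.Random(seed)
    return [
        # Each trial is a success with probability p0; count successes per experiment.
        sum(rng.random() < p0 for _ in range(n_trials))
        for _ in range(n_experiments)
    ]

counts = simulate_null_counts()
sim_mean = statistics.mean(counts)      # should be close to n * p0 = 20
sim_sd = statistics.pstdev(counts)      # should be close to sqrt(n * p0 * (1 - p0)) = 4
frac_at_least_40 = sum(c >= 40 for c in counts) / len(counts)

print(f"simulated mean: {sim_mean:.2f}")
print(f"simulated sd:   {sim_sd:.2f}")
print(f"fraction of experiments with >= 40 correct: {frac_at_least_40:.6f}")
```

With a result this extreme, a simulation of this size will almost never produce 40 or more correct guesses, which is itself the point: a guesser essentially never does this well.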
Approach: Formal one-sided z-test for a proportion.
Formal Solution:
Hypotheses:
- $H_0$: $p = 0.2$ (the person is guessing randomly)
- $H_a$: $p > 0.2$ (the person identifies colors better than chance)
Test statistic:
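Under $H_0$, $X \sim \text{Binomial}(100, 0.2)$. Since $np_0 = 20$ and $n(1-p_0) = 80$ both comfortably exceed the usual rule-of-thumb thresholds, the normal approximation is reasonable: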
$\mu_0 = np_0 = 100 \times 0.2 = 20$
$\sigma_0 = \sqrt{np_0(1-p_0)} = \sqrt{100 \times 0.2 \times 0.8} = \sqrt{16} = 4$
$z = \frac{X - \mu_0}{\sigma_0} = \frac{40 - 20}{4} = 5.0$
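The same arithmetic can be written as a small helper. This is a minimal sketch, and the function name `one_proportion_z` is made up for illustration. It works with counts, as in the formulas above; the proportion form $(\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$ gives the same value.

```python
import math

def one_proportion_z(successes, n, p0):
    """z-statistic for H0: p = p0, using the null mean and standard deviation."""
    mu0 = n * p0                              # expected successes under H0
    sigma0 = math.sqrt(n * p0 * (1 - p0))     # binomial standard deviation under H0
    return (successes - mu0) / sigma0

z = one_proportion_z(successes=40, n=100, p0=0.2)
print(f"z = {z:.2f}")
```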
P-value:
$\text{p-value} = P(Z \geq 5.0) \approx 2.87 \times 10^{-7}$
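This is far below any conventional significance level. The normal approximation loses accuracy this far into the tail (the exact binomial tail probability is larger, on the order of $10^{-6}$), but the conclusion is unaffected.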
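The tail probability can be checked numerically. The sketch below uses only Python's standard library, and the helper names are illustrative. It computes the normal-approximation p-value via the complementary error function and, for comparison, sums the exact binomial tail $P(X \geq 40)$ directly.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_normal = normal_upper_tail(5.0)              # normal approximation
p_exact = binomial_upper_tail(40, 100, 0.2)    # exact binomial tail

print(f"normal approximation: {p_normal:.3e}")
print(f"exact binomial tail:  {p_exact:.3e}")
```

The two numbers differ because the Binomial(100, 0.2) distribution is right-skewed, so its upper tail is heavier than the normal curve suggests. Both are many orders of magnitude below 0.05.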
Decision: Since $\text{p-value} \ll \alpha = 0.05$ (and indeed $\ll 0.001$), we reject $H_0$.
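Equivalently, the decision can be framed as comparing $z$ to a critical value. A short sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative:

```python
from statistics import NormalDist

alpha = 0.05
z_observed = 5.0

# One-sided test: reject H0 when z exceeds the upper (1 - alpha) quantile.
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_null = z_observed > z_critical

print(f"critical value: {z_critical:.3f}")
print(f"reject H0: {reject_null}")
```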
Answer: The test statistic is $z = 5.0$, with a p-value of approximately $3 \times 10^{-7}$ under the normal approximation. We strongly reject the null hypothesis at $\alpha = 0.05$. The data provide overwhelming evidence that the person identifies M&M colors at a rate better than random guessing: the observed success rate of 40% is 5 standard deviations above the 20% null expectation.
Intuition
This problem is a warm-up for the kind of statistical thinking that comes up constantly in quant work. You observe a signal (40% success rate) and need to decide whether it is real or noise. The framework is always the same: compute the expected value under the null, compute the standard deviation under the null, and see how many sigmas the observation lies from the null expectation. Five sigma is absurdly significant -- in practice, if you see something this extreme, your first instinct should actually be to question the experimental setup, not celebrate the result.
In trading, this same logic applies to evaluating strategy performance. If a backtest shows a Sharpe ratio of 5 over a year of daily data, you should not conclude that you have found alpha -- you should conclude that there is probably a bug in your backtest. Extreme significance is often a sign of data leakage, overfitting, or a flawed null hypothesis, not of genuine predictive power.
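To make the analogy concrete, an annualized Sharpe ratio can be converted into an approximate t-statistic for the mean daily return. The sketch below rests on simplifying assumptions that are not in the original text: i.i.d. daily returns, 252 trading days per year, and the usual $\sqrt{252}$ annualization. The function name `sharpe_t_stat` is hypothetical.

```python
import math

def sharpe_t_stat(annual_sharpe, n_days, periods_per_year=252):
    """Approximate t-statistic for the mean daily return implied by an annualized Sharpe ratio.

    Assumes i.i.d. daily returns and sqrt(periods_per_year) annualization.
    """
    daily_sharpe = annual_sharpe / math.sqrt(periods_per_year)  # mean / sd per day
    return daily_sharpe * math.sqrt(n_days)                     # mean / (sd / sqrt(n))

t = sharpe_t_stat(annual_sharpe=5.0, n_days=252)
print(f"approximate t-statistic: {t:.2f}")
```

Under these assumptions, a Sharpe ratio of 5 over one year of daily data is roughly a five-sigma result, the same level of significance as the M&M experiment.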