Statistics Interview Questions for Quant Roles: Hypothesis Testing, MLE & p-Value Traps

The estimator theory, testing framework, and p-value traps that separate quant researcher candidates — with a fully worked example.

Updated July 3, 2026 · QuantVault

Probability questions test whether you can compute. Statistics questions test whether you can be trusted with data, and for quant researcher roles at firms like Two Sigma, D.E. Shaw, and Citadel, that is the whole job. A researcher who misreads a p-value ships a strategy that is pure noise. So interviewers probe the exact places where practitioners get sloppy: what an estimator's bias actually means, what a confidence interval does and does not say, and why a backtest with 200 tested signals will almost certainly contain "significant" garbage (at a 5% threshold, about 10 false positives are expected from noise alone).

Why interviews lean on statistics

Statistics rounds are a proxy for research judgment. Anyone can memorize that OLS minimizes squared error; fewer candidates can explain when the estimate is meaningless. Expect three layers of questioning: mechanical (compute a test statistic), conceptual (interpret it correctly), and adversarial (the interviewer feeds you a subtly wrong interpretation and watches whether you accept it). The third layer is where offers are decided. If your probability fundamentals are shaky, fix those first — statistics questions assume fluency with expectations, variances, and distributions of random variables.

The core toolkit

Nearly every statistics interview question draws on a short list of tools:

Estimator properties. Bias $\mathbb{E}[\hat\theta] - \theta$, variance, $\text{MSE} = \text{bias}^2 + \text{variance}$, consistency, efficiency. Classic trap: the MLE of Gaussian variance divides by $n$, not $n-1$, so it is biased (its expectation is $\frac{n-1}{n}\sigma^2$) but still consistent.
Maximum likelihood. Write the log-likelihood, differentiate, and verify that the stationary point is a maximum (second-order condition and boundary). Know the asymptotics: under standard regularity conditions, $\hat\theta_{MLE}$ is asymptotically normal with variance given by the inverse Fisher information.
The CLT and standard errors. Sample means are approximately normal with standard error $\sigma/\sqrt{n}$; this powers almost every test you will build at a whiteboard.
Hypothesis testing. Null vs. alternative, test statistic, Type I/II errors, power, and the precise definition of a p-value: the probability of data at least this extreme given the null is true.
Confidence intervals and their duality with two-sided tests.
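
The variance trap in the first item is easy to check numerically. The following is a minimal sketch, not part of any official solution: it assumes standard normal data, a sample size of 5, and 20,000 simulated samples, and the function names (`variance_estimates`, `average_estimates`) are illustrative. With true variance 1, the divide-by-$n$ estimator should average close to $(n-1)/n = 0.8$, while the divide-by-$(n-1)$ estimator should average close to 1.

```python
import random


def variance_estimates(sample):
    """Return (MLE, unbiased) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


def average_estimates(n=5, trials=20_000, sigma=1.0, seed=0):
    """Average both estimators over many simulated N(0, sigma^2) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mle_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mle, unbiased = variance_estimates(sample)
        mle_total += mle
        unbiased_total += unbiased
    return mle_total / trials, unbiased_total / trials


mle_avg, unbiased_avg = average_estimates()
print(f"divide by n     (MLE):      {mle_avg:.3f}")
print(f"divide by n - 1 (unbiased): {unbiased_avg:.3f}")
```

Raising `n` shrinks the gap between the two averages, which is the consistency half of the trap: the bias factor $(n-1)/n$ tends to 1.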

Concept	Typical question	The trap inside it
p-value	"Your signal has p = 0.03. What does that mean?"	It is not P(null is true) = 3%
MLE	"Derive the MLE for a Bernoulli / exponential parameter"	Forgetting to check the second-order condition or boundary
Confidence interval	"Interpret a 95% CI for a strategy's mean return"	It is not "95% probability the parameter is inside" — the interval is random, not the parameter
Multiple testing	"You tested 100 signals; 5 are significant at 5%. Excited?"	That is exactly the false-positive count you'd expect from pure noise
Power	"How many samples to detect a Sharpe of 0.5?"	Candidates size for significance, not power
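
The power row can be made concrete with a back-of-the-envelope sizing. This sketch rests on assumptions the table does not state: returns are i.i.d. with known volatility, the Sharpe ratio is annualized, and the test is a two-sided z-test on the mean return, so the test statistic after $T$ years is centered near $\text{Sharpe} \times \sqrt{T}$. The function name `years_needed` is hypothetical.

```python
from statistics import NormalDist


def years_needed(annual_sharpe, alpha=0.05, power=0.80):
    """Years of data for a two-sided z-test to detect a given annual Sharpe.

    Uses the standard approximation Sharpe * sqrt(T) = z_(1 - alpha/2) + z_power,
    which ignores the negligible chance of rejecting in the wrong tail.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # significance requirement
    z_power = std_normal.inv_cdf(power)          # extra margin needed for power
    return ((z_alpha + z_power) / annual_sharpe) ** 2


sized_for_power = years_needed(0.5, power=0.80)
sized_for_significance = years_needed(0.5, power=0.50)
print(f"80% power: {sized_for_power:.1f} years")
print(f"50% power: {sized_for_significance:.1f} years")
```

Under these assumptions, sizing only for significance (about 15 years) leaves a coin-flip chance of missing a real Sharpe of 0.5; asking for 80% power roughly doubles the requirement to about 31 years.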

Worked example: is the coin biased?

You flip a coin 100 times and see 60 heads. Is the coin fair? This shows up constantly because it packs testing, estimation, and interpretation into one question.

Step 1 — set up the test. Under $H_0: p = 0.5$, the head count is $X \sim \text{Bin}(100, 0.5)$ with mean $np = 50$ and standard deviation $\sqrt{np(1-p)} = \sqrt{25} = 5$. The normal approximation gives

$$z = \frac{60 - 50}{5} = 2.0, \qquad p\text{-value} \approx 2\,\Phi(-2.0) \approx 0.046.$$

At the 5% level, you (barely) reject fairness. Strong candidates volunteer the continuity correction: using 59.5 instead of 60 gives $z = 1.9$ and a p-value around $0.057$ — now you fail to reject. Saying "the answer flips depending on the approximation, so the evidence is marginal" is worth more than either number alone.
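
The Step 1 arithmetic can be reproduced in a few lines. This is a sketch using only the Python standard library; the exact binomial p-value is an addition for comparison and uses the convention of doubling the upper tail, which is reasonable here because the null distribution is symmetric.

```python
from math import comb, sqrt
from statistics import NormalDist

n, heads, p0 = 100, 60, 0.5
mean = n * p0                    # 50 under the null
sd = sqrt(n * p0 * (1 - p0))     # 5 under the null
std_normal = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * std_normal.cdf(-abs(z))


# Plain normal approximation.
z_plain = (heads - mean) / sd
p_plain = two_sided_p(z_plain)

# Continuity correction: move the observed count half a unit toward the mean
# (written for the case heads > mean, as in this example).
z_cc = (heads - 0.5 - mean) / sd
p_cc = two_sided_p(z_cc)

# Exact binomial: P(X >= 60) under p = 0.5, doubled for a two-sided test.
upper_tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n
p_exact = min(1.0, 2 * upper_tail)

print(f"plain:     z = {z_plain:.2f}, p = {p_plain:.4f}")
print(f"corrected: z = {z_cc:.2f}, p = {p_cc:.4f}")
print(f"exact binomial p = {p_exact:.4f}")
```

The exact value lands next to the continuity-corrected one, just above 0.05, which supports reading the evidence as marginal rather than leaning on the uncorrected 0.046.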

Step 2 — estimate the bias. The Bernoulli log-likelihood is $\ell(p) = 60\ln p + 40\ln(1-p)$. Setting $\ell'(p) = 60/p - 40/(1-p) = 0$ yields $\hat p_{MLE} = 0.6$. Its standard error is $\sqrt{\hat p(1-\hat p)/n} = \sqrt{0.24/100} \approx 0.049$, so the 95% Wald interval is $0.6 \pm 1.96(0.049) \approx (0.504,\ 0.696)$ — just excluding 0.5, consistent with the marginal test.
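
A short numerical check of Step 2, including the second-order condition that candidates tend to skip. This is an illustrative sketch: the helper names (`log_lik`, `score`, `second_derivative`) are not from any library, and the grid search is there only to confirm the closed-form answer.

```python
from math import log, sqrt

heads, tails = 60, 40
n = heads + tails


def log_lik(p):
    """Bernoulli log-likelihood for 60 heads and 40 tails."""
    return heads * log(p) + tails * log(1 - p)


def score(p):
    """First derivative of the log-likelihood."""
    return heads / p - tails / (1 - p)


def second_derivative(p):
    """Second derivative; negative everywhere on (0, 1), so the optimum is a maximum."""
    return -heads / p ** 2 - tails / (1 - p) ** 2


p_hat = heads / n  # closed-form root of the score equation

# Sanity check: a grid search over (0, 1) should land on the same point.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=log_lik)

# Wald standard error and 95% interval.
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# The inverse of the observed information reproduces the squared standard error.
var_from_information = -1 / second_derivative(p_hat)

print(f"p_hat = {p_hat}, grid argmax = {p_grid}, score at p_hat = {score(p_hat):.2e}")
print(f"se = {se:.4f}, 95% Wald CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"-1 / l''(p_hat) = {var_from_information:.4f}")
```

The last line ties Step 2 back to the toolkit: the Wald variance $\hat p(1-\hat p)/n$ is the inverse Fisher information evaluated at the MLE.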

Step 3 — survive the follow-up. "You actually tested 20 coins and this was the most extreme one. Now what?" With 20 independent fair coins, the chance at least one shows $p \le 0.046$ is $1 - 0.954^{20} \approx 61\%$. Your "discovery" is expected under the null. A Bonferroni-corrected threshold of $0.05/20 = 0.0025$ kills it. This follow-up is the quant-research punchline: the same logic is why most backtested signals are false positives. A Bayesian follow-up ("what prior on coin bias would change your conclusion?") is also common.
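
The multiple-testing numbers in Step 3 follow from two one-line formulas. The sketch below assumes the 20 tests are independent, as the follow-up states; the function names are illustrative.

```python
def family_wise_error_rate(alpha, m):
    """Chance that at least one of m independent null tests has p <= alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


m = 20
p_observed = 0.046

fwer = family_wise_error_rate(p_observed, m)
threshold = bonferroni_threshold(0.05, m)
survives = p_observed <= threshold
fwer_after_correction = family_wise_error_rate(threshold, m)

print(f"P(at least one p <= {p_observed} among {m} fair coins) = {fwer:.2f}")
print(f"Bonferroni threshold = {threshold:.4f}, survives = {survives}")
print(f"Family-wise error rate at the corrected threshold = {fwer_after_correction:.3f}")
```

The corrected family-wise rate comes out just under 5%, which is the point of the correction: the whole family of 20 tests now carries the error budget a single test had before.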

The traps that fail candidates

Restating the p-value as P(H₀ | data). The single most common fail. It is P(data at least this extreme | H₀).
Treating "not significant" as "no effect." Absence of evidence is not evidence of absence; mention power.
Ignoring multiple comparisons whenever the question involves more than one test, signal, or strategy.
Invoking the CLT on heavy tails or dependent data. Financial returns are both; say so before your interviewer does. This bridges into time series questions about autocorrelation.
Confusing zero correlation with independence. Independence implies zero correlation, but the converse fails except in special cases like joint normality. This is a favorite in regression rounds.
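
For the last trap, a standard counterexample is worth having ready. The sketch below uses a textbook construction that is not from this article: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Exact fractions avoid any floating-point doubt.

```python
from fractions import Fraction

support = [-1, 0, 1]
prob = Fraction(1, 3)  # X is uniform on {-1, 0, 1}; Y is defined as X squared

e_x = sum(prob * x for x in support)        # E[X]
e_y = sum(prob * x ** 2 for x in support)   # E[Y] = E[X^2]
e_xy = sum(prob * x ** 3 for x in support)  # E[XY] = E[X^3]
covariance = e_xy - e_x * e_y

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
p_x0 = prob      # P(X = 0)
p_y0 = prob      # Y = 0 happens exactly when X = 0
p_joint = prob   # so the joint event is the same event
independent = p_joint == p_x0 * p_y0

print(f"Cov(X, Y) = {covariance}")
print(f"P(X=0, Y=0) = {p_joint}, P(X=0) * P(Y=0) = {p_x0 * p_y0}")
print(f"independent: {independent}")
```

The covariance is exactly zero even though $Y$ is a deterministic function of $X$, so the pair is uncorrelated but as dependent as two variables can be.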

Practice next

Reading about traps is not the same as dodging them under a 30-minute clock. Work through our statistics question bank with fully worked solutions, keep the key formulas at hand with the quant interview cheat sheet, and drill the full range of topics in the problem bank — roughly 400 problems are free.

Frequently asked questions

What statistics topics come up most in quant interviews?

Hypothesis testing and p-value interpretation, maximum likelihood estimation, estimator properties (bias, variance, consistency), confidence intervals, the central limit theorem, and multiple-testing corrections. Quant researcher interviews go deepest on interpretation: interviewers deliberately offer wrong readings of a p-value or confidence interval to see if you accept them.

How do I answer a p-value interview question correctly?

State the definition precisely: the p-value is the probability of observing data at least as extreme as yours, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the probability your result is a fluke. Getting this backwards is the single most common way candidates fail statistics rounds.

What is the coin-flip hypothesis testing interview question?

You observe 60 heads in 100 flips and must decide if the coin is fair. Under the null, the count has mean 50 and standard deviation 5, giving z = 2.0 and a two-sided p-value of about 0.046 — marginal evidence against fairness. Strong answers note the continuity correction pushes the p-value above 0.05, and handle the multiple-testing follow-up where the coin was the most extreme of 20 tested.

Do quant trading interviews test statistics or just probability?

Trader interviews lean toward probability, expected value, and mental math, while quant researcher interviews test statistics in real depth. Researcher rounds at systematic funds regularly cover MLE derivations, test construction, power, and multiple-comparison reasoning, because those skills directly determine whether a candidate can tell a real trading signal from noise.

Practice the real thing

QuantVault has 2,800+ quant interview problems with full solutions, intuition, and hints, firm-by-firm interview funnels, and an auto-graded coding judge. Start free.
