Benjamini-Hochberg FDR Control
You are a quant researcher evaluating $m$ candidate trading signals. You run a backtest on each one and collect their p-values, which you sort in ascending order: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$. You want to identify which signals are genuinely predictive while controlling the fraction of false discoveries.
- State the procedure. Given a target false discovery rate (FDR) level $q \in (0, 1)$, define the Benjamini-Hochberg (BH) procedure. In particular, define the threshold index $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$ and explain which hypotheses are rejected.
- Prove FDR control. Show that under the assumption that the p-values corresponding to true null hypotheses are independent of each other and of the p-values of the false nulls, the BH procedure satisfies $\text{FDR} = E\!\left[\frac{V}{R \vee 1}\right] \le q$, where $V$ is the number of false rejections and $R$ is the total number of rejections.
- Positive dependence. Discuss what happens when the independence assumption fails. Under what dependence structure do the BH guarantees still hold, and what modification is needed for arbitrary dependence?
Hints
- Think about what happens when you decompose the false discovery proportion into a sum of indicator variables -- one for each true null hypothesis.
- For each true null $H_i$, its p-value is $\text{Uniform}(0,1)$. Condition on all the other p-values and compute $E[V_i / (R \vee 1)]$ using the uniformity of $p_i$ and the fact that rejection when $R = r$ requires $p_i \le qr/m$.
- After showing each true null contributes at most $q/m$ to the expected FDR, sum over the $m_0$ true nulls. The factor $m_0/m \le 1$ gives you the result. For positive dependence, look up the PRDS condition from Benjamini-Yekutieli (2001).
Worked Solution
How to Think About It: When you test hundreds of signals, some will look significant purely by chance. If you test 200 signals at the 5% level, you expect about 10 false positives even if every signal is pure noise. Bonferroni (reject if $p_i \le q/m$) controls the family-wise error rate but is far too conservative -- it kills your power to detect real signals. The BH procedure is the workhorse alternative: instead of controlling the chance of *any* false positive, it controls the *fraction* of your discoveries that are false. In quant research, this is exactly what you want -- you do not mind a few duds in your portfolio of signals as long as the hit rate stays above some threshold.
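The arithmetic behind this motivation can be checked in a few lines. This is a minimal sketch; the variable names are illustrative, and the probability of at least one false positive additionally assumes the 200 tests are independent, which the text does not state.

```python
m = 200        # number of signals tested
alpha = 0.05   # per-test significance level

# Expected number of false positives when every signal is pure noise.
expected_false_positives = m * alpha

# Chance of at least one false positive (assumes independent tests).
prob_any_false_positive = 1.0 - (1.0 - alpha) ** m

# Bonferroni per-test cutoff that keeps the family-wise error rate at alpha.
bonferroni_cutoff = alpha / m

print(expected_false_positives)
print(prob_any_false_positive)
print(bonferroni_cutoff)
```

The Bonferroni cutoff is 200 times smaller than the per-test level, which is where the loss of power comes from.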
The key intuition for why BH works is a beautiful counting argument. Each true null p-value is uniform on $[0,1]$, so the probability that it falls below the BH cutoff $qr/m$ is exactly $qr/m$. Dividing by the $r$ rejections leaves a contribution of $q/m$ to the expected false discovery proportion, whatever the value of $r$. Summing over the $m_0$ true nulls gives you $m_0 q / m \le q$.
Formal Derivation:
Part 1 -- The BH Procedure:
- Sort the $m$ p-values in ascending order: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$.
- For target FDR level $q \in (0,1)$, find the largest index $k$ such that $p_{(k)} \le qk/m$.
- Define the threshold $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$. If no such $k$ exists, set $\hat{k} = 0$.
- Reject all hypotheses whose p-values satisfy $p_{(i)} \le p_{(\hat{k})}$, i.e., reject $H_{(1)}, H_{(2)}, \ldots, H_{(\hat{k})}$.
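Geometrically, you are plotting the ordered p-values $p_{(k)}$ against the line $y = qk/m$ and finding the rightmost index $k$ at which $p_{(k)}$ lies on or below the line. Every hypothesis up to that index is rejected, including any whose p-value sits above the line at its own rank.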
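The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `benjamini_hochberg` and the example p-values are made up for this sketch.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[bool]:
    """Return one rejection flag per hypothesis, in the original input order."""
    m = len(p_values)
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # k_hat is the largest 1-based rank k with p_(k) <= q * k / m (0 if none).
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_hat = rank

    # Reject the k_hat smallest p-values, even those above the line at their own rank.
    reject = [False] * m
    for idx in order[:k_hat]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen so that ranks 3 and 4 sit above the line.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.062, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p, q=0.10)
print(sum(flags))  # 5
```

In this example the third and fourth p-values exceed $3q/m$ and $4q/m$, yet both are rejected because the fifth p-value falls below $5q/m$. That is the step-up behaviour: the loop keeps scanning to the largest qualifying rank instead of stopping at the first failure.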
Part 2 -- Proof of FDR Control Under Independence:
Let $m_0$ be the (unknown) number of true nulls among the $m$ hypotheses. Let $V$ denote the number of falsely rejected true nulls and $R = \hat{k}$ the total number of rejections. We want to show $E[V / (R \vee 1)] \le q$.
Step 1: Indicator decomposition. For each true null hypothesis $H_i$ (with $i \in \mathcal{H}_0$, the set of true nulls), define $V_i = \mathbf{1}\{H_i \text{ is rejected}\}$. Then $V = \sum_{i \in \mathcal{H}_0} V_i$ and
$\frac{V}{R \vee 1} = \sum_{i \in \mathcal{H}_0} \frac{V_i}{R \vee 1}.$
By linearity of expectation:
$E\!\left[\frac{V}{R \vee 1}\right] = \sum_{i \in \mathcal{H}_0} E\!\left[\frac{V_i}{R \vee 1}\right].$
Step 2: Condition on all other p-values. Fix a true null $H_i$ with p-value $p_i$. Under the null, $p_i \sim \text{Uniform}(0,1)$, independent of the other p-values (by assumption). Condition on the p-values $\{p_j : j \neq i\}$. Given these, the BH threshold $\hat{k}$ and the rejection decision for $H_i$ depend on $p_i$ only through whether $p_i$ falls below a certain data-dependent cutoff.
Step 3: Key calculation. When $H_i$ is rejected by the BH procedure with $R$ total rejections, then $H_i$ is one of the $R$ rejected hypotheses, so $p_i \le qR/m$. Therefore:
$\frac{V_i}{R \vee 1} = \frac{\mathbf{1}\{H_i \text{ rejected}\}}{R \vee 1} \le \frac{\mathbf{1}\{p_i \le qR/m\}}{R \vee 1}.$
Now consider the possible values of $R$. If $H_i$ is rejected and the threshold is $R = r$, then $p_i \le qr/m$. We can write:
$\frac{V_i}{R \vee 1} \le \sum_{r=1}^{m} \frac{\mathbf{1}\{p_i \le qr/m\}}{r} \cdot \mathbf{1}\{R = r\}.$
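But a cleaner route makes the dependence on $p_i$ explicit. Let $R_i^{0}$ denote the number of BH rejections when $p_i$ is replaced by $0$ and every other p-value is left unchanged. Then $R_i^{0} \ge 1$ is a function of $\{p_j\}_{j \neq i}$ alone, and a standard property of the step-up rule is that $H_i$ is rejected if and only if $p_i \le q R_i^{0}/m$, in which case $R = R_i^{0}$. Taking the expectation over $p_i$ (which is uniform and independent of the other p-values):
$E\!\left[\frac{V_i}{R \vee 1} \;\Big|\; \{p_j\}_{j \neq i}\right] = \frac{P\!\left(p_i \le q R_i^{0}/m \;\Big|\; \{p_j\}_{j \neq i}\right)}{R_i^{0}} = \frac{q R_i^{0}/m}{R_i^{0}} = \frac{q}{m},$
where the crucial step uses $P(p_i \le qr/m) = qr/m$ (uniformity of the true null p-value) with $r = R_i^{0}$, and the $r$ cancels with the denominator. Independence is what allows $R_i^{0}$ to be treated as a fixed number while integrating over $p_i$. If the null p-values are only super-uniform, meaning $P(p_i \le t) \le t$, the middle equality becomes an inequality and the bound $q/m$ still holds.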
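One way to see how $R$ reacts to $p_i$ is a leave-one-out property: if $R^0$ is the number of BH rejections after setting $p_i$ to $0$, then $H_i$ is rejected exactly when $p_i \le qR^0/m$, and in that case $R = R^0$. The sketch below checks this numerically on random inputs. The helper names, the choice $q = 0.2$, and the way the p-values are drawn are all assumptions made for illustration.

```python
import random


def bh_num_rejections(p_values, q):
    """Number of BH rejections: the largest k with p_(k) <= q * k / m."""
    m = len(p_values)
    k_hat = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            k_hat = rank
    return k_hat


def check_leave_one_out(p_values, q, i):
    """Check: H_i rejected <=> p_i <= q * R0 / m, and R == R0 when rejected."""
    m = len(p_values)
    r = bh_num_rejections(p_values, q)
    # H_i is among the r rejected hypotheses exactly when p_i <= q * r / m.
    rejected_i = p_values[i] <= q * r / m

    # R0: rejections after replacing p_i by 0, other p-values untouched.
    modified = list(p_values)
    modified[i] = 0.0
    r0 = bh_num_rejections(modified, q)

    predicted = p_values[i] <= q * r0 / m
    return rejected_i == predicted and (not rejected_i or r == r0)


rng = random.Random(0)
all_ok = True
for _ in range(2000):
    m = rng.randint(1, 12)
    # Cubing skews the draws toward 0 so that rejections occur reasonably often.
    p = [rng.random() ** 3 for _ in range(m)]
    i = rng.randrange(m)
    all_ok = all_ok and check_leave_one_out(p, q=0.2, i=i)
print(all_ok)
```

Because $R^0$ depends only on the other p-values, conditioning on them turns the rejection of $H_i$ into a comparison of $p_i$ with a fixed cutoff, which is what makes the uniformity calculation legitimate.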
Step 4: Sum over true nulls.
$E\!\left[\frac{V}{R \vee 1}\right] = \sum_{i \in \mathcal{H}_0} E\!\left[\frac{V_i}{R \vee 1}\right] \le \sum_{i \in \mathcal{H}_0} \frac{q}{m} = \frac{m_0}{m} \cdot q \le q.$
This completes the proof. Note the factor $m_0/m \le 1$ means BH is actually conservative -- the true FDR is at most $m_0 q/m$, which is strictly less than $q$ whenever some alternatives are present.
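A quick Monte Carlo check of the bound, offered as a sketch under stated assumptions and not as evidence beyond the proof: the p-values are independent, true nulls are exactly uniform, and the false nulls use a made-up alternative $p = U^8$ that concentrates mass near zero. The values $m = 50$, $m_0 = 40$, $q = 0.10$ are arbitrary.

```python
import random


def bh_rejections(p_values, q):
    """Indices rejected by BH: those of the k_hat smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_hat = rank
    return set(order[:k_hat])


def simulate_fdr(m=50, m0=40, q=0.10, n_sims=10000, seed=0):
    """Average V / max(R, 1) over simulated batches of independent p-values."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_sims):
        # Hypotheses 0..m0-1 are true nulls with Uniform(0,1) p-values.
        nulls = [rng.random() for _ in range(m0)]
        # Assumed alternative for the false nulls: p = U**8.
        alts = [rng.random() ** 8 for _ in range(m - m0)]
        rejected = bh_rejections(nulls + alts, q)
        v = sum(1 for idx in rejected if idx < m0)  # false rejections
        total_fdp += v / max(len(rejected), 1)
    return total_fdp / n_sims


estimate = simulate_fdr()
print(round(estimate, 3))  # should land near m0 * q / m = 0.08, up to simulation noise
```

The estimate should sit close to $m_0 q/m$ and below $q$, in line with the bound derived above.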
Part 3 -- Positive Dependence:
The independence assumption can be relaxed. Benjamini and Yekutieli (2001) showed that the BH procedure still controls FDR at level $m_0 q / m \le q$ under a condition called positive regression dependence on a subset (PRDS). Formally, call a set $D \subseteq [0,1]^m$ increasing if $x \in D$ and $y \ge x$ coordinatewise imply $y \in D$. PRDS requires that for each true null $H_i$ and every increasing set $D$, the conditional probability $P((p_1, \ldots, p_m) \in D \mid p_i = t)$ is non-decreasing in $t$. Intuitively, this means that a large p-value for one true null makes it *more* likely (not less) that other hypotheses also have large p-values.
PRDS holds in many practical settings, including:
- One-sided test statistics from a multivariate normal with non-negative correlations
- Positively correlated t-statistics (common in factor models where signals share common exposures)
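The first setting can be simulated with a one-factor model. The sketch below is illustrative only: the equicorrelated structure, the correlation `rho = 0.5`, the mean shift of 3 for false nulls, and all function names are assumptions, not part of the original problem.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_rejections(p_values, q):
    """Indices rejected by BH: those of the k_hat smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_hat = rank
    return set(order[:k_hat])


def simulate_fdr_equicorrelated(m=50, m0=40, q=0.10, rho=0.5, shift=3.0,
                                n_sims=5000, seed=1):
    """Estimate FDR when all test statistics share one common normal factor."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_sims):
        common = rng.gauss(0.0, 1.0)  # shared factor inducing correlation rho
        p_values = []
        for i in range(m):
            z = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            if i >= m0:
                z += shift  # assumed effect size for the false nulls
            p_values.append(normal_sf(z))  # one-sided p-value
        rejected = bh_rejections(p_values, q)
        v = sum(1 for idx in rejected if idx < m0)  # false rejections
        total_fdp += v / max(len(rejected), 1)
    return total_fdp / n_sims


fdr_prds = simulate_fdr_equicorrelated()
print(round(fdr_prds, 3))  # expected to stay at or below m0 * q / m = 0.08, up to noise
```

Positive correlation makes the false discovery proportion more variable from batch to batch (occasional runs with many false rejections, many runs with none), but its average should remain within the bound.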
For arbitrary dependence (including negative correlations), the BH guarantee can break. The standard fix is to replace the threshold with $\hat{k} = \max\{k : p_{(k)} \le qk / (m \cdot c_m)\}$ where $c_m = \sum_{j=1}^{m} 1/j \approx \ln m + \gamma$. This is the Benjamini-Yekutieli (BY) procedure. The harmonic correction makes it much more conservative: for $m = 200$, $c_m \approx \ln 200 + \gamma \approx 5.88$, so every p-value cutoff is nearly six times smaller than under BH, which costs considerable power. It is therefore a last resort when you genuinely cannot assume any dependence structure.
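A minimal sketch of the BY rule, written as the BH step-up scan with $q$ replaced by $q/c_m$. The function names and the example p-values are hypothetical.

```python
def harmonic_number(m):
    """c_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / j for j in range(1, m + 1))


def benjamini_yekutieli(p_values, q):
    """BH step-up rule with q replaced by q / c_m; returns rejection flags."""
    m = len(p_values)
    q_adj = q / harmonic_number(m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q_adj * rank / m:
            k_hat = rank
    reject = [False] * m
    for idx in order[:k_hat]:
        reject[idx] = True
    return reject


print(round(harmonic_number(200), 2))  # 5.88

# Hypothetical p-values for illustration.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.062, 0.074, 0.205, 0.212, 0.216]
by_flags = benjamini_yekutieli(p, q=0.10)
print(sum(by_flags))  # 1
```

On these ten p-values at $q = 0.10$, plain BH would reject the five smallest, while BY keeps only the smallest one. That gap is the price of a guarantee that holds under any dependence.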
Interpretation: In quant research, signals built from overlapping data, correlated factor exposures, or shared instruments typically exhibit positive dependence. The standard BH procedure is usually safe. But if you are testing long-short signals where some are mechanically negatively correlated (e.g., momentum vs. reversal on the same universe), the PRDS assumption may fail and the BY correction is warranted.
Answer: The BH procedure rejects all $H_{(1)}, \ldots, H_{(\hat{k})}$ where $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$. Under independence (or PRDS), $\text{FDR} \le m_0 q / m \le q$. The proof decomposes $V/(R \vee 1)$ into per-hypothesis contributions, uses the uniformity of true null p-values and independence to show each contributes at most $q/m$ in expectation, then sums over the $m_0$ true nulls. Under arbitrary dependence, the BH threshold must be divided by $c_m = \sum_{j=1}^{m} 1/j$ to maintain control.
Intuition
The BH procedure is the single most important multiple testing tool in quantitative research. The core idea is elegant: instead of asking "did I make any mistake?" (which Bonferroni controls, at enormous cost to power), you ask "what fraction of my discoveries are mistakes?" This is the right question when you are building a portfolio of signals -- you can tolerate some false positives as long as the overall hit rate is acceptable. The proof works because each true null p-value is uniformly distributed, so it has a simple, predictable probability of sneaking past any given threshold. Independence lets you analyze each true null separately, and the $r$ in the numerator $qr/m$ cancels with the $r$ in the denominator, leaving a contribution of $q/m$ per true null.
The practical lesson for quant researchers is about dependence. Most signal-testing setups have positive correlations (signals share data, factors are correlated), and PRDS covers this case. But if you are doing something adversarial -- like testing a momentum signal alongside its negation -- the PRDS condition can fail, and the realized FDR of the standard BH procedure can exceed the nominal level $q$. The BY correction with the harmonic penalty $c_m = \sum_{j=1}^{m} 1/j$ is the safe fallback, though it costs you substantially in power. In practice, most quant teams use BH with a qualitative assessment of the dependence structure rather than paying the BY penalty.