Benjamini-Hochberg FDR Control

Statistics · Hard · Free problem

You are a quant researcher evaluating $m$ candidate trading signals. You run a backtest on each one and collect their p-values, which you sort in ascending order: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$. You want to identify which signals are genuinely predictive while controlling the fraction of false discoveries.

  1. State the procedure. Given a target false discovery rate (FDR) level $q \in (0, 1)$, define the Benjamini-Hochberg (BH) procedure. In particular, define the threshold index $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$ and explain which hypotheses are rejected.
  2. Prove FDR control. Show that under the assumption that the p-values corresponding to true null hypotheses are independent of each other and of the p-values of the false nulls, the BH procedure satisfies $\text{FDR} = E\!\left[\frac{V}{R \vee 1}\right] \le q$, where $V$ is the number of false rejections and $R$ is the total number of rejections.
  3. Positive dependence. Discuss what happens when the independence assumption fails. Under what dependence structure do the BH guarantees still hold, and what modification is needed for arbitrary dependence?

Hints

  1. Think about what happens when you decompose the false discovery proportion into a sum of indicator variables -- one for each true null hypothesis.
  2. For each true null $H_i$, its p-value is $\text{Uniform}(0,1)$. Condition on all the other p-values and compute $E[V_i / R]$ using the uniformity of $p_i$ and the fact that rejection at threshold $R = r$ requires $p_i \le qr/m$.
  3. After showing each true null contributes at most $q/m$ to the expected FDR, sum over the $m_0$ true nulls. The factor $m_0/m \le 1$ gives you the result. For positive dependence, look up the PRDS condition from Benjamini-Yekutieli (2001).

Worked Solution

How to Think About It: When you test hundreds of signals, some will look significant purely by chance. If you test 200 signals at the 5% level, you expect about 10 false positives even if every signal is pure noise. Bonferroni (reject if $p_i \le q/m$) controls the family-wise error rate but is far too conservative -- it kills your power to detect real signals. The BH procedure is the workhorse alternative: instead of controlling the chance of *any* false positive, it controls the *fraction* of your discoveries that are false. In quant research, this is exactly what you want -- you do not mind a few duds in your portfolio of signals as long as the hit rate stays above some threshold.

The key intuition for why BH works is a counting argument. Each true null p-value is uniform on $[0,1]$, so when the procedure ends with $r$ rejections, that null falls below the cutoff $qr/m$ with probability $qr/m$. Dividing by the $r$ rejections leaves a contribution of $q/m$ to the expected false discovery proportion, whatever $r$ turns out to be. Summing over the $m_0$ true nulls gives $m_0 q / m \le q$.

Formal Derivation:

Part 1 -- The BH Procedure:

  1. Sort the $m$ p-values in ascending order: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$.
  2. For target FDR level $q \in (0,1)$, find the largest index $k$ such that $p_{(k)} \le qk/m$.
  3. Define the threshold $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$. If no such $k$ exists, set $\hat{k} = 0$.
  4. Reject all hypotheses whose p-values satisfy $p_{(i)} \le p_{(\hat{k})}$, i.e., reject $H_{(1)}, H_{(2)}, \ldots, H_{(\hat{k})}$.

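Geometrically, you are plotting the ordered p-values $p_{(k)}$ against the line $y = qk/m$ and finding the rightmost index $k$ at which the p-value lies on or below the line. Every hypothesis up to that index is rejected, even if some of the earlier p-values sit above the line.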
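The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation; the function name `benjamini_hochberg` and the example p-values are made up for this sketch.

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    """Return (k_hat, reject_mask) for the BH procedure at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    sorted_p = p[order]
    thresholds = q * np.arange(1, m + 1) / m   # BH line q*k/m for k = 1..m
    below = np.nonzero(sorted_p <= thresholds)[0]
    # k_hat is the largest k with p_(k) <= q*k/m, or 0 if there is none
    k_hat = 0 if below.size == 0 else int(below[-1]) + 1
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True               # reject the k_hat smallest p-values
    return k_hat, reject

# Hypothetical p-values (unsorted on purpose), m = 4, q = 0.05.
# BH cutoffs are 0.0125, 0.025, 0.0375, 0.05.
pvals = [0.9, 0.03, 0.02, 0.021]
k_hat, reject = benjamini_hochberg(pvals, q=0.05)
print(k_hat, reject)
```

In this example the smallest p-value (0.02) is above its own cutoff (0.0125), yet it is still rejected because the third-smallest p-value (0.03) is below its cutoff (0.0375). That is the step-up behavior of BH: only the last index on or below the line matters.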

Part 2 -- Proof of FDR Control Under Independence:

Let $m_0$ be the (unknown) number of true nulls among the $m$ hypotheses. Let $V$ denote the number of falsely rejected true nulls and $R = \hat{k}$ the total number of rejections. We want to show $E[V / (R \vee 1)] \le q$.

Step 1: Indicator decomposition. For each true null hypothesis $H_i$ (with $i \in \mathcal{H}_0$, the set of true nulls), define $V_i = \mathbf{1}\{H_i \text{ is rejected}\}$. Then $V = \sum_{i \in \mathcal{H}_0} V_i$ and

$\frac{V}{R \vee 1} = \sum_{i \in \mathcal{H}_0} \frac{V_i}{R \vee 1}.$

By linearity of expectation:

$E\!\left[\frac{V}{R \vee 1}\right] = \sum_{i \in \mathcal{H}_0} E\!\left[\frac{V_i}{R \vee 1}\right].$

Step 2: Condition on all other p-values. Fix a true null $H_i$ with p-value $p_i$. Under the null, $p_i \sim \text{Uniform}(0,1)$, independent of the other p-values (by assumption). Condition on the p-values $\{p_j : j \neq i\}$. Given these, the BH threshold $\hat{k}$ and the rejection decision for $H_i$ depend on $p_i$ only through whether $p_i$ falls below a certain data-dependent cutoff.

Step 3: Key calculation. When $H_i$ is rejected by the BH procedure with $R$ total rejections, then $H_i$ is one of the $R$ rejected hypotheses, so $p_i \le qR/m$. Therefore:

$\frac{V_i}{R \vee 1} = \frac{\mathbf{1}\{H_i \text{ rejected}\}}{R \vee 1} \le \frac{\mathbf{1}\{p_i \le qR/m\}}{R \vee 1}.$

Now consider the possible values of $R$. If $H_i$ is rejected and the threshold is $R = r$, then $p_i \le qr/m$. We can write:

$\frac{V_i}{R \vee 1} \le \sum_{r=1}^{m} \frac{\mathbf{1}\{p_i \le qr/m\}}{r} \cdot \mathbf{1}\{R = r\}.$

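But a cleaner route removes the dependence of $R$ on $p_i$. Let $R_{-i}$ denote the number of rejections BH would make if $p_i$ were replaced by $0$ with all other p-values held fixed. Then $R_{-i} \ge 1$, it is a function of $\{p_j\}_{j \neq i}$ only, and $H_i$ is rejected if and only if $p_i \le qR_{-i}/m$, in which case $R = R_{-i}$. (Lowering an already-rejected p-value to $0$ does not change how many p-values fall below any cutoff $qk/m$ with $k \ge R$, so the rejection count is unchanged.) Hence $V_i/(R \vee 1) = \mathbf{1}\{p_i \le qR_{-i}/m\}/R_{-i}$. Taking the expectation over $p_i$ (which is uniform and independent of the other p-values that determine $R_{-i}$):

$E\!\left[\frac{V_i}{R \vee 1} \;\Big|\; \{p_j\}_{j \neq i}\right] = \sum_{r=1}^{m} \frac{P(p_i \le qr/m)}{r} \cdot \mathbf{1}\{R_{-i} = r\} = \sum_{r=1}^{m} \frac{qr/m}{r} \cdot \mathbf{1}\{R_{-i} = r\} = \frac{q}{m},$

where the crucial step uses $P(p_i \le qr/m) = qr/m$ (uniformity of the true null p-value), the $r$ cancels with the denominator, and exactly one of the indicators $\mathbf{1}\{R_{-i} = r\}$ equals 1. Independence is what allows $R_{-i}$ to be treated as fixed while integrating over $p_i$. If a null p-value is only conservative, meaning $P(p_i \le t) \le t$, the equality becomes an inequality and the bound $q/m$ still holds.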
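One standard way to handle the "adding $p_i$ back" issue is a leave-one-out count: the number of BH rejections when $p_i$ is set to $0$. The sketch below checks numerically, on random inputs, that $H_i$ is rejected exactly when $p_i \le q \cdot r_{-i} / m$ and that the rejection count then equals $r_{-i}$. The helper name `bh_num_rejections`, the variable `r_minus_i`, and the data-generating choices are assumptions made for this illustration; a simulation like this is a sanity check, not a proof.

```python
import numpy as np

def bh_num_rejections(p, q):
    """Number of BH rejections: largest k with p_(k) <= q*k/m, else 0."""
    p = np.sort(np.asarray(p, dtype=float))
    m = p.size
    below = np.nonzero(p <= q * np.arange(1, m + 1) / m)[0]
    return 0 if below.size == 0 else int(below[-1]) + 1

rng = np.random.default_rng(0)
q, m = 0.2, 20
mismatches = 0
for _ in range(2000):
    p = rng.uniform(size=m)
    p[:5] *= 0.05                      # hypothetical: shrink a few p-values so rejections occur
    i = int(rng.integers(m))           # pick one hypothesis to examine
    r = bh_num_rejections(p, q)
    rejected_i = bool(r > 0 and p[i] <= q * r / m)

    p_zeroed = p.copy()
    p_zeroed[i] = 0.0                  # leave-one-out: replace p_i by 0
    r_minus_i = bh_num_rejections(p_zeroed, q)
    predicted = bool(p[i] <= q * r_minus_i / m)

    # The claim: rejection of H_i is decided by r_minus_i alone,
    # and on rejection the total count equals r_minus_i.
    if rejected_i != predicted or (rejected_i and r != r_minus_i):
        mismatches += 1

print("mismatches:", mismatches)
```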

Step 4: Sum over true nulls.

$E\!\left[\frac{V}{R \vee 1}\right] = \sum_{i \in \mathcal{H}_0} E\!\left[\frac{V_i}{R \vee 1}\right] \le \sum_{i \in \mathcal{H}_0} \frac{q}{m} = \frac{m_0}{m} \cdot q \le q.$

This completes the proof. Note the factor $m_0/m \le 1$ means BH is actually conservative -- the true FDR is at most $m_0 q/m$, which is strictly less than $q$ whenever some alternatives are present.
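The bound can be checked by simulation. The sketch below draws independent p-values, applies BH, and averages the false discovery proportion; the estimate should land near $m_0 q/m$. The sample sizes, the seed, and the Beta(0.1, 1) distribution used for the false nulls are arbitrary choices for illustration, not part of the problem.

```python
import numpy as np

def bh_reject(p, q):
    """Boolean mask of hypotheses rejected by BH at level q."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    k_hat = 0 if below.size == 0 else int(below[-1]) + 1
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True
    return reject

rng = np.random.default_rng(1)
m, m0, q, n_sims = 50, 40, 0.1, 4000
fdp = np.empty(n_sims)
for s in range(n_sims):
    p = rng.uniform(size=m)                    # entries 0..m0-1 are true nulls
    # Assumed alternative: p-values concentrated near 0 for the false nulls
    p[m0:] = rng.beta(0.1, 1.0, size=m - m0)
    reject = bh_reject(p, q)
    V = reject[:m0].sum()                      # false rejections
    R = reject.sum()                           # total rejections
    fdp[s] = V / max(R, 1)

fdr_hat = fdp.mean()
print(f"estimated FDR = {fdr_hat:.4f}, m0*q/m = {m0 * q / m:.4f}")
```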

Part 3 -- Positive Dependence:

The independence assumption can be relaxed. Benjamini and Yekutieli (2001) showed that the BH procedure still controls FDR at level $m_0 q / m \le q$ under a condition called positive regression dependence on a subset (PRDS). Formally, call a set $D \subseteq [0,1]^m$ increasing if $x \in D$ and $y \ge x$ coordinatewise imply $y \in D$. PRDS requires that for each true null $H_i$ and every increasing set $D$, the conditional probability $P((p_1, \ldots, p_m) \in D \mid p_i = t)$ is non-decreasing in $t$. Intuitively, this means that a large p-value for one true null makes it *more* likely (not less) that other hypotheses also have large p-values.

PRDS holds in many practical settings, including:

  - One-sided test statistics from a multivariate normal with non-negative correlations
  - Positively correlated t-statistics (common in factor models where signals share common exposures)
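The multivariate normal case can be illustrated with a small simulation: one-sided z-tests whose statistics share a common factor, so every pair has correlation `rho >= 0`. The values of `rho`, the effect size `mu`, the seed, and the helper names are all assumptions chosen for this sketch. The estimated FDR should stay at or below $q$.

```python
import math
import numpy as np

def bh_reject(p, q):
    """Boolean mask of hypotheses rejected by BH at level q."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    k_hat = 0 if below.size == 0 else int(below[-1]) + 1
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True
    return reject

# Standard normal upper-tail probability, built from the stdlib erfc
norm_sf = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)))

rng = np.random.default_rng(2)
m, m0, q, rho, n_sims = 50, 40, 0.1, 0.5, 2000
mu = np.zeros(m)
mu[m0:] = 3.0                                  # hypothetical effect size for false nulls

fdp = np.empty(n_sims)
for s in range(n_sims):
    common = rng.standard_normal()             # shared factor -> equicorrelation rho
    z = mu + math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.standard_normal(m)
    p = norm_sf(z)                             # one-sided p-values
    reject = bh_reject(p, q)
    fdp[s] = reject[:m0].sum() / max(reject.sum(), 1)

fdr_hat = fdp.mean()
print(f"estimated FDR under equicorrelation rho={rho}: {fdr_hat:.4f} (target q={q})")
```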

For arbitrary dependence (including negative correlations), the BH guarantee can break. The standard fix is to replace the threshold index with $\hat{k} = \max\{k : p_{(k)} \le qk / (m \cdot c_m)\}$ where $c_m = \sum_{j=1}^{m} 1/j \approx \ln m + \gamma$. This is the Benjamini-Yekutieli (BY) procedure. The harmonic correction makes it much more conservative: for $m = 200$, $c_m \approx \ln 200 + \gamma \approx 5.88$, so every cutoff shrinks by a factor of nearly 6. The resulting loss of power makes BY a last resort for when you genuinely cannot assume any dependence structure.
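As a sketch, the BY procedure is the same step-up rule with the cutoffs divided by $c_m$. The function names below and the example p-values are hypothetical and chosen only to show how the correction changes the outcome.

```python
import numpy as np

def harmonic_number(m):
    """c_m = sum_{j=1}^{m} 1/j."""
    return float(np.sum(1.0 / np.arange(1, m + 1)))

def benjamini_yekutieli(pvals, q):
    """Return (k_hat, reject_mask) for the BY procedure at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = harmonic_number(m)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / (m * c_m)   # BH cutoffs shrunk by c_m
    below = np.nonzero(p[order] <= thresholds)[0]
    k_hat = 0 if below.size == 0 else int(below[-1]) + 1
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True
    return k_hat, reject

c_200 = harmonic_number(200)
print(f"c_200 = {c_200:.3f}")

# Hypothetical p-values: plain BH at q = 0.05 would reject three of these,
# but the BY cutoffs (about 0.006, 0.012, 0.018, 0.024) reject none.
pvals = [0.9, 0.03, 0.02, 0.021]
k_hat, reject = benjamini_yekutieli(pvals, q=0.05)
print(k_hat, reject)
```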

Interpretation: In quant research, signals built from overlapping data, correlated factor exposures, or shared instruments typically exhibit positive dependence. The standard BH procedure is usually safe. But if you are testing long-short signals where some are mechanically negatively correlated (e.g., momentum vs. reversal on the same universe), the PRDS assumption may fail and the BY correction is warranted.

Answer: The BH procedure rejects all $H_{(1)}, \ldots, H_{(\hat{k})}$ where $\hat{k} = \max\{k : p_{(k)} \le qk/m\}$. Under independence (or PRDS), $\text{FDR} \le m_0 q / m \le q$. The proof decomposes $V/(R \vee 1)$ into per-hypothesis contributions, uses the uniformity of true null p-values and independence to show each contributes at most $q/m$ in expectation, then sums over the $m_0$ true nulls. Under arbitrary dependence, the BH threshold must be divided by $c_m = \sum_{j=1}^{m} 1/j$ to maintain control.

Intuition

The BH procedure is the single most important multiple testing tool in quantitative research. The core idea is elegant: instead of asking "did I make any mistake?" (which Bonferroni controls, at enormous cost to power), you ask "what fraction of my discoveries are mistakes?" This is the right question when you are building a portfolio of signals -- you can tolerate some false positives as long as the overall hit rate is acceptable. The proof works because each true null p-value is uniformly distributed, so it has a simple, predictable probability of sneaking past any given threshold. Independence lets you analyze each true null separately, and the $r$ in the numerator $qr/m$ cancels with the denominator of $1/R$, giving each true null a contribution of exactly $q/m$ regardless of how many total rejections there are.

The practical lesson for quant researchers is about dependence. Most signal-testing setups have positive correlations (signals share data, factors are correlated), and PRDS covers this case. But if you are doing something adversarial -- like testing a momentum signal alongside its negation -- the standard BH procedure can understate FDR. The BY correction with the harmonic penalty $\sum 1/j$ is the safe fallback, though it costs you substantially in power. In practice, most quant teams use BH with a qualitative assessment of the dependence structure rather than paying the BY penalty.
