Top-K Alphas Under Correlation Constraints

Optimization · Hard · Free problem

You have $M$ candidate alpha signals, each with a quality score $q_j > 0$. You also have a sparse conflict graph $G = (V, E)$ where an edge $(i, j)$ means $|\text{Corr}(\alpha_i, \alpha_j)| > \rho_{\max}$ -- i.e., the two alphas are too correlated to run simultaneously.

You want to select a subset $S$ of at most $k$ alphas that maximizes total score $\sum_{j \in S} q_j$, subject to the constraint that no two selected alphas share an edge in the conflict graph.

  1. Explain why the exact problem is NP-hard by relating it to a well-known graph problem.
  1. Propose a practical approximation algorithm suitable for large sparse graphs (say $M = 50{,}000$ alphas, $E = 200{,}000$ edges). Analyze its expected runtime in terms of $M$, $E$, and $k$.
  1. Give one small explicit graph example (with concrete scores) where the simple greedy-by-score algorithm fails to find the optimal solution.

Hints

  1. Think about what graph-theoretic problem you are really solving when you select non-conflicting vertices of maximum total weight.
  2. For the approximation algorithm, start with a sorted greedy pass, then consider local search moves -- swapping one selected alpha for one or two unselected ones.
  3. For the counterexample, consider a star graph where the center has the highest individual score but the leaves collectively dominate it.

Worked Solution

How to Think About It: This is a portfolio construction problem dressed up as graph theory. In practice, you have thousands of alpha signals and many of them are correlated -- you cannot run all of them because correlated alphas eat into each other's capacity and inflate risk. The conflict graph encodes the "too correlated" pairs. Selecting the best non-conflicting subset is exactly the maximum weight independent set (MWIS) problem, which is one of the hardest problems in combinatorial optimization. In an interview, you want to (a) name the reduction immediately, (b) show you know a practical heuristic that works on the kinds of sparse graphs you actually see in production, and (c) demonstrate with a small example that naive greedy fails.

---

Part (i): NP-Hardness

The problem is a special case of the Maximum Weight Independent Set (MWIS) problem. An independent set in a graph is a subset of vertices with no two adjacent. We want to find the independent set of size at most $k$ with maximum total weight (score).

More precisely, the reduction from MWIS to our problem is trivial -- it is the same problem. If someone gives you an arbitrary graph $G$ with vertex weights and asks for the heaviest independent set of size $\leq k$, that is exactly our alpha selection problem with the graph as the conflict graph and weights as scores.

Since MWIS is NP-hard (it is NP-hard even on general unweighted graphs -- the decision version "does $G$ have an independent set of size $\geq k$?" is one of Karp's 21 NP-complete problems), our problem is NP-hard too.

Key point for the interview: MWIS is not just NP-hard, it is also hard to approximate. Unless P = NP, there is no polynomial-time algorithm that guarantees finding an independent set within a constant factor of optimal on general graphs. This is why heuristics with local search are the standard approach in practice.

---

Part (ii): Practical Approximation Algorithm

For large sparse conflict graphs, a greedy algorithm with local search works well in practice:

Algorithm: Greedy + Local Improvement

  1. Initialize: Sort all $M$ alphas by score $q_j$ in decreasing order. This costs $O(M \log M)$.
  1. Greedy construction: Iterate through the sorted list. For each alpha $j$, add it to $S$ if (a) $|S| < k$ and (b) $j$ has no neighbor already in $S$. To check condition (b) efficiently, maintain a hash set of selected vertices and, for each candidate, scan its adjacency list. This phase runs in $O(M + E)$ since each edge is checked at most twice.

3. Local search refinement: Repeat the following until no improvement is found: - 1-swap: For each alpha $j \in S$, check if removing $j$ and adding some non-selected alpha $j'$ (that is not adjacent to any remaining member of $S$) increases the total score. If so, perform the swap. - 2-swap (optional): Try removing one alpha from $S$ and adding two non-conflicting non-selected alphas, or removing two and adding three, etc. In practice, 1-swaps and (1,2)-swaps capture most of the improvement.

Each swap iteration scans $O(k \cdot \bar{d})$ neighbors where $\bar{d} = 2E/M$ is the average degree. In a sparse graph with bounded degree $\Delta$, each iteration is $O(k \cdot \Delta + M)$. Typically the number of improving iterations $T$ is small (often $O(k)$ or less).

  1. Output $S$.

Runtime analysis: - Sorting: $O(M \log M)$ - Greedy phase: $O(M + E)$ - Local search: $O(T \cdot (k \cdot \Delta + M))$ where $T$ is the number of improving iterations - Total: $O(M \log M + E + T \cdot k \cdot \Delta)$ for sparse graphs

For the given parameters ($M = 50{,}000$, $E = 200{,}000$, average degree $\bar{d} = 8$), the greedy phase takes milliseconds and a few rounds of local search finish in well under a second.

Alternative approaches worth mentioning: - LP relaxation: Solve the LP relaxation of the integer program $\max \sum q_j x_j$ subject to $x_i + x_j \leq 1$ for all edges and $\sum x_j \leq k$. Round fractional solutions. This gives better guarantees on some graph classes but is slower. - Graph coloring heuristic: Color the conflict graph, then pick the best alpha from each color class. This is fast but wastes capacity if $k$ is much smaller than the chromatic number.

---

Part (iii): Counterexample for Greedy-by-Score

Consider this small graph with 4 vertices and scores:

  • $A$: score
    0$
  • $B$: score $7$
  • $C$: score $6$
  • $D$: score $6$

Edges (conflicts): $(A, B)$, $(A, C)$, $(A, D)$. So $A$ conflicts with everyone else, but $B$, $C$, $D$ have no conflicts among themselves. Set $k = 3$.

Greedy-by-score: Picks $A$ first (highest score $= 10$). Then $B$, $C$, $D$ are all blocked. Total score $= 10$.

Optimal: Skip $A$, pick $\{B, C, D\}$. Total score $= 7 + 6 + 6 = 19$.

The greedy solution is less than 53% of optimal. The star graph is the classic counterexample: a high-scoring hub blocks many lower-scoring but collectively more valuable leaves.

---

Answer: (i) The problem is exactly Maximum Weight Independent Set, which is NP-hard (and hard to approximate). (ii) Greedy construction by decreasing score followed by local search (1-swaps and (1,2)-swaps) runs in $O(M \log M + E + T \cdot k \cdot \Delta)$ and works well on sparse graphs encountered in practice. (iii) A star graph where the hub has score 10 and three leaves have scores 7, 6, 6 shows greedy picks the hub (score 10) while optimal picks all three leaves (score 19).

Intuition

This problem captures one of the most important practical challenges in systematic trading: you can generate thousands of alpha signals, but you cannot run them all because many are correlated and would produce overlapping positions. The correlation constraint graph turns portfolio construction into a combinatorial optimization problem -- specifically, maximum weight independent set. The NP-hardness is not just a theoretical curiosity; it means there is no silver bullet, and every production alpha selection system uses some form of heuristic.

The deeper lesson is about the gap between greedy and optimal. Greedy-by-score is the instinct most people have ("just pick the best alpha, then the next best non-conflicting one, ..."), and it works surprisingly well on many real graphs because alpha score distributions are heavy-tailed and conflict graphs are sparse. But it can fail badly when a high-scoring alpha is a "hub" that blocks many decent alternatives. In practice, teams combine greedy construction with local search, LP relaxation bounds, or even sampling-based methods to close the gap. Understanding when and why greedy fails -- and having a concrete counterexample ready -- is exactly what interviewers are testing.

Open the full interactive solver →