Most candidates over-prepare probability and under-prepare systems. If you are interviewing for a quant developer or core-infrastructure role at firms like Hudson River Trading, Jump Trading, Citadel Securities, or XTX Markets, the C++ round is not LeetCode with fancier syntax. It tests whether you understand what the hardware does with your code — and whether you can reason quantitatively about nanoseconds.
Why HFT firms ask this
An HFT firm's edge often lives inside a microsecond. Interviewers use low-latency questions to separate people who have read a blog post about "cache-friendly code" from people who can estimate the cost of a design decision before writing it. Three question families dominate:
- Hardware mechanics: cache hierarchy, TLBs, branch prediction, NUMA, false sharing.
- Concurrency: the C++ memory model,
std::atomic, acquire/release semantics, lock-free single-producer/single-consumer (SPSC) queues. - Language mechanics under constraints: avoiding allocation on the hot path, virtual-call cost, templates vs. runtime polymorphism, placement
new, cache-line alignment.
A fourth theme — order book design — blends systems with market microstructure: "Design a limit order book supporting add/cancel/execute in $O(1)$" is close to a rite of passage.
The numbers you must know cold
Interviewers expect Jeff Dean–style latency numbers without hesitation. Approximate, order-of-magnitude values are fine; not knowing them at all is disqualifying.
| Operation | Approx. latency | In 4 GHz cycles |
|---|---|---|
| L1 cache hit | ~1 ns | ~4 |
| L2 cache hit | ~4 ns | ~16 |
| L3 cache hit | ~12–30 ns | ~50–120 |
| Main memory (DRAM) | ~80–100 ns | ~320–400 |
| Branch misprediction | ~15–20 cycles | ~4–5 ns |
| Mutex lock/unlock (uncontended) | ~20–40 ns | ~100+ |
| Kernel-bypass NIC, wire to user space | ~1–2 μs | — |
| Standard kernel network stack | tens of μs | — |
Worked example: a tick-to-trade latency budget
Interviewer: "Your strategy must react to a market-data tick and send an order within 1.5 μs, measured wire-to-wire. Kernel-bypass networking costs 1 μs round-trip in total. How many main-memory accesses can your hot path afford?"
Set up the budget. The software budget is what remains after networking:
$$T_{\text{software}} = 1.5\,\mu s - 1.0\,\mu s = 500\ \text{ns}.$$
At 4 GHz, one cycle is $0.25$ ns, so the budget is $500 / 0.25 = 2{,}000$ cycles. Now decompose the hot path: suppose decode + book update + signal + order encode is roughly 800 cycles of pure compute when everything hits L1. That leaves $2{,}000 - 800 = 1{,}200$ cycles for memory stalls. A dependent DRAM miss costs about 400 cycles, so you can afford at most
$$k = \left\lfloor \frac{1{,}200}{400} \right\rfloor = 3$$
serialized cache misses. Three. That single number explains most low-latency design: flat arrays instead of pointer-chasing maps, order books stored as contiguous price-level arrays, and everything hot pinned into a working set that fits in L1/L2.
Strong candidates then add the branch term. If the hot path has $n = 200$ branches and the predictor is $99\%$ accurate with a 16-cycle penalty, the expected cost is
$$E[\text{penalty}] = n(1-p)\,c = 200 \times 0.01 \times 16 = 32\ \text{cycles} \approx 8\ \text{ns},$$
which is noise on average — but the interviewer wants you to note that tail latency is what kills you, and a cold branch on the rare "actually send an order" path is exactly where mispredictions cluster. Hence tricks like keeping the send path warm with dummy work.
The lock-free SPSC queue question
The most common concurrency question: implement a fixed-size ring buffer where one thread produces (network reader) and one consumes (strategy). Things interviewers probe:
- Capacity a power of two, so index wrap is
idx & (N-1)instead of an expensive modulo. - Memory ordering: producer publishes with a release store to the write index; consumer reads it with an acquire load. Say why: the release/acquire pair guarantees the payload write happens-before the consumer's read. Defaulting everything to
seq_cstis correct but signals you don't know what it costs. - False sharing: the head and tail indices must live on separate cache lines (
alignas(64)), otherwise the two cores ping-pong the same line and throughput collapses by an order of magnitude. - Follow-up: "Why is SPSC so much easier than MPMC?" Because with one writer per variable you never need a compare-and-swap loop — no contention, no ABA problem.
Traps that fail candidates
- Reciting "avoid virtual functions" without numbers. A predicted virtual call is a few cycles; the real costs are the missed inlining and the possible I-cache miss. Nuance beats slogans.
- Claiming
volatilegives thread safety. Instant red flag in C++; it is for memory-mapped I/O, not synchronization. - Optimizing the mean, ignoring the tail. Firms care about p99/p99.9. Mention allocator jitter, page faults, and why hot paths pre-allocate and pre-fault everything.
- Not measuring. When asked "how would you verify this is faster," the answer involves cycle counters, perf counters for cache misses and mispredictions, and realistic replayed market data — not "big-O analysis."
Firms differ in emphasis: Optiver and Headlands lean heavily on C++ mechanics, while pure-research tracks may skip this round entirely — check where your target sits on the interview difficulty tier list and calibrate.
Practice next
Drill the adjacent material before the loop: our coding interview question bank covers the algorithmic layer these rounds start from, the full problem bank has 2,800+ worked problems across systems-adjacent probability and brainteasers, and the trading games let you feel why microseconds and adverse selection matter in the first place.
More topic guides
- The Airplane Seat Problem: Why the Answer Is 1/2 (Three Proofs)
- Bayes' Theorem in Quant Interviews
- Behavioral Interview Questions at Trading Firms (With Answer Frameworks)
- Coin Flip Questions in Quant Interviews
- Dice Questions in Quant Interviews
- Fermi Estimation Interview Questions at Trading Firms
- Gambler's Ruin in Quant Interviews
- The Kelly Criterion in Quant Interviews
- Linear Regression Interview Questions: OLS Assumptions, R² Traps & Regression to the Mean
- Machine Learning Quant Interview Questions: Overfitting, Cross-Validation & Feature Leakage
- The Market Making Game Interview: How to Answer 'Make Me a Market'
- Market Microstructure Interview Questions: Order Books, Spreads & Adverse Selection
- Markov Chains in Quant Interviews
- Martingales in Quant Interviews
- Mental Math for Trading Interviews: Training Plan, Zetamac Benchmarks & Firm Tests
- The Monty Hall Problem — and the Variants Interviews Actually Ask
- Number Sequence Tests in Trading Interviews: The 8 Pattern Types & How to Practice
- Optimal Stopping in Quant Interviews
- Options Pricing Interview Questions: Black-Scholes, Greeks & Put-Call Parity
- Quant Interview Cheat Sheet: Probability, Markov Chains, Options & Linear Algebra (Free PDF)
- Random Walks in Quant Interviews
- Statistics Interview Questions for Quant Roles: Hypothesis Testing, MLE & p-Value Traps
- Stochastic Calculus Interview Questions: Ito's Lemma, SDEs & Brownian Motion
- Time Series Interview Questions: Stationarity, ARMA & Autocorrelation Traps
- Top 50 Quant Interview Questions (With Full Solutions)
- All guides & explainers
Frequently asked questions
What C++ topics do HFT firms actually test in low-latency interviews?
The core areas are the hardware side (cache hierarchy, branch prediction, false sharing, NUMA), the C++ memory model and atomics (acquire/release semantics, lock-free SPSC queues), and hot-path language mechanics like avoiding allocation, cache-line alignment, and the true cost of virtual calls. Order book design questions combine all three and are near-universal for quant developer roles.
What is the lock-free queue interview question?
You are asked to implement a fixed-size single-producer/single-consumer ring buffer without locks, typically for passing market data between a network thread and a strategy thread. Interviewers check that you use a power-of-two capacity with bitmask wrapping, correct release/acquire memory ordering on the indices, and cache-line padding to prevent false sharing between head and tail.
Do quant developers need to memorize latency numbers?
Yes, at least to order of magnitude: roughly 1 ns for an L1 hit, ~100 ns for a DRAM access, ~15-20 cycles for a branch misprediction, and low microseconds for kernel-bypass networking. Interviewers use these numbers in back-of-envelope latency budget questions, and hesitating on them suggests you have never profiled real code.
Is the HFT C++ interview harder than LeetCode?
It is different rather than strictly harder: algorithmic questions are usually LeetCode-medium, but the differentiating rounds test systems knowledge that LeetCode never touches, like memory ordering, cache behavior, and tail-latency reasoning. Candidates who only grind algorithm problems routinely fail on questions such as explaining why false sharing collapses throughput.
Practice the real thing
QuantVault has 2,800+ quant interview problems with full solutions, intuition, and hints, firm-by-firm interview funnels, and an auto-graded coding judge. Start free.