Signal Types in Statistical Arbitrage


You are building a statistical arbitrage strategy at a quantitative trading firm. Walk through the main categories of signals you would consider, how they are constructed, and how you would combine them into a composite alpha.

Specifically:

What are the major signal families (e.g., mean reversion, momentum, fundamental, microstructure)?
For each family, give a concrete example of how the signal is computed.
How do you combine multiple signals, and what pitfalls should you watch out for?

Hints

Think about the different economic mechanisms behind why asset prices might be predictable -- mean reversion, momentum, information asymmetry, and fundamental mispricing are distinct sources of alpha.
For each signal family, be concrete: specify how you would compute the signal numerically (e.g., z-score of a spread for mean reversion, 12-1 month returns for momentum).
When discussing signal combination, address the practical issues: how do you avoid overfitting? How do transaction costs affect which signals are tradeable? What is walk-forward validation?

Worked Solution

How to Think About It: Stat arb is about finding small, repeatable edges across many instruments and combining them systematically. No single signal is a money machine -- the edge comes from diversification across signals, instruments, and time. When an interviewer asks this, they want to see that you understand the taxonomy of signals, can give concrete construction details, and know the practical issues (decay, crowding, overfitting) that separate a textbook answer from a practitioner's answer.

Key Insight: Signals fall into a few natural families based on the economic mechanism they exploit. The best stat arb desks combine signals from multiple families because they tend to be weakly correlated. Mean reversion and momentum, for example, bet on opposite effects (over-reaction versus under-reaction), usually at different horizons.

The Method:

1. Mean Reversion Signals

These exploit temporary dislocations that revert to fair value.

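*Pairs/basket trading:* Compute the spread $S_t = P_t^A - \beta P_t^B$ between cointegrated instruments, where $\beta$ is the hedge ratio. Standardize to a z-score: $z_t = (S_t - \bar{S}) / \sigma_S$, with $\bar{S}$ and $\sigma_S$ the mean and standard deviation of the spread over a trailing window. Enter when $|z_t| > 2$ (short the spread if $z_t > 2$, long it if $z_t < -2$) and exit when $z_t \approx 0$.
*Statistical factor residuals:* Run PCA on returns and compute each stock's residual return after removing the first $k$ factors. The signal is a bet that the accumulated residual reverts toward zero.
*Ornstein-Uhlenbeck model:* Fit $dS = \theta(\mu - S)\,dt + \sigma\,dW$ and estimate the half-life $\tau = \ln 2 / \theta$, the expected time for a deviation from $\mu$ to shrink by half. A shorter half-life means faster mean reversion and, all else equal, a more tradeable signal.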
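The spread z-score and the OU half-life can be sketched in a few lines. This is a minimal illustration using only the standard library, not a production implementation: the function names `spread_zscores` and `ou_half_life` are hypothetical, the hedge ratio `beta` is assumed to be given, and the half-life comes from an AR(1) regression of $S_{t+1}$ on $S_t$, which is one common way to fit the OU model on evenly spaced data.

```python
import math
import statistics


def spread_zscores(price_a, price_b, beta):
    """z-scores of the spread S_t = P_t^A - beta * P_t^B.

    Uses the full-sample mean and standard deviation for brevity; a live
    strategy would use a trailing window to avoid look-ahead.
    """
    spread = [a - beta * b for a, b in zip(price_a, price_b)]
    mean = statistics.fmean(spread)
    std = statistics.pstdev(spread)
    return [(s - mean) / std for s in spread]


def ou_half_life(spread, dt=1.0):
    """Half-life ln(2) / theta from an AR(1) fit of S_{t+1} on S_t.

    For an exactly discretized OU process the regression slope equals
    exp(-theta * dt), so theta = -ln(slope) / dt.
    """
    x, y = spread[:-1], spread[1:]
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    if not 0.0 < slope < 1.0:
        raise ValueError("slope outside (0, 1): no usable mean reversion")
    theta = -math.log(slope) / dt
    return math.log(2) / theta


# Toy spread that halves every step, so the half-life is about one period.
print(ou_half_life([8.0, 4.0, 2.0, 1.0, 0.5]))
```

The half-life is returned in units of `dt`, so daily data with `dt=1.0` gives a half-life in days.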

2. Momentum / Trend Signals

These exploit the tendency of winners to keep winning (and losers to keep losing) over medium horizons.

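*Cross-sectional momentum:* Rank stocks by their return over the past 12 months, skipping the most recent month (the 12-1 convention; the skip avoids contamination from short-term reversal). Go long the top decile, short the bottom decile.
*Time-series momentum:* For each instrument, compare a short-term moving average to a long-term one. Signal = sign of $\text{MA}_{20} - \text{MA}_{200}$, where $\text{MA}_n$ is the $n$-day simple moving average of price.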
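Both momentum constructions are simple enough to sketch directly. The helper names below (`momentum_12_1`, `long_short_deciles`, `ma_crossover_signal`) are hypothetical, and the code assumes month-end prices for the cross-sectional signal and daily prices for the moving-average signal, oldest first. Exact conventions for the 12-1 window vary slightly between shops.

```python
import statistics


def momentum_12_1(monthly_prices):
    """Return from 12 months ago to 1 month ago, skipping the latest month.

    monthly_prices: month-end prices, oldest first, at least 13 entries.
    """
    if len(monthly_prices) < 13:
        raise ValueError("need at least 13 month-end prices")
    return monthly_prices[-2] / monthly_prices[-13] - 1.0


def long_short_deciles(scores):
    """Map {ticker: score} to +1 (top decile), -1 (bottom decile), else 0."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    n = max(1, len(ranked) // 10)
    positions = dict.fromkeys(ranked, 0)
    for ticker in ranked[:n]:
        positions[ticker] = -1
    for ticker in ranked[-n:]:
        positions[ticker] = 1
    return positions


def ma_crossover_signal(prices, short=20, long=200):
    """Sign of MA_short - MA_long on the latest observation: +1, 0, or -1."""
    if len(prices) < long:
        raise ValueError("not enough history for the long moving average")
    diff = statistics.fmean(prices[-short:]) - statistics.fmean(prices[-long:])
    return (diff > 0) - (diff < 0)
```

In use, `momentum_12_1` would be evaluated per stock and the resulting dictionary of scores passed to `long_short_deciles` at each rebalance date.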

3. Fundamental / Value Signals

These exploit mispricings relative to accounting or economic fundamentals.

*Earnings yield:* Rank stocks by $E/P$ ratio. High earnings yield stocks tend to outperform.
*Earnings surprise:* Compute the Standardized Unexpected Earnings (SUE) score. Stocks with large positive surprises tend to drift upward for weeks (post-earnings announcement drift).
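As a rough sketch of the two fundamental signals: the functions below are hypothetical helpers, and the SUE definition used here (surprise divided by the sample standard deviation of past surprises) is one common variant. The expected EPS could come from analyst consensus or a seasonal random-walk forecast, which is an assumption left to the caller.

```python
import statistics


def earnings_yield_ranks(earnings_yield):
    """Rank tickers by E/P; rank 1 is the highest earnings yield."""
    ordered = sorted(earnings_yield, key=earnings_yield.get, reverse=True)
    return {ticker: i + 1 for i, ticker in enumerate(ordered)}


def sue(actual_eps, expected_eps, past_surprises):
    """Standardized unexpected earnings: surprise / std of past surprises."""
    return (actual_eps - expected_eps) / statistics.stdev(past_surprises)


# A 0.10 beat measured against a history of +/- 0.05 surprises.
print(sue(1.10, 1.00, [0.05, -0.05, 0.05, -0.05]))
```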

4. Microstructure Signals

These exploit information embedded in order flow and market dynamics at short horizons.

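*Order flow imbalance (OFI):* Compute $\text{OFI}_t = \Delta B_t^{\text{best}} - \Delta A_t^{\text{best}}$, where $\Delta B_t^{\text{best}}$ and $\Delta A_t^{\text{best}}$ are the changes in resting size at the best bid and best ask. Positive OFI (bid size building relative to ask size) tends to precede short-term price increases.
*Kyle's lambda:* Estimate price impact $\lambda$ from $\Delta P = \lambda \cdot Q + \epsilon$, where $Q$ is signed order flow (buy volume minus sell volume) over the interval. A high $\lambda$ indicates low liquidity, meaning a large price move per unit traded, and typically coincides with wider spreads.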
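A minimal sketch of both microstructure quantities, under simplifying assumptions: `order_flow_imbalance` implements only the size-change form given above and ignores moves in the best bid and ask prices, and `kyle_lambda` fits the no-intercept model $\Delta P = \lambda Q + \epsilon$ by least squares through the origin. Both function names are hypothetical.

```python
def order_flow_imbalance(bid_sizes, ask_sizes):
    """OFI_t = change in best-bid size minus change in best-ask size.

    Inputs are snapshots of resting size at the best quotes, oldest first.
    Price-level changes at the best quotes are ignored in this sketch.
    """
    return [
        (b1 - b0) - (a1 - a0)
        for b0, b1, a0, a1 in zip(
            bid_sizes, bid_sizes[1:], ask_sizes, ask_sizes[1:]
        )
    ]


def kyle_lambda(price_changes, signed_volumes):
    """Least-squares slope of price change on signed volume, no intercept."""
    num = sum(q * dp for q, dp in zip(signed_volumes, price_changes))
    den = sum(q * q for q in signed_volumes)
    if den == 0:
        raise ValueError("signed volume is identically zero")
    return num / den


print(order_flow_imbalance([100, 150, 120], [100, 90, 130]))
```

Some practitioners add an intercept or regress on the square root of volume; those are different specifications from the linear one shown here.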

5. Alternative Data Signals

*Sentiment:* NLP on news headlines or social media. Aggregate sentiment scores per stock.
*Options-implied signals:* Put-call ratio, implied volatility skew. A rising put-call ratio is usually read as bearish sentiment, although extreme readings are sometimes traded as a contrarian indicator.

Signal Combination:

Combine signals into a composite alpha using a weighted sum: $\alpha_i = \sum_j w_j \cdot z_{ij}$, where $z_{ij}$ is the standardized signal $j$ for instrument $i$. Common approaches for choosing weights:

Equal weighting (robust baseline)
Inverse-variance weighting
Regression-based (Fama-MacBeth cross-sectional regression)
Machine learning (e.g., boosted trees, but beware overfitting)
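The weighted-sum combination can be sketched as follows. The names `cross_sectional_zscore`, `composite_alpha`, and `inverse_variance_weights` are hypothetical, signals are assumed to arrive as `{signal_name: {ticker: raw_value}}` with the same tickers in every signal, and the variances fed to the inverse-variance helper are assumed to be estimated elsewhere (for example from each signal's historical portfolio returns).

```python
import statistics


def cross_sectional_zscore(raw):
    """Standardize {ticker: value} across instruments at one point in time."""
    values = list(raw.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return {ticker: (v - mean) / std for ticker, v in raw.items()}


def composite_alpha(signals, weights):
    """alpha_i = sum_j w_j * z_ij for signals given as {name: {ticker: raw}}."""
    alpha = {}
    for name, raw in signals.items():
        for ticker, z in cross_sectional_zscore(raw).items():
            alpha[ticker] = alpha.get(ticker, 0.0) + weights[name] * z
    return alpha


def inverse_variance_weights(variances):
    """Weights proportional to 1 / variance, normalized to sum to one."""
    inv = {name: 1.0 / v for name, v in variances.items()}
    total = sum(inv.values())
    return {name: x / total for name, x in inv.items()}


signals = {
    "momentum": {"A": 1.0, "B": 2.0, "C": 3.0},
    "value": {"A": 3.0, "B": 2.0, "C": 1.0},
}
print(composite_alpha(signals, {"momentum": 0.5, "value": 0.5}))
```

In this toy example the two signals rank the stocks in exactly opposite order, so equal weighting cancels them out. That is the extreme case of why signal correlation matters when choosing weights.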

Practical Considerations:

*Signal decay:* Most signals lose predictive power over time as they get crowded. Monitor the information coefficient (IC), the correlation between signal values and subsequent returns, over rolling windows.
*Turnover and transaction costs:* A high-turnover signal can have positive gross alpha but negative net alpha after costs.
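*Overfitting:* Walk-forward validation is essential: fit the signal on a trailing window, evaluate it on the next unseen period, then roll the window forward and repeat. Never optimize signals on the same data you use to evaluate them.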
*Correlation between signals:* Combine signals that are weakly correlated for diversification. Two highly correlated momentum signals do not give you twice the edge.
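Three of these checks are easy to sketch: a rank-based IC, a net-of-cost alpha calculation, and a walk-forward split generator. All names here are hypothetical, the IC uses Spearman rank correlation without tie handling, and the cost model is a deliberately crude linear one (cost proportional to turnover).

```python
import math


def _ranks(values):
    """Ordinal ranks 0..n-1 (ties are not averaged in this sketch)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = float(rank)
    return ranks


def information_coefficient(signal, forward_returns):
    """Spearman rank correlation between a signal and subsequent returns."""
    x, y = _ranks(signal), _ranks(forward_returns)
    mean = (len(x) - 1) / 2.0  # both rank vectors share this mean
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y))
    var = sum((a - mean) ** 2 for a in x)
    return cov / math.sqrt(var * var)


def net_alpha(gross_alpha, turnover, cost_per_unit_turnover):
    """Gross alpha minus a linear transaction-cost estimate."""
    return gross_alpha - turnover * cost_per_unit_turnover


def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges; each test window follows its train."""
    start = 0
    while start + train_size + test_size <= n_obs:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += test_size


for train, test in walk_forward_splits(10, 4, 2):
    print(list(train), list(test))
```

Computing `information_coefficient` once per rebalance date and averaging over a rolling window gives the decay monitor described above. The split generator uses a fixed-length rolling training window; an expanding window is an equally common choice.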

Answer: Stat arb strategies draw signals from five main families: mean reversion (pairs, factor residuals), momentum (cross-sectional and time-series), fundamental (value, earnings surprise), microstructure (order flow, price impact), and alternative data (sentiment, options-implied). Signals are standardized, combined via weighted averaging or regression, and the key practical challenges are decay, transaction costs, and overfitting.

Intuition

The core insight of stat arb is that no single signal gives you a reliable edge -- the edge comes from combining many weak, diversified signals across many instruments. Each signal family exploits a different market inefficiency: mean reversion captures temporary dislocations, momentum captures slow information diffusion, microstructure signals capture short-term supply/demand imbalances. These mechanisms are largely independent, so combining them provides diversification in alpha space, not just in position space.

In practice, the hardest part is not finding signals -- it is making sure they survive transaction costs and out-of-sample testing. A signal with an in-sample Sharpe of 3 that decays to 0 out-of-sample is worse than useless (it cost you research time). Walk-forward testing, realistic cost models, and decay monitoring are what separate a working stat arb desk from a backtesting exercise.
