Defining an Alpha Signal

Finance · Medium

You observe daily close-to-close returns $r_i(t)$ for assets $i \in \{1, \ldots, m\}$ at times $t \in \mathbb{Z}$. Define what you mean by an "alpha" signal $s_i(t)$ intended to predict future returns.

Your definition must be precise and complete. Specifically, address all three of the following:

Prediction horizon: Over what future window does $s_i(t)$ predict? How do you aggregate returns over that window?
Portfolio construction rule: How do you translate signal values $s_i(t)$ into portfolio weights $w_i(t)$? What constraints do you impose?
Excess/benchmark return: What does "predicting returns" mean -- raw returns, market-adjusted, factor-adjusted? How do you measure whether $s_i(t)$ is actually working?

Hints

An alpha signal definition is incomplete unless it specifies what return it is predicting -- raw or benchmark-adjusted, over what horizon. Start there.
The standard formalism is a cross-sectional regression of $h$-day excess returns on the signal at each time $t$: $\tilde{R}_i(t,h) = \alpha + \gamma s_i(t) + \varepsilon_i(t)$. Alpha exists if $\hat{\gamma}$ is consistently positive.
For the portfolio rule, normalize the signal cross-sectionally to zero mean and unit variance, then set weights proportional to the normalized signal -- this ensures dollar neutrality and makes the IC a natural performance measure.

Worked Solution

How to Think About It: The word "alpha" gets used loosely in practice -- a discretionary PM means something different by it than a stat arb quant does. The question is asking you to nail down the formal definition so that you could actually test whether a signal has alpha. That requires pinning down three things before you write a single regression: what you are predicting (the target return, including horizon and benchmark), how the signal maps to positions, and how you evaluate predictive power. If you leave any of these vague, your backtest is not well-defined. This is a design question, not a computation -- the right answer is a clean, testable definition.

Key Insight: Alpha is not a property of a signal alone -- it is a property of a (signal, target, portfolio rule) triple. Two traders can use the same raw signal and reach opposite conclusions about whether it has alpha, simply because they defined the prediction target differently.

The Method:

1. Fix the prediction horizon $h$. Define the $h$-day forward return for asset $i$ at time $t$: $R_i(t, h) = \sum_{\tau=1}^{h} r_i(t + \tau)$. If the $r_i$ are log returns, this sum equals $\log(P_i(t+h)/P_i(t))$ exactly; if they are simple returns, compound them instead: $R_i(t,h) = \prod_{\tau=1}^{h} (1 + r_i(t+\tau)) - 1$. For daily signals, $h = 1$ is common in short-horizon stat arb; $h = 5$ or $h = 21$ are typical for medium-frequency strategies. The choice of $h$ defines the signal's "shelf life" and largely determines turnover.
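
A minimal NumPy sketch of step 1, assuming the daily returns are stored in a `(T, m)` array `r` with `r[t, i]` holding $r_i(t)$. The function name `forward_returns` is hypothetical; the last `h` rows are left as NaN because their forward window runs past the end of the sample.

```python
import numpy as np

def forward_returns(r: np.ndarray, h: int, log: bool = True) -> np.ndarray:
    """h-day forward return R_i(t, h) from a (T, m) array of daily returns r[t, i].

    Row t aggregates days t+1, ..., t+h. The last h rows are NaN because their
    forward window runs past the end of the sample.
    """
    T, m = r.shape
    out = np.full((T, m), np.nan)
    for t in range(T - h):
        window = r[t + 1 : t + 1 + h]                       # days t+1 .. t+h
        if log:
            out[t] = window.sum(axis=0)                     # log returns add
        else:
            out[t] = np.prod(1.0 + window, axis=0) - 1.0    # simple returns compound
    return out
```

As a consistency check with prices `P`: for log returns, row `t` of `forward_returns(r, h)` should match `np.log(P[t + h] / P[t])`, and for simple returns (`log=False`) it should match `P[t + h] / P[t] - 1`.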

2. Define the benchmark/excess return. Raw returns contain systematic risk (market beta, sector tilts, factor exposures) that has nothing to do with your signal's predictive content. Strip these out by defining the excess return: $\tilde{R}_i(t, h) = R_i(t, h) - \beta_i^\top f(t, h)$, where $f(t,h)$ is a vector of factor returns (e.g., Fama-French, Barra) over the same window and $\beta_i$ is asset $i$'s vector of factor loadings. In the simplest case, $\tilde{R}_i$ is just the CAPM residual: $\tilde{R}_i(t,h) = R_i(t,h) - \beta_i R_M(t,h)$. The signal $s_i(t)$ is said to have alpha if it predicts $\tilde{R}_i(t,h)$, not $R_i(t,h)$ itself.
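
A sketch of step 2 under simplifying assumptions: factor returns are given as arrays aligned with the asset returns, and loadings are estimated by ordinary least squares on a trailing window ending at $t$, so $\beta_i$ uses no future data. The function names and the trailing-window estimator are illustrative choices, not a prescribed method; commercial risk models such as Barra's typically derive loadings from asset characteristics rather than time-series regressions.

```python
import numpy as np

def excess_returns(R: np.ndarray, F: np.ndarray, B: np.ndarray) -> np.ndarray:
    """R~_i(t,h) = R_i(t,h) - beta_i^T f(t,h).

    R: (T, m) forward asset returns, F: (T, k) factor returns over the same
    windows, B: (m, k) loadings with one row beta_i per asset.
    """
    return R - F @ B.T

def trailing_betas(r: np.ndarray, f: np.ndarray, t: int, window: int) -> np.ndarray:
    """OLS loadings of each asset on the factors using only days t-window+1 .. t.

    r: (T, m) daily asset returns, f: (T, k) daily factor returns.
    Returns an (m, k) array; an intercept is fitted and then discarded.
    """
    rw = r[t - window + 1 : t + 1]
    fw = f[t - window + 1 : t + 1]
    X = np.column_stack([np.ones(window), fw])        # (window, k + 1) design matrix
    coef, *_ = np.linalg.lstsq(X, rw, rcond=None)      # (k + 1, m) coefficients
    return coef[1:].T                                  # drop intercept row -> (m, k)
```

In a backtest, `trailing_betas` would be re-run at each date $t$ and the resulting loadings applied to the forward window starting at $t$.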

3. State the signal's predictive claim formally. The signal $s_i(t)$, which must be constructed from information available at time $t$ only (no look-ahead), has alpha if $\text{Cov}(s_i(t),\, \tilde{R}_i(t, h)) > 0$. More precisely, run a cross-sectional regression at each $t$: $\tilde{R}_i(t, h) = \alpha + \gamma \cdot s_i(t) + \varepsilon_i(t)$, re-estimating $\alpha$ and $\gamma$ on every date (the intercept $\alpha$ here is a regression constant, not the alpha being tested). Alpha exists if the resulting $\hat{\gamma}$ is statistically and economically significant, consistently across time.
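
The regression in step 3 can be run date by date and the slopes summarized in the manner of Fama-MacBeth: average $\hat{\gamma}(t)$ over time and divide by its standard error. This sketch assumes `(T, m)` arrays of signals and excess returns with NaN marking missing data; the function names are hypothetical, and the naive $t$-statistic ignores the serial correlation that overlapping windows create when $h > 1$.

```python
import numpy as np

def cross_sectional_gammas(S: np.ndarray, Rx: np.ndarray) -> np.ndarray:
    """OLS slope gamma_hat(t) of Rx[t, :] on S[t, :] (with intercept), one per date.

    S, Rx: (T, m) arrays of signals s_i(t) and excess returns R~_i(t,h).
    Dates with fewer than 3 usable assets or a constant signal are left as NaN.
    """
    T = S.shape[0]
    gammas = np.full(T, np.nan)
    for t in range(T):
        ok = np.isfinite(S[t]) & np.isfinite(Rx[t])
        if ok.sum() < 3:
            continue
        s_c = S[t, ok] - S[t, ok].mean()     # demeaning absorbs the intercept
        y_c = Rx[t, ok] - Rx[t, ok].mean()
        denom = s_c @ s_c
        if denom == 0:
            continue
        gammas[t] = (s_c @ y_c) / denom
    return gammas

def fama_macbeth_summary(gammas: np.ndarray) -> tuple[float, float]:
    """Time-series mean of gamma_hat(t) and its naive t-statistic."""
    g = gammas[np.isfinite(gammas)]
    return g.mean(), g.mean() / (g.std(ddof=1) / np.sqrt(len(g)))
```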

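4. Define the portfolio construction rule. Signal values must be mapped to portfolio weights $w_i(t)$. The canonical choices are:
- Cross-sectional z-score: Normalize the signal to zero mean and unit variance in the cross-section at each $t$, $z_i(t) = (s_i(t) - \bar{s}(t)) / \sigma_s(t)$, where $\bar{s}(t)$ and $\sigma_s(t)$ are the cross-sectional mean and standard deviation. Then set $w_i(t) = z_i(t) / \sum_j |z_j(t)|$. Because the $z_i(t)$ sum to zero, this is dollar-neutral by construction, with gross exposure $\sum_i |w_i(t)| = 1$. Replacing $s_i(t)$ by its cross-sectional rank before normalizing gives the rank-based variant, which is more robust to outliers.
- Long-short quintile: Sort assets by $s_i(t)$, go long the top quintile equally weighted, short the bottom quintile equally weighted.
- Optimization-based: Maximize $\sum_i w_i s_i(t) - \lambda w^\top \Sigma w$ subject to $\sum_i w_i = 0$, where $\Sigma$ is the return covariance matrix and $\lambda$ controls risk aversion.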

The portfolio must be dollar-neutral (or beta-neutral) to isolate the alpha from directional market exposure. Impose $\sum_i w_i(t) = 0$ and, optionally, $\sum_i w_i(t) \beta_i = 0$.
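
A single-date sketch of the three construction rules from step 4, plus the optional beta-neutral constraint. The mean-variance version uses the closed-form Lagrangian solution: setting the gradient of $s^\top w - \lambda w^\top \Sigma w - \mu \mathbf{1}^\top w$ to zero gives $w = \Sigma^{-1}(s - \mu \mathbf{1})/(2\lambda)$, and $\mu = \mathbf{1}^\top \Sigma^{-1} s \,/\, \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$ enforces $\sum_i w_i = 0$. All function names are hypothetical, the 0.5-per-side scaling of the quintile book is an arbitrary choice, and $\Sigma$ is assumed positive definite.

```python
import numpy as np

def zscore_weights(s: np.ndarray) -> np.ndarray:
    """Cross-sectional z-score weights: w_i = z_i / sum_j |z_j| (dollar-neutral, gross 1)."""
    z = (s - s.mean()) / s.std()
    return z / np.abs(z).sum()

def quintile_weights(s: np.ndarray) -> np.ndarray:
    """Equal-weight long the top quintile, short the bottom quintile, 0.5 gross per side."""
    n = len(s)
    q = n // 5
    if q == 0:
        raise ValueError("need at least 5 assets to form quintiles")
    order = np.argsort(s)                  # ascending by signal
    w = np.zeros(n)
    w[order[-q:]] = 0.5 / q                # top quintile: long
    w[order[:q]] = -0.5 / q                # bottom quintile: short
    return w

def mean_variance_weights(s: np.ndarray, Sigma: np.ndarray, lam: float) -> np.ndarray:
    """Maximize s'w - lam * w'Sigma w subject to sum(w) = 0, in closed form."""
    ones = np.ones(len(s))
    Si_s = np.linalg.solve(Sigma, s)       # Sigma^{-1} s
    Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1
    mu = (ones @ Si_s) / (ones @ Si_1)     # Lagrange multiplier enforcing sum(w) = 0
    return (Si_s - mu * Si_1) / (2.0 * lam)

def neutralize(w: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Smallest (least-squares) change to w that makes sum(w) = 0 and beta'w = 0."""
    X = np.column_stack([np.ones(len(w)), beta])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return w - X @ coef
```

`neutralize` subtracts the least-squares projection of $w$ onto the constant and $\beta$ directions, so the result satisfies both constraints but no longer has gross exposure exactly 1; rescale it if that matters.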

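5. Define the realized alpha of the strategy. The portfolio's excess return at time $t$ is $\Pi(t) = \sum_i w_i(t) \cdot \tilde{R}_i(t, h)$. The signal has alpha if $\mathbb{E}[\Pi(t)] > 0$ with a favorable information ratio, $\text{IR} = \mathbb{E}[\Pi] / \text{Std}[\Pi]$. In practice, estimate both from the time series $\{\Pi(t)\}$ and test for significance using a $t$-test or block bootstrap. For $h > 1$, consecutive values of $\Pi(t)$ share overlapping return windows and are autocorrelated, so a naive $t$-test overstates significance; use Newey-West standard errors or the block bootstrap instead.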
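
A sketch of step 5, assuming weight and excess-return arrays aligned as `(T, m)`. The information ratio is annualized by $\sqrt{252/h}$, which treats $h$-day periods as if they did not overlap; the Newey-West estimator uses Bartlett weights with `lags` chosen by the caller, typically at least $h - 1$ for overlapping windows. Function names are hypothetical.

```python
import numpy as np

def portfolio_returns(W: np.ndarray, Rx: np.ndarray) -> np.ndarray:
    """Pi(t) = sum_i w_i(t) * R~_i(t,h); missing (NaN) returns contribute zero."""
    return np.nansum(W * Rx, axis=1)

def information_ratio(pi: np.ndarray, h: int, periods_per_year: int = 252) -> float:
    """Mean/std of Pi, annualized by sqrt(periods_per_year / h) (ignores overlap)."""
    pi = pi[np.isfinite(pi)]
    return pi.mean() / pi.std(ddof=1) * np.sqrt(periods_per_year / h)

def newey_west_tstat(x: np.ndarray, lags: int) -> float:
    """t-statistic of mean(x) using a Bartlett-kernel (Newey-West) long-run variance."""
    x = x[np.isfinite(x)]
    n = len(x)
    e = x - x.mean()
    lrv = e @ e / n                                   # lag-0 autocovariance
    for L in range(1, lags + 1):
        weight = 1.0 - L / (lags + 1)                 # Bartlett weight
        lrv += 2.0 * weight * (e[L:] @ e[:-L]) / n    # lag-L autocovariance
    return x.mean() / np.sqrt(lrv / n)
```

On overlapping 5-day P&L, `newey_west_tstat(pi, lags=4)` will generally come out well below the naive $t$-statistic, which is the point of the correction.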

Practical Considerations:
- Look-ahead bias: $s_i(t)$ must use only data up to and including time $t$. Closing prices, factor loadings estimated from past data, and accounting data already published by time $t$ are all admissible. Tomorrow's volume is not.
- Signal decay: Plot the information coefficient (IC $= \text{Corr}(s_i(t), \tilde{R}_i(t,h))$, computed across assets at each $t$ and averaged over time) as a function of $h$. A signal whose IC decays to zero within 2 days should not be traded with $h = 21$.
- Transaction costs: A theoretical $\gamma > 0$ in the regression means nothing if the strategy requires, say, 20% daily turnover and the resulting trading costs (spread plus market impact) exceed its gross edge. Define "alpha" net of realistic transaction costs, not gross.
- Multiple testing: If you define alpha as $p < 0.05$ and test 100 signals that in truth have none, expect about 5 false positives by chance alone. Use a Bonferroni correction or report the full distribution of ICs.
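
Two of these checks in code, as a sketch. `ic_decay` computes the mean cross-sectional IC against the one-day excess return $k$ days ahead for $k = 1, \ldots, K$, a common way to draw the decay curve; `net_returns` charges a cost proportional to traded notional, assuming daily rebalancing ($h = 1$) and a single hypothetical cost rate per dollar traded (for example 5 bps, `0.0005`). Both functions and the linear cost model are illustrative assumptions; realistic cost models usually add a market-impact term.

```python
import numpy as np

def ic_decay(S: np.ndarray, rx: np.ndarray, max_lag: int) -> np.ndarray:
    """Mean cross-sectional IC of s_i(t) against the one-day excess return on day t + k.

    S, rx: (T, m) arrays of signals and daily excess returns. Entry k - 1 of the
    result is the time-averaged IC at lag k, for k = 1..max_lag.
    """
    T = S.shape[0]
    out = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        ics = [np.corrcoef(S[t], rx[t + k])[0, 1] for t in range(T - k)]
        out[k - 1] = np.nanmean(ics)
    return out

def net_returns(W: np.ndarray, pi_gross: np.ndarray, cost_per_dollar: float) -> np.ndarray:
    """Daily P&L after a linear cost on traded notional sum_i |w_i(t) - w_i(t-1)|.

    The first row is charged for building the initial book from zero.
    """
    turnover = np.abs(np.diff(W, axis=0, prepend=0.0)).sum(axis=1)
    return pi_gross - cost_per_dollar * turnover
```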

Answer: An alpha signal $s_i(t)$ is a real-valued, time-$t$-measurable function of assets and time, defined by: (1) a prediction horizon $h$ giving the forward return target $R_i(t,h) = \sum_{\tau=1}^{h} r_i(t+\tau)$; (2) a benchmark model defining excess returns $\tilde{R}_i(t,h) = R_i(t,h) - \beta_i^\top f(t,h)$; and (3) a portfolio construction rule mapping $s_i(t)$ to dollar-neutral weights $w_i(t)$. The signal has alpha if $\text{Cov}(s_i(t), \tilde{R}_i(t,h)) > 0$ consistently over time, measured by a positive and significant information ratio on the resulting portfolio.

Intuition

The reason practitioners obsess over the precise definition of alpha is that it is easy to construct signals that appear predictive under one definition and are pure noise under another. A signal correlated with market beta will look like it predicts raw returns, but it is just long the market in disguise -- there is no edge. By stripping out systematic factor exposures and evaluating on residual returns, you isolate what the signal actually knows beyond what any passive investor already gets for free. This is the distinction between beta (systematic, cheap, replicable) and alpha (idiosyncratic, hard-won, perishable).
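
A toy simulation of the first point, with made-up parameters: returns are generated as $\beta_i R_M(t) + \varepsilon_i(t)$, and the "signal" is simply $\beta_i$. Its IC against raw returns is positive on up-market days and negative on down-market days, tracking the market's direction, while its IC against market-model residuals is approximately zero. The betas are treated as known here to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(7)
T, m = 500, 300
beta = rng.uniform(0.5, 1.5, size=m)           # hypothetical asset betas
r_mkt = rng.normal(0.0005, 0.01, size=T)       # daily market returns
eps = rng.normal(0.0, 0.02, size=(T, m))       # idiosyncratic returns
raw = r_mkt[:, None] * beta[None, :] + eps     # raw returns = beta exposure + noise
resid = raw - r_mkt[:, None] * beta[None, :]   # market-model residuals (equal to eps here)

signal = beta                                  # the "signal" is just each asset's beta

def daily_ic(sig: np.ndarray, returns: np.ndarray) -> np.ndarray:
    """Cross-sectional correlation of sig with each row of returns."""
    return np.array([np.corrcoef(sig, row)[0, 1] for row in returns])

ic_raw = daily_ic(signal, raw)
ic_resid = daily_ic(signal, resid)
up = r_mkt > 0
print(f"IC vs raw, up days:   {ic_raw[up].mean():+.3f}")
print(f"IC vs raw, down days: {ic_raw[~up].mean():+.3f}")
print(f"IC vs residuals:      {ic_resid.mean():+.3f}")
```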

In real quant shops, the definition of the target return is a first-order decision that determines the entire development pipeline. Choosing $h = 1$ versus $h = 5$ changes the universe of tradeable signals, the required execution speed, and the transaction cost budget. Choosing CAPM residuals versus Barra multi-factor residuals changes what counts as "explained" and what remains to be explained. Two quants working on the same signal can get opposite answers about whether it has alpha simply because they made different choices on these dimensions. The discipline of writing down a complete, precise definition before running a single backtest is what separates serious signal research from pattern-matching noise.
