Designing a Fair Value Model: Data, Method, and Time Frame

Finance · Medium

Walk me through how you would build a fair value model for a liquid financial asset. Be specific:

  1. What data sources would you use, and why?
  2. What modeling technique would you choose, and what drives that choice?
  3. What time frame and frequency are you modeling?
  4. Describe one proprietary or derived signal -- how is it constructed, what does it capture, and how does it feed into the broader model?

Hints

  1. Start by fixing the time frame -- it determines everything else. HFT models are driven almost entirely by microstructure; daily models are typically factor models. Picking a concrete horizon (e.g., 5 minutes) lets you make specific choices about data and methods.
  2. Fair value is your estimate of where the asset should trade, not where it does trade. The signal is the gap between the two. Think about what information is most predictive of that gap at your chosen horizon.
  3. For a proprietary signal, describe something measurable from raw market data -- order flow imbalance, quote stuffing ratios, cross-asset lead-lag. Be specific about the formula, not just the concept.

Worked Solution

How to Think About It: There is no single right answer here -- fair value modeling spans microsecond order-book models to monthly macro factor models. What the interviewer wants to see is that you can articulate a coherent, internally consistent framework: data, model, and time frame all aligned, with a clear view of what edge you are trying to capture. The worst answer is a laundry list of buzzwords. The best answer picks one concrete framework and defends it.

Let us walk through a mid-frequency equities example -- holding periods of minutes to hours -- which is a common context for prop trading roles.

Key Insight: Fair value is not price. It is your best estimate of where the asset should trade given all available information. The model's job is to identify the gap between observed price and fair value, and that gap is your trade signal.

The Method:

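Step 1 -- Define the target. For a mid-frequency equities model, fair value is the expected price $k$ minutes out, conditional on current information: $V_t = \mathbb{E}[P_{t+k} \mid \mathcal{F}_t]$, where $P_t$ is the observed mid-price and $\mathcal{F}_t$ is everything known at time $t$. We are trying to predict short-run direction, not long-run fundamental value.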
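The conditional expectation is never observed directly, so in practice a model is trained on the realized forward return and fair value is recovered from its prediction. A minimal sketch of the label construction, assuming mid-prices sampled on a regular 1-minute grid (so $k$ bars equal $k$ minutes); the function name `forward_log_returns` is hypothetical:

```python
import math
from typing import List, Optional


def forward_log_returns(mids: List[float], k: int) -> List[Optional[float]]:
    """Label for bar t: log(P_{t+k} / P_t), the k-bar-ahead log return.

    Assumes `mids` are mid-prices on a regular grid. The last k bars have no
    observed future price, so their label is None (drop them; never fill them).
    """
    if k <= 0:
        raise ValueError("k must be a positive number of bars")
    n = len(mids)
    return [
        math.log(mids[t + k] / mids[t]) if t + k < n else None
        for t in range(n)
    ]


mids = [100.0, 100.5, 101.0, 100.0, 99.0, 99.5]
labels = forward_log_returns(mids, k=2)
# A fitted model's predicted log return r_hat maps back to fair value as
# V_t ~= P_t * exp(r_hat) (ignoring a tiny convexity term at short horizons),
# and the trade signal is V_t - P_t.
```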

Step 2 -- Data sources. Three tiers:

  • Primary (market microstructure): Level-2 order book (bid/ask depth at each level), trade tape (price, size, aggressor side), and quote updates. This is the highest-frequency signal and the closest to revealed information.
  • Secondary (cross-asset): Correlated instruments -- index futures, ETFs, sector peers. If SPY consistently moves before AAPL, SPY's recent return is a leading indicator for AAPL.
  • Tertiary (fundamental/alternative): Earnings estimates, short interest, institutional flow data, news sentiment. These move at daily or weekly frequency and act as slow-moving anchors.
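One simple way to check whether a cross-asset series leads a target is the correlation between the leader's return at $t$ and the follower's return at $t$ + lag. The sketch below assumes both return series are already synchronized on the same bar grid; the function name and the synthetic "SPY"/"AAPL" series are illustrative only, not real market data:

```python
import math
import random
from typing import Sequence


def lagged_correlation(leader: Sequence[float], follower: Sequence[float], lag: int) -> float:
    """Pearson correlation of leader[t] with follower[t + lag] (lag in bars, >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    x = list(leader[: len(leader) - lag])
    y = list(follower[lag:])
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Synthetic example: the follower partly repeats the leader's previous-bar return.
rng = random.Random(0)
spy = [rng.gauss(0.0, 1.0) for _ in range(500)]
aapl = [0.0] + [0.8 * spy[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, 500)]

for lag in range(3):
    print(lag, round(lagged_correlation(spy, aapl, lag), 3))
```

Only the lag-1 correlation should stand out here. With real data, what matters is whether a lead-lag relationship stays stable out of sample; in-sample correlations on short windows are noisy.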

Step 3 -- Modeling technique. For a mid-frequency model, use a gradient-boosted tree ensemble (XGBoost/LightGBM) over a curated feature set. Reasons:

  • Non-linear interactions between features (e.g., the effect of order imbalance is larger when spreads are wide) are captured automatically.
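  • Handles features on very different scales and update frequencies without normalization or explicit bridge models, provided every feature is aligned to the prediction timestamp.
  • Regularization (shrinkage, tree depth limits, subsampling) helps control overfitting on noisy financial data.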
  • Interpretable via feature importance -- critical for diagnosing regime changes.
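To make the boosting idea concrete without depending on any library's API, here is a from-scratch sketch of gradient boosting for squared loss. For brevity it uses depth-1 trees (stumps), which cannot represent feature interactions; XGBoost and LightGBM grow deeper trees, which is what lets them pick up effects like "imbalance matters more when spreads are wide". The function names and toy data are hypothetical, and `learning_rate` plays the role of shrinkage, a standard form of regularization in boosting:

```python
from typing import List, Tuple

# (feature index, threshold, value if x[j] <= threshold, value if x[j] > threshold)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Choose the single split that best fits the residuals r under squared error."""
    best_sse, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lv, rv)
    if best is None:  # no valid split (all features constant): predict the mean
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best


def predict_stump(s: Stump, row: List[float]) -> float:
    j, thr, lv, rv = s
    return lv if row[j] <= thr else rv


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For the loss (1/2)(y - pred)^2, the negative gradient is the residual y - pred.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_boosted(model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


# Toy data: the target depends only on whether feature 0 exceeds 0.5.
X = [[i / 40, (7 * i % 40) / 40] for i in range(40)]
y = [1.0 if row[0] > 0.5 else 0.0 for row in X]
model = fit_boosted_stumps(X, y)
mse = sum((predict_boosted(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)
```

A smaller `learning_rate` needs more rounds but makes each stump's contribution smaller, which is why shrinkage and round count are usually tuned together.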

For HFT (sub-second), switch to a linear model or small neural net trained on pure microstructure data: latency matters more than model sophistication.

Step 4 -- Time frame. Predict 5-minute returns, trained on 2 years of rolling data, retrained daily. The 5-minute horizon is long enough for the signal to have economic value (net of transaction costs) and short enough that fundamentals are not the dominant driver.
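A minimal sketch of the rolling scheme, assuming a list of trading-day labels and a training window measured in trading days (about 504 for two years at roughly 252 trading days per year). The `embargo` parameter is an assumption beyond the text: it leaves a gap between the last training day and the test day, a common guard when forward-return labels could overlap the test period. All names are hypothetical:

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward(days: Sequence[str], train_len: int, embargo: int = 0) -> Iterator[Tuple[List[str], str]]:
    """Yield (training days, test day) pairs for a rolling window retrained daily.

    Each model is trained only on the `train_len` days ending `embargo` days
    before the test day, so training never uses data from the test day or later.
    """
    if train_len <= 0 or embargo < 0:
        raise ValueError("train_len must be positive and embargo non-negative")
    for i in range(train_len + embargo, len(days)):
        train = list(days[i - embargo - train_len : i - embargo])
        yield train, days[i]


days = [f"d{i}" for i in range(10)]  # stand-ins for trading dates
splits = list(walk_forward(days, train_len=5, embargo=1))
# splits[0] == (["d0", "d1", "d2", "d3", "d4"], "d6"); the last test day is "d9".
```

In use, each pair would drive one fit-then-score cycle; for the 2-year window described here, the call would be `walk_forward(days, train_len=504)`.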

Step 5 -- A proprietary signal: order flow imbalance (OFI). This is a clean example of a derived microstructure signal.

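Definition (the queue-based form of Cont, Kukanov, and Stoikov): at each update $t$ of the best quotes, compute

$e_t = \text{BidSize}_t \cdot \mathbf{1}[\text{BidPrice}_t \geq \text{BidPrice}_{t-1}] - \text{BidSize}_{t-1} \cdot \mathbf{1}[\text{BidPrice}_t \leq \text{BidPrice}_{t-1}] - \text{AskSize}_t \cdot \mathbf{1}[\text{AskPrice}_t \leq \text{AskPrice}_{t-1}] + \text{AskSize}_{t-1} \cdot \mathbf{1}[\text{AskPrice}_t \geq \text{AskPrice}_{t-1}]$

and aggregate over a window (e.g., one bar): $\text{OFI} = \sum_{t \in \text{window}} e_t$.

Here BidSize and AskSize are the quantities resting at the best bid and best ask. When neither best price moves, $e_t = \Delta \text{BidSize}_t - \Delta \text{AskSize}_t$; when a best price moves, the whole new queue (price improved) or the whole old queue (level wiped out) counts. Intuitively, OFI measures net buying pressure at the top of book: bids joining or improving the best bid and asks leaving the best ask (cancelled or lifted by buyers) push it up, and the mirror-image events push it down.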
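A minimal sketch of OFI computed from a stream of best-quote snapshots. It uses the queue-based form in which an improved best bid counts the full new bid queue, a lower best bid counts the full old bid queue as removed (symmetrically for the ask), and unchanged prices contribute $\Delta \text{BidSize}_t - \Delta \text{AskSize}_t$. The `Quote` class and function names are hypothetical, and a real feed should compare prices as integer ticks rather than floats:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Quote:
    """Best bid/ask snapshot (top of book)."""
    bid_px: float
    bid_sz: float
    ask_px: float
    ask_sz: float


def ofi_increment(prev: Quote, cur: Quote) -> float:
    """Order-flow contribution e_t of one top-of-book update."""
    e = 0.0
    # Bid side: a growing or newly improved bid queue adds buying pressure;
    # a shrinking or wiped-out bid queue subtracts it.
    if cur.bid_px >= prev.bid_px:
        e += cur.bid_sz
    if cur.bid_px <= prev.bid_px:
        e -= prev.bid_sz
    # Ask side: a growing or newly lowered ask queue adds selling pressure;
    # a shrinking or wiped-out ask queue counts as buying pressure.
    if cur.ask_px <= prev.ask_px:
        e -= cur.ask_sz
    if cur.ask_px >= prev.ask_px:
        e += prev.ask_sz
    return e


def ofi(quotes: Sequence[Quote]) -> float:
    """OFI over a window: the sum of per-update increments."""
    return sum((ofi_increment(p, c) for p, c in zip(quotes, quotes[1:])), 0.0)


q0 = Quote(100.00, 500.0, 100.01, 400.0)
q1 = Quote(100.00, 700.0, 100.01, 300.0)  # bid queue +200, ask queue -100 -> e = +300
q2 = Quote(100.01, 200.0, 100.02, 600.0)  # bid improves, old ask level gone -> e = 200 + 300
q3 = Quote(100.00, 900.0, 100.01, 100.0)  # bid level lost, ask lowered -> e = -200 - 100
print(ofi([q0, q1, q2, q3]))  # 500.0
```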

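How it feeds in: OFI is strongly related to contemporaneous mid-price changes, and here it is used as a short-horizon predictor of price moves over the next 1-5 minutes. It enters the model both raw (current OFI) and as a decayed moving average (capturing persistent buying pressure). It also interacts with liquidity: the same OFI tends to move the price more when the book is thin, which typically coincides with wide spreads, so spread and top-of-book depth enter alongside OFI and the tree model learns the interaction.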
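A sketch of that feature construction, assuming OFI has already been aggregated per bar and the spread is measured in ticks. The exponential half-life (in bars) and the explicit `ofi * spread` product are hypothetical choices; a tree model could also learn the interaction from the raw `ofi` and `spread` columns on its own:

```python
import math
from typing import Dict, List


def ewma(values: List[float], half_life: float) -> List[float]:
    """Exponentially decayed average; an observation's weight halves every `half_life` bars."""
    alpha = 1.0 - math.exp(-math.log(2.0) / half_life)
    out: List[float] = []
    s = 0.0
    for i, v in enumerate(values):
        s = v if i == 0 else alpha * v + (1.0 - alpha) * s
        out.append(s)
    return out


def ofi_features(ofi_per_bar: List[float], spread_ticks: List[float], half_life: float = 5.0) -> List[Dict[str, float]]:
    """One feature row per bar: raw OFI, decayed OFI, spread, and an OFI x spread term."""
    decayed = ewma(ofi_per_bar, half_life)
    return [
        {"ofi": raw, "ofi_ewma": slow, "spread": spr, "ofi_x_spread": raw * spr}
        for raw, slow, spr in zip(ofi_per_bar, decayed, spread_ticks)
    ]


rows = ofi_features([10.0, -4.0, 6.0], [1.0, 2.0, 1.0], half_life=1.0)
```

With `half_life=1.0` each new bar gets weight 0.5, so the decayed series here is roughly 10.0, 3.0, 4.5: the single negative bar dents, but does not erase, the earlier buying pressure.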

Practical Considerations:

  • Overfitting: Financial data is non-stationary. Walk-forward validation (never train on data from after the test period), with a gap between training and test windows so overlapping forward-return labels cannot leak, plus a final held-out test set, is non-negotiable.
  • Transaction costs: Evaluate the signal gross of costs first, then net. If the Sharpe ratio collapses after costs, the signal may be statistically real, but it is not tradeable at this horizon.
  • Regime awareness: Model performance often degrades around scheduled events (earnings releases, FOMC announcements) and in high-volatility periods. Consider using a regime classifier to dial down position sizing or switch to a simpler model in high-vol regimes.
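To make the gross-versus-net check concrete, here is a minimal sketch that computes annualized Sharpe ratios for a position series with and without a linear turnover cost. It assumes `positions[t]` is decided before `asset_returns[t]` is realized (no look-ahead), that costs are a fixed fraction per unit of turnover, and that bars are 5 minutes in a 6.5-hour session (78 bars per day, about 252 days per year); all names and numbers are illustrative:

```python
import math
from typing import Sequence, Tuple

BARS_PER_YEAR = 78 * 252  # 5-minute bars in a 6.5-hour session, ~252 trading days


def annualized_sharpe(returns: Sequence[float], periods_per_year: float) -> float:
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def gross_and_net_sharpe(
    positions: Sequence[float],
    asset_returns: Sequence[float],
    cost_per_turnover: float,
    periods_per_year: float = BARS_PER_YEAR,
) -> Tuple[float, float]:
    """Sharpe of position * return, before and after paying cost_per_turnover * |change in position|."""
    gross, net = [], []
    prev = 0.0
    for pos, r in zip(positions, asset_returns):
        pnl = pos * r
        cost = cost_per_turnover * abs(pos - prev)
        gross.append(pnl)
        net.append(pnl - cost)
        prev = pos
    return annualized_sharpe(gross, periods_per_year), annualized_sharpe(net, periods_per_year)


# Deliberately contrived toy: the position flips every bar and is always right,
# so only the 5 bp-per-unit trading cost drags performance down.
positions = [1.0 if t % 2 == 0 else -1.0 for t in range(60)]
asset_returns = [p * 0.001 * (1 + 0.1 * (t % 3)) for t, p in enumerate(positions)]
gross_sr, net_sr = gross_and_net_sharpe(positions, asset_returns, cost_per_turnover=0.0005)
```

Because this signal turns over its full position every bar, costs eat most of its per-bar edge; that is the same failure mode a fast-decaying microstructure signal can show in live trading.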

Answer: For a mid-frequency equities fair value model: use the Level-2 order book and cross-asset returns as primary inputs; gradient-boosted trees for the model; a 5-minute return target trained on a rolling 2-year window and retrained daily. OFI is a clean proprietary microstructure signal: constructed from changes in best bid/ask prices and queue sizes, measuring net buying versus selling pressure at the top of book, and entering the model both raw and as a decayed moving average interacted with spread.

Intuition

Fair value modeling is really about information hierarchy. Some signals update every microsecond (order book), some every minute (trade flow), some daily (short interest), some monthly (fundamentals). A good fair value model is an information aggregator that weights each source appropriately for the chosen time horizon. Mismatches between signal frequency and prediction horizon are one of the most common mistakes: using daily fundamental data to predict 30-second returns is wasted effort because fundamentals do not resolve that fast.
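A practical corollary of mixing frequencies: a slow feature must be attached to each fast timestamp using the last value that was actually available at that time (its publication time, not the period it describes), or the model quietly trains on the future. A minimal as-of join sketch using the standard-library `bisect` module, assuming sorted timestamps in a common unit; names and values are hypothetical. pandas users would typically reach for `pandas.merge_asof`, whose default `direction='backward'` has the same semantics:

```python
import bisect
from typing import List, Optional, Sequence


def asof_values(
    query_times: Sequence[float],
    publish_times: Sequence[float],
    values: Sequence[float],
) -> List[Optional[float]]:
    """For each query time, the latest value published at or before it (None if none yet)."""
    out: List[Optional[float]] = []
    for t in query_times:
        i = bisect.bisect_right(publish_times, t) - 1  # index of last publish time <= t
        out.append(values[i] if i >= 0 else None)
    return out


# A slow, daily-style feature published at t=0 and t=100 (arbitrary time units),
# attached to intraday bar timestamps.
bars = [-5.0, 0.0, 50.0, 100.0, 150.0]
print(asof_values(bars, publish_times=[0.0, 100.0], values=[1.5, 2.0]))
# [None, 1.5, 1.5, 2.0, 2.0]
```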

The open-ended nature of this question is intentional -- interviewers use it to probe whether you have actually built something. The best candidates give a specific answer grounded in a real system they designed or worked on. The worst candidates recite a textbook taxonomy of model types. If you are asked this in an interview and you have never built a fair value model, pick one concrete, defensible framework (e.g., 'a cross-sectional factor model for daily equity returns') and reason through it carefully. Depth beats breadth.
