HV-Block Cross-Validation for Dependent Data

Machine Learning · Hard · Free problem

You are building a model to forecast $r_{t+1}$ (next-period returns) using features that may include overlapping or lagged variables. Standard K-fold cross-validation is inappropriate because the data is serially correlated.

Design. Describe an hv-block cross-validation scheme. In this scheme, the training set for each fold excludes $h$ observations before and $v$ observations after the validation block to prevent information leakage. Draw or describe the structure clearly.

Parameter choices. How should $h$ and $v$ be chosen in terms of the dependence structure of the data? What happens if they are too small or too large?

Bias-variance trade-offs. Compare the bias and variance of out-of-sample error estimates from hv-block CV to those from naive K-fold CV. Why does naive K-fold fail, and what does the blocking fix?

Hints

Naive K-fold treats observations as exchangeable. What goes wrong when neighboring observations in time carry nearly the same information?
The fix is to create a dead zone around each validation block. How wide should this zone be, and what determines the right width?
The exclusion parameters $h$ (before) and $v$ (after) should be at least as large as the autocorrelation range $L$ of the data. Too small means residual leakage; too large means you waste training data and increase variance.

Worked Solution

How to Think About It: The fundamental problem with naive K-fold CV on time series is information leakage. If observation $t$ is in the validation fold and a neighbor such as $t-1$ or $t+1$ is in the training fold, the model has already seen nearly the same information through autocorrelation, so the validation point is not truly out of sample. This makes the CV error estimate optimistically biased: your model looks better in backtesting than it will perform live. The hv-block scheme fixes this by creating a buffer zone (embargo) around each validation block, so that no training observation is close enough in time to leak information.

Key Insight: The buffer sizes $h$ and $v$ should match the range of temporal dependence in the data. If the autocorrelation in your features or returns dies out after $L$ lags, you need $h \ge L$ and $v \ge L$ to fully prevent leakage.
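As a rough illustration of how $L$ might be read off the data, the sketch below computes a sample ACF and returns the first lag at which it falls inside an approximate white-noise band. The function names `sample_acf` and `dependence_range`, the default threshold of $1.96/\sqrt{T}$, and the simulated AR(1) series are all assumptions made for this example, not part of the problem statement.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

def dependence_range(x, max_lag=50, threshold=None):
    """First lag >= 1 at which |ACF| drops below the threshold.

    Assumption: the default threshold is the approximate 95% band for
    white noise, 1.96 / sqrt(T). Returns max_lag if the ACF never drops.
    """
    x = np.asarray(x, dtype=float)
    if threshold is None:
        threshold = 1.96 / np.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    below = np.flatnonzero(np.abs(acf[1:]) < threshold)
    return int(below[0] + 1) if below.size else max_lag

# Hypothetical data: an AR(1) series with coefficient 0.9.
rng = np.random.default_rng(0)
eps = rng.standard_normal(5000)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.9 * x[t - 1] + eps[t]

L = dependence_range(x)
print("estimated dependence range L:", L)
```

In practice one would run this on each feature and on the response and take the largest resulting $L$ as a lower bound for $h$ and $v$. The first-crossing rule is only one possible convention and is sensitive to sampling noise in the ACF.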

The Method:

Part 1 -- The HV-Block Scheme:

Suppose you have $T$ observations indexed $t = 1, \ldots, T$ and you split them into $K$ contiguous validation blocks $B_1, \ldots, B_K$.

For fold $k$ with validation block $B_k = \{t_k, t_k+1, \ldots, t_k + w - 1\}$ (where $w$ is the block width):

- Excluded zone before: observations $\{t_k - h, \ldots, t_k - 1\}$ are removed from training
- Validation block: observations $B_k$ are used for evaluation
- Excluded zone after: observations $\{t_k + w, \ldots, t_k + w + v - 1\}$ are removed from training
- Training set: all remaining observations outside the validation block and both exclusion zones

Schematically for one fold: $[\text{Train}] \;\underbrace{|\; \text{gap } h \;}_{\text{excluded}} |\; \underbrace{\text{Val block}}_{\text{evaluate}} \;| \underbrace{\;\text{gap } v \;|}_{\text{excluded}} [\text{Train}]$
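The fold structure above can be sketched as an index generator. This is a minimal sketch under stated assumptions: the name `hv_block_splits` is hypothetical, indices are 0-based (so $t = 1, \ldots, T$ maps to $0, \ldots, T-1$), blocks are cut at equally spaced edges, and exclusion zones are simply clipped at the ends of the sample.

```python
import numpy as np

def hv_block_splits(T, K, h, v):
    """Yield (train_idx, val_idx) for K contiguous validation blocks.

    h observations before and v observations after each validation
    block are dropped from the training set.
    """
    idx = np.arange(T)
    # Assumption: K blocks of (nearly) equal width covering 0..T-1.
    edges = np.linspace(0, T, K + 1).astype(int)
    for start, stop in zip(edges[:-1], edges[1:]):
        val_idx = idx[start:stop]
        lo = max(0, start - h)   # training data ends before the h-gap
        hi = min(T, stop + v)    # training data resumes after the v-gap
        train_idx = np.concatenate([idx[:lo], idx[hi:]])
        yield train_idx, val_idx

for train_idx, val_idx in hv_block_splits(T=20, K=4, h=2, v=3):
    print("val:", val_idx.tolist(), "train:", train_idx.tolist())
```

Each pair of index arrays can then be used to fit the model on `train_idx` and score it on `val_idx`. Note that the first and last folds have a buffer on only one side, so they keep slightly more training data than interior folds.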

The key difference from walk-forward validation: hv-block CV uses training data on both sides of the validation block (past and future), whereas walk-forward only trains on past data. HV-block is appropriate when your goal is to estimate generalization error; walk-forward is appropriate when you want to simulate realistic trading.
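To make the contrast concrete, here is a small sketch that builds the training indices for a single validation block under both schemes. The function name `training_indices` and its `walk_forward` flag are hypothetical, indices are 0-based, and the validation block is the half-open range `[start, stop)`.

```python
import numpy as np

def training_indices(T, start, stop, h, v, walk_forward=False):
    """Training indices for validation block [start, stop) on 0..T-1."""
    idx = np.arange(T)
    past = idx[: max(0, start - h)]        # everything before the h-gap
    if walk_forward:
        return past                        # walk-forward: past data only
    future = idx[min(T, stop + v):]        # everything after the v-gap
    return np.concatenate([past, future])  # hv-block: both sides

hv = training_indices(T=20, start=10, stop=15, h=2, v=3)
wf = training_indices(T=20, start=10, stop=15, h=2, v=3, walk_forward=True)
print("hv-block train:", hv.tolist())
print("walk-forward train:", wf.tolist())
```

The only difference is whether observations after the trailing gap are kept, which is why hv-block CV typically has more training data per fold than walk-forward validation.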

Part 2 -- Choosing $h$ and $v$:

Set $h$ and $v$ based on the autocorrelation range of the data:

- Compute the autocorrelation function (ACF) of the features and the response $r_{t+1}$
- Let $L$ be the lag at which the ACF becomes negligible (e.g., drops below

Intuition

HV-block CV is the time-series analyst's version of the same insight behind purging and embargoing in financial backtesting: if your training and test data are correlated, your performance estimates are lying to you. The buffer zones serve the same function as an embargo period in a backtest -- they create a clean separation between what the model has seen and what it is being evaluated on. The bias-variance trade-off in choosing $h$ and $v$ mirrors a fundamental tension in all of applied statistics: more aggressive debiasing (larger buffers) means less data and noisier estimates. In practice, getting the buffer sizes right matters enormously -- a buffer that is too small by even one lag can let enough leakage through to make a worthless strategy look profitable.