Effective Sample Size Under Autocorrelation

Time Series · Hard · Free problem

You compute a daily information coefficient (IC) series $IC(1), \ldots, IC(T)$ and estimate the mean $\bar{IC}$. Assume the IC series follows a stationary AR(1) process:

$IC(t) - \mu = \phi\,(IC(t-1) - \mu) + \varepsilon_t, \quad |\phi| < 1, \quad \varepsilon_t \sim N(0, \sigma^2) \text{ i.i.d.}$

Derive $\operatorname{Var}(\bar{IC})$ in closed form as a function of $T$, $\phi$, and $\sigma^2$.

Define an "effective sample size" $T_{\text{eff}}$ so that $\operatorname{Var}(\bar{IC}) = \sigma_{IC}^2 / T_{\text{eff}}$, where $\sigma_{IC}^2 = \sigma^2 / (1 - \phi^2)$ is the marginal variance of the IC series.

Show how $T_{\text{eff}}$ behaves as $\phi \to 1^{-}$, and interpret the result.

Hints

The variance of the sample mean involves summing all pairwise covariances -- for an AR(1), each autocovariance $\gamma(h) = \sigma_{IC}^2 \phi^{|h|}$ is a geometric function of the lag.
Use the geometric series $\sum_{h=0}^{\infty} \phi^h = 1/(1 - \phi)$ to simplify the double sum. The large-$T$ limit is cleaner than the exact finite-$T$ expression.
Define $T_{\text{eff}}$ by matching $\operatorname{Var}(\bar{IC}) = \sigma_{IC}^2 / T_{\text{eff}}$ and isolate the ratio, which for large $T$ is $T_{\text{eff}} / T \approx (1 - \phi)/(1 + \phi)$.
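As a numerical sanity check on these hints, the sketch below sums all $T^2$ pairwise autocovariances by brute force and compares the result with the closed form obtained from the geometric sums. The function names are illustrative and not part of the original problem. The closed form used is the exact finite-$T$ expression $\frac{\sigma_{IC}^2}{T}\left[\frac{1+\phi}{1-\phi} - \frac{2\phi(1-\phi^T)}{T(1-\phi)^2}\right]$.

```python
def var_mean_bruteforce(T, phi, sigma2):
    """Var of the sample mean as (1/T^2) * sum of all pairwise autocovariances."""
    sigma_ic2 = sigma2 / (1.0 - phi**2)  # marginal variance of the AR(1)
    total = 0.0
    for s in range(T):
        for t in range(T):
            total += sigma_ic2 * phi ** abs(s - t)  # gamma(|s - t|)
    return total / T**2


def var_mean_closed_form(T, phi, sigma2):
    """Exact finite-T variance of the sample mean (hypothetical helper)."""
    sigma_ic2 = sigma2 / (1.0 - phi**2)
    bracket = (1 + phi) / (1 - phi) - 2 * phi * (1 - phi**T) / (T * (1 - phi) ** 2)
    return sigma_ic2 / T * bracket


def var_mean_large_T(T, phi, sigma2):
    """Large-T approximation: sigma^2 / (T * (1 - phi)^2)."""
    return sigma2 / (T * (1 - phi) ** 2)


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.9):
        brute = var_mean_bruteforce(252, phi, 1.0)
        exact = var_mean_closed_form(252, phi, 1.0)
        approx = var_mean_large_T(252, phi, 1.0)
        print(f"phi={phi}: brute={brute:.6f} exact={exact:.6f} large-T={approx:.6f}")
```

The brute-force and closed-form values should agree to floating-point precision, while the large-$T$ approximation drifts further from both as $\phi$ approaches 1.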

Worked Solution

How to Think About It: When you average an autocorrelated series, the effective amount of information is much less than the raw sample size $T$ suggests. Intuitively, consecutive IC values are not independent draws -- they are "echoes" of the same shocks. The stronger the positive autocorrelation, the fewer truly independent observations you have. This is one of the most common traps in quant research: you run a strategy for 1,000 days, compute a t-stat using $\sqrt{1000}$, and feel great -- but if daily ICs have $\phi = 0.5$, your effective sample size is only about $1000 \cdot (0.5/1.5) \approx 333$. Your t-stat is inflated by a factor of about $\sqrt{3} \approx 1.73$.
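Derivation Sketch: The effective-sample-size figure quoted above follows from the variance of the sample mean. The steps below are a sketch using only the symbols from the problem statement, with $\gamma(h) = \sigma_{IC}^2 \phi^{|h|}$ denoting the lag-$h$ autocovariance. Summing the covariances of all pairs and grouping them by lag gives

$\operatorname{Var}(\bar{IC}) = \frac{1}{T^2} \sum_{s=1}^{T} \sum_{t=1}^{T} \gamma(s - t) = \frac{\sigma_{IC}^2}{T} \left[ 1 + 2 \sum_{h=1}^{T-1} \left(1 - \frac{h}{T}\right) \phi^h \right]$

Evaluating the two finite geometric sums yields the exact closed form

$\operatorname{Var}(\bar{IC}) = \frac{\sigma_{IC}^2}{T} \left[ \frac{1 + \phi}{1 - \phi} - \frac{2\phi\,(1 - \phi^T)}{T\,(1 - \phi)^2} \right], \quad \sigma_{IC}^2 = \frac{\sigma^2}{1 - \phi^2}$

For large $T$ the second term in the bracket is negligible, so

$\operatorname{Var}(\bar{IC}) \approx \frac{\sigma_{IC}^2}{T} \cdot \frac{1 + \phi}{1 - \phi} = \frac{\sigma^2}{T\,(1 - \phi)^2}, \quad T_{\text{eff}} \approx T\,\frac{1 - \phi}{1 + \phi}$

As $\phi \to 1^{-}$ with $T$ fixed, the bracket in the exact expression tends to $T$, so $T_{\text{eff}} \to 1$: the whole sample carries the information of a single observation. The large-$T$ approximation instead tends to 0, which signals that it breaks down once $1/(1 - \phi)$ is comparable to $T$.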

Quick Estimate: For $\phi = 0.5$ and $T = 252$ (one year of daily data), the effective sample size should be roughly $T \cdot (1 - \phi)/(1 + \phi) = 252 \cdot (0.5/1.5) = 84$. So you have roughly one-third the information you thought. For $\phi = 0.9$, it drops to roughly $252 \cdot (0.1/1.9) \approx 13$.

Intuition

This result is one of the most practically important facts in quantitative research. Whenever you compute a t-statistic, a Sharpe ratio, or any significance measure from time series data, you are implicitly scaling by $\sqrt{T}$. But if the data is positively autocorrelated, $\sqrt{T}$ overstates how much information you actually have. The large-$T$ correction factor $(1 - \phi)/(1 + \phi)$ shows that even moderate autocorrelation ($\phi = 0.5$) cuts your effective sample by a factor of 3, and high autocorrelation ($\phi = 0.9$) cuts it by a factor of 19. In practice, daily alpha signals, IC series, and P&L streams are often significantly autocorrelated, so naive significance tests can badly overstate confidence. The fix is simple: replace $T$ with $T_{\text{eff}}$ in your standard error calculations, or use Newey-West or other HAC standard errors, which account for serial dependence without assuming an AR(1) form.
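To make the size of the correction concrete, here is a minimal Monte Carlo sketch using only the Python standard library. It simulates many stationary AR(1) paths and compares the empirical variance of the sample mean with the naive $\sigma_{IC}^2 / T$ and the corrected $\sigma_{IC}^2 / T_{\text{eff}}$. The parameter values ($\mu = 0.02$, $\sigma = 0.05$, 2,000 paths) and the function names are assumptions chosen for illustration, not values from the problem.

```python
import random
import statistics


def simulate_ar1_mean(T, phi, sigma, mu, rng):
    """Sample mean of one stationary AR(1) path of length T."""
    # Draw the first deviation from the stationary distribution,
    # so no burn-in period is needed.
    x = rng.gauss(0.0, sigma / (1.0 - phi**2) ** 0.5)
    total = x
    for _ in range(T - 1):
        x = phi * x + rng.gauss(0.0, sigma)  # deviation from mu follows the AR(1)
        total += x
    return mu + total / T


def t_eff_large_T(T, phi):
    """Large-T effective sample size T * (1 - phi) / (1 + phi)."""
    return T * (1.0 - phi) / (1.0 + phi)


# Illustrative parameters (assumed, not from the problem statement).
T, phi, sigma, mu = 252, 0.5, 0.05, 0.02
rng = random.Random(0)
means = [simulate_ar1_mean(T, phi, sigma, mu, rng) for _ in range(2000)]

sigma_ic2 = sigma**2 / (1.0 - phi**2)        # marginal variance of the IC series
empirical_var = statistics.variance(means)   # Monte Carlo estimate of Var(mean)
naive_var = sigma_ic2 / T                    # pretends observations are independent
corrected_var = sigma_ic2 / t_eff_large_T(T, phi)

print(f"empirical Var(mean): {empirical_var:.3e}")
print(f"naive sigma_IC^2/T:  {naive_var:.3e}")
print(f"corrected (T_eff):   {corrected_var:.3e}")
```

The empirical variance should land close to the corrected value and roughly three times the naive one, which is the $\sqrt{3}$ t-stat inflation for $\phi = 0.5$. On real data $\phi$ is unknown and has to be estimated, which is one reason HAC estimators are often preferred in practice.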