Daily Return Confidence Interval from Annual Data

Statistics · Hard · Free problem

You have $N$ years of simple annual returns $\{R_1, R_2, \ldots, R_N\}$ with sample mean $\bar{R}_Y$ and sample variance $s_Y^2$. You want to build a 95% confidence interval for the true daily mean return $\mu_d$.

Derive the confidence interval under three increasingly realistic assumptions:

  1. I.I.D. daily returns: Daily log returns are independent and identically distributed with constant variance.
  2. Serial correlation with known long-run variance: Daily returns are autocorrelated, but you know the long-run variance $\Omega$ (or have an estimate of it, e.g., from a Newey-West estimator).
  3. Stochastic volatility: Daily variance is time-varying.

In each case, state your assumptions explicitly and show how to map annual statistics to daily statistics using aggregation identities.

Hints

  1. Under i.i.d. daily returns, variances aggregate linearly: $\sigma_Y^2 = 252 \sigma_d^2$. Invert this to convert annual statistics to daily.
  2. Under autocorrelation, replace $\sigma_d^2$ with the long-run variance $\Omega = \sigma_d^2(1 + 2\sum_k \rho_k)$. Positive autocorrelation widens the CI; mean reversion narrows it.

Worked Solution

How to Think About It: The challenge is unit conversion: you observed returns at annual frequency but want to make inference about a daily parameter. The conversion depends critically on how daily returns aggregate to annual returns -- and that depends on the dependence structure. Under i.i.d., variances add linearly and the conversion is clean. Under autocorrelation, variances add with cross-terms (the long-run variance is larger or smaller than the simple sum). Under stochastic volatility, even the annual variance is itself estimated with extra noise, widening the CI further.

A useful sanity check: with $N = 20$ years of annual data and roughly 252 trading days per year, you have an effective sample of $20 \times 252 = 5{,}040$ daily observations -- but autocorrelation and volatility clustering can dramatically reduce the effective sample size.

Case (i): I.I.D. Daily Returns

Assumption: Daily log returns $r_1, r_2, \ldots$ are i.i.d. with mean $\mu_d$ and variance $\sigma_d^2$. Annual return equals the sum of 252 daily returns (approximation; for simplicity treat log returns as simple returns).

Aggregation identities:

  - Mean: $\mu_Y = 252 \mu_d \implies \hat{\mu}_d = \frac{\bar{R}_Y}{252}$
  - Variance: $\sigma_Y^2 = 252 \sigma_d^2 \implies \hat{\sigma}_d^2 = \frac{s_Y^2}{252}$

Variance of the sample mean $\hat{\mu}_d$: We observe $N$ annual returns. The sample mean of annual returns has variance $\sigma_Y^2 / N = 252 \sigma_d^2 / N$. Converting to daily and plugging in $s_Y^2$ for $\sigma_Y^2$: $\text{Var}(\hat{\mu}_d) = \frac{\sigma_Y^2}{N} \cdot \frac{1}{252^2} \approx \frac{s_Y^2}{252^2 N}$

95% CI for $\mu_d$: $\hat{\mu}_d \pm 1.96 \cdot \frac{s_Y}{252\sqrt{N}}$

For example: $\bar{R}_Y = 8\%$, $s_Y = 15\%$, $N = 20$ years.

  - $\hat{\mu}_d = 0.08/252 \approx 0.032\%$ per day
  - Standard error: $0.15/(252 \times \sqrt{20}) \approx 0.013\%$
  - 95% CI: $[0.006\%, 0.058\%]$ per day
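The Case (i) arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `iid_daily_mean_ci` and the constant `TRADING_DAYS` are illustrative names chosen here, and the normal quantile 1.96 is taken from the formula above.

```python
import math

TRADING_DAYS = 252  # trading days per year, as assumed in the text


def iid_daily_mean_ci(annual_mean, annual_std, n_years, z=1.96):
    """Return (point estimate, standard error, lower, upper) for the daily mean.

    Implements mu_d = R_bar / 252 and SE = s_Y / (252 * sqrt(N)) from Case (i).
    """
    mu_d = annual_mean / TRADING_DAYS
    se = annual_std / (TRADING_DAYS * math.sqrt(n_years))
    return mu_d, se, mu_d - z * se, mu_d + z * se


# Inputs from the worked example: 8% mean, 15% standard deviation, 20 years.
mu_d, se, lower, upper = iid_daily_mean_ci(0.08, 0.15, 20)
print(f"mu_d = {mu_d:.4%}, SE = {se:.4%}, CI = [{lower:.4%}, {upper:.4%}]")
```

Multiplying the half-width `1.96 * se` by 252 converts it back to annual units, which is a quick way to see how wide the interval is relative to the 8% point estimate.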

Case (ii): Serial Correlation with Known Long-Run Variance

Assumption: Daily returns are stationary but autocorrelated with autocovariances $\gamma_k = \text{Cov}(r_t, r_{t-k})$. Define the long-run variance: $\Omega = \sum_{k=-\infty}^{\infty} \gamma_k = \sigma_d^2 \left(1 + 2 \sum_{k=1}^{\infty} \rho_k\right)$ where $\rho_k = \gamma_k / \sigma_d^2$ is the autocorrelation at lag $k$.

For a sequence of $T$ daily returns, the variance of the sample mean is approximately $\Omega / T$, not $\sigma_d^2 / T$. Positive autocorrelation ($\rho_k > 0$) means $\Omega > \sigma_d^2$ -- returns trend, the effective sample size shrinks, and the CI widens.

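With $N$ years of annual data, the annual variance is approximately $252 \cdot \Omega$ (if the annual horizon is long enough to average out autocorrelation). The mean of $N$ annual returns then has variance $252\,\Omega / N$, and dividing by $252^2$ gives $\text{Var}(\hat{\mu}_d) = \Omega / (252 N)$. The CI for $\mu_d$ becomes: $\hat{\mu}_d \pm 1.96 \cdot \frac{\sqrt{\Omega}}{\sqrt{N \cdot 252}}$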

In practice, estimate $\Omega$ using the Newey-West estimator at the daily level, which requires the daily return series itself: $\hat{\Omega} = \hat{\gamma}_0 + 2 \sum_{k=1}^{K} \left(1 - \frac{k}{K+1}\right) \hat{\gamma}_k$ with bandwidth $K \sim T^{1/3}$, where $T$ is the number of daily observations.
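The Newey-West formula above can be sketched directly in NumPy. This is an illustrative implementation, not a library routine: the function name `newey_west_lrv` is made up here, the sample autocovariances are normalized by $T$ (one common convention), and the default bandwidth is taken as the integer part of $T^{1/3}$, where $T$ is the number of daily observations.

```python
import numpy as np


def newey_west_lrv(returns, bandwidth=None):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance.

    Omega_hat = gamma_0 + 2 * sum_{k=1}^{K} (1 - k / (K + 1)) * gamma_k
    """
    r = np.asarray(returns, dtype=float)
    T = r.size
    if bandwidth is None:
        bandwidth = int(np.floor(T ** (1.0 / 3.0)))  # assumed rule of thumb
    d = r - r.mean()
    omega = np.dot(d, d) / T  # gamma_0
    for k in range(1, bandwidth + 1):
        gamma_k = np.dot(d[k:], d[:-k]) / T  # lag-k sample autocovariance
        omega += 2.0 * (1.0 - k / (bandwidth + 1)) * gamma_k
    return omega


# Simulated white noise with 1% daily volatility: the estimate should land
# near the plain variance of 1e-4, since there is no autocorrelation.
rng = np.random.default_rng(0)
white_noise = rng.normal(0.0, 0.01, size=20 * 252)
print(newey_west_lrv(white_noise))
```

With `bandwidth=0` the function reduces to the ordinary (biased) sample variance, which is a convenient check that the kernel weights are wired up correctly.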

If returns are positively autocorrelated (momentum), $\Omega > \sigma_d^2$ and the CI is wider than in Case (i). If mean-reverting (negative autocorrelation), $\Omega < \sigma_d^2$ and the CI narrows.
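To see the direction of the effect numerically, the sketch below assumes an AR(1) structure, $\rho_k = \phi^k$, which is not part of the problem statement but gives the closed form $\Omega = \sigma_d^2 (1 + \phi)/(1 - \phi)$. The helper names `lrv_ar1` and `ci_half_width` and the values of $\phi$ are hypothetical, and $\sigma_d^2$ is held fixed across scenarios purely for comparison.

```python
import math

TRADING_DAYS = 252


def lrv_ar1(daily_var, phi):
    """Long-run variance when rho_k = phi**k (AR(1) assumption, |phi| < 1)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return daily_var * (1.0 + phi) / (1.0 - phi)


def ci_half_width(omega, n_years, z=1.96):
    """Half-width of the Case (ii) interval: z * sqrt(Omega / (252 * N))."""
    return z * math.sqrt(omega / (TRADING_DAYS * n_years))


# Daily variance implied by s_Y = 15% under the i.i.d. mapping of Case (i).
daily_var = 0.15 ** 2 / TRADING_DAYS
for phi in (-0.1, 0.0, 0.1):
    hw = ci_half_width(lrv_ar1(daily_var, phi), n_years=20)
    print(f"phi = {phi:+.1f}: half-width = {hw:.4%} per day")
```

With $\phi = 0$ the half-width matches Case (i); a positive $\phi$ gives a wider interval and a negative $\phi$ a narrower one.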

Case (iii): Stochastic Volatility

Assumption: Daily variance $\sigma_d^2(t)$ is itself a random process (e.g., GARCH, Heston).

Now even the annual variance $s_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(R_i - \bar{R}_Y)^2$ is a noisy estimate because volatility itself varies year to year. This introduces two layers of uncertainty:

  1. Uncertainty about $\mu_d$ conditional on knowing volatility.
  2. Uncertainty about volatility itself.

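The aggregation identity becomes, by the law of total variance (conditioning on the volatility path and assuming daily returns are uncorrelated given that path): $\sigma_Y^2 = E\left[\sum_{t} \sigma_d^2(t)\right] + \text{Var}\left(E\left[\sum_{t} r_t \mid \{\sigma_d(t)\}\right]\right)$. The second term is zero when the conditional mean does not depend on volatility; the volatility of volatility then enters through the sampling noise of $s_Y^2$ rather than through $\sigma_Y^2$ itself.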

The CI can no longer be expressed as a simple $\pm z \cdot \text{SE}$ formula without a model for the volatility process. Practical approaches:

  - Fit a GARCH(1,1) model at daily frequency; use the estimated parameter covariance matrix for inference on $\mu_d$.
  - Use realized variance (e.g., from high-frequency data) to reduce volatility uncertainty.
  - Report a wider CI using a conservative estimate of $\sigma_Y^2$ (e.g., using the 75th percentile of historical annual variances).
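One way to get a feel for the second layer of uncertainty is to simulate it. The sketch below generates GARCH(1,1) daily returns, sums them into annual returns, and measures how much $s_Y^2$ varies across simulated 20-year histories. Everything here is an assumption made for illustration: the function names, the parameters $\alpha = 0.08$ and $\beta = 0.90$, Gaussian innovations, and the number of simulated paths. Results depend on the seed and the parameters, so treat the printed numbers as indicative only.

```python
import numpy as np

TRADING_DAYS = 252


def simulate_annual_returns(n_paths, n_years, omega, alpha, beta, mu_d, rng):
    """Simulate GARCH(1,1) daily returns and sum them into annual returns.

    sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1}
    Setting alpha = beta = 0 gives constant-variance i.i.d. returns.
    """
    n_days = n_years * TRADING_DAYS
    # Start every path at the unconditional variance.
    sigma2 = np.full(n_paths, omega / (1.0 - alpha - beta))
    daily = np.empty((n_paths, n_days))
    for t in range(n_days):
        eps = np.sqrt(sigma2) * rng.standard_normal(n_paths)
        daily[:, t] = mu_d + eps
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return daily.reshape(n_paths, n_years, TRADING_DAYS).sum(axis=2)


def relative_dispersion(annual):
    """Std of s_Y^2 across simulated histories, divided by its mean."""
    s2 = annual.var(axis=1, ddof=1)
    return s2.std(ddof=1) / s2.mean()


rng = np.random.default_rng(0)
target_var = 0.15 ** 2 / TRADING_DAYS  # daily variance matching s_Y = 15%
alpha, beta = 0.08, 0.90               # hypothetical GARCH parameters

iid_annual = simulate_annual_returns(
    300, 20, target_var, 0.0, 0.0, 0.08 / TRADING_DAYS, rng)
garch_annual = simulate_annual_returns(
    300, 20, target_var * (1.0 - alpha - beta), alpha, beta,
    0.08 / TRADING_DAYS, rng)

print("i.i.d. dispersion of s_Y^2:", relative_dispersion(iid_annual))
print("GARCH  dispersion of s_Y^2:", relative_dispersion(garch_annual))
```

For Gaussian i.i.d. returns the relative dispersion of $s_Y^2$ is about $\sqrt{2/(N-1)}$, roughly 0.32 for $N = 20$, which gives a benchmark against which to read the GARCH figure.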

Summary Table:

| Case | 95% CI for $\mu_d$ | Key assumption |
|------|-------------------|----------------|
| (i) I.I.D. | $\hat{\mu}_d \pm 1.96 \cdot s_Y / (252\sqrt{N})$ | Constant variance, no autocorrelation |
| (ii) Autocorrelated | $\hat{\mu}_d \pm 1.96 \cdot \sqrt{\hat{\Omega}} / \sqrt{252 N}$ | Known or estimated LRV |
| (iii) Stochastic vol | No closed form; model-dependent, wider than (ii) | Time-varying volatility process |

Answer: Map annual to daily via $\mu_d = \bar{R}_Y / 252$ and $\sigma_d = s_Y / \sqrt{252}$. The standard error of $\hat{\mu}_d$ is $s_Y / (252\sqrt{N})$ under i.i.d., $\sqrt{\hat{\Omega}/( 252 N)}$ under serial correlation, and model-dependent (wider) under stochastic volatility.

Intuition

The depressing practical takeaway from this problem is how little statistical power return data provides for estimating a mean. Even with 20 years of data, the 95% CI for the daily mean return under i.i.d. assumptions spans roughly $\pm 0.026\%$ per day, which translates to about $\pm 6.6\%$ annualized, nearly as large as the 8% point estimate itself. A strategy with a modest expected return is therefore hard to distinguish from a zero-mean process using return data alone. This is why practitioners obsess over Sharpe ratios and turn to higher-frequency data when available, which sharpens volatility estimates even though it adds little information about the mean.

The three cases also teach a progressive lesson about model risk. Case (i) is the textbook answer. Case (ii) is what any experienced time-series econometrician would do. Case (iii) is the honest answer for equities, where volatility clustering is pervasive. Each step typically widens the CI (negative autocorrelation in Case (ii) is the exception), and the actual confidence interval for a daily mean return estimate is usually much wider than naive calculations suggest. In practice, the standard error of the mean is driven by volatility and by the number of years the data span, not by how finely the returns are sampled, which is why mean estimation is so much harder than volatility estimation.
