Realized Kernel Estimator Under Microstructure Noise

Time Series · Hard · Free problem

Observed tick prices satisfy $Y_{t_i} = X_{t_i} + \epsilon_{t_i}$, where $X$ is an Ito semimartingale (the efficient price) and $\epsilon_{t_i}$ is i.i.d. mean-zero noise with variance $\omega^2$ (market microstructure noise from bid-ask bounce, rounding, etc.).

You have $n+1$ equally spaced observations $Y_{t_0}, Y_{t_1}, \ldots, Y_{t_n}$ over $[0, T]$.

  1. Define a realized-kernel (RK) variance estimator using a symmetric, non-negative-definite kernel function $k(\cdot)$ and bandwidth $H$.
  2. State the conditions under which this estimator is consistent for the integrated variance $IV = \int_0^T \sigma_s^2 \, ds$ as $n \to \infty$, in terms of $H$ and $H/n$.
  3. For the Parzen kernel specifically, give the asymptotically optimal order of $H$ in $n$, and explain why the realized kernel is robust to noise while naive realized variance is not.

Hints

  1. Think about what happens to the sum of squared returns when prices contain i.i.d. noise -- how does the expected value of $RV_n$ depend on $n$ and $\omega^2$?
  2. The noise is i.i.d., so the autocovariance structure of the noise returns $\epsilon_{t_j} - \epsilon_{t_{j-1}}$ is very simple: nonzero only at lags 0 and $\pm 1$. A kernel that includes these lags with appropriate weights can cancel the noise bias.
  3. For optimal bandwidth, balance the squared bias from residual noise (of order $n^2/H^4$, shrinking as $H$ grows) against the variance (of order $H/n$, growing with $H$). For the Parzen kernel this balance gives a specific rate: set $H \propto n^{3/5}$ to minimize MSE.

Worked Solution

How to Think About It: The core tension is this: you want to estimate the integrated variance of the efficient price $X$, but you only observe noisy prices $Y$. Naive realized variance (sum of squared returns) does not converge to $IV$. Instead it blows up, because the noise contributes a term of about $2n\omega^2$ in expectation that dominates as you sample more frequently. This is the "volatility signature plot" effect every practitioner knows: RV rises with sampling frequency instead of stabilizing. The realized kernel fixes this by adding kernel-weighted autocovariances at nonzero lags, which offset the noise contamination in the squared returns.

Key Insight: The realized kernel is a weighted sum of autocovariances of returns. The kernel function assigns weight 1 to the zero-lag autocovariance (which is the usual RV) and decaying weights to higher-lag autocovariances. These lagged terms cancel out the noise-induced bias in the zero-lag term.

The Method:

*Part (i): Definition*

Let $r_j = Y_{t_j} - Y_{t_{j-1}}$ for $j = 1, \ldots, n$ be the observed returns. The realized autocovariance at lag $h$ is:

$\hat{\gamma}_h = \sum_{j=|h|+1}^{n} r_j \, r_{j-|h|}$

The realized kernel estimator with bandwidth $H$ and kernel function $k(\cdot)$ is:

$RK_n = \sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right) \hat{\gamma}_h$

where $k: \mathbb{R} \to \mathbb{R}$ is a symmetric function satisfying $k(0) = 1$ and $k(x) = 0$ for $|x| > 1$. The kernel must be positive semi-definite, i.e., the Toeplitz weight matrix with entries $k((i-j)/(H+1))$ must be non-negative definite. Then $RK_n$ is a non-negative quadratic form in the returns, which guarantees $RK_n \geq 0$.
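The definition translates almost line by line into code. The following is a minimal sketch, not a production implementation: the function names `realized_autocovariance` and `realized_kernel` are illustrative, the input is assumed to be an array of equally spaced log prices, and end-point refinements used in the literature are ignored.

```python
import numpy as np


def realized_autocovariance(returns, h):
    """gamma_hat_h = sum_{j=|h|+1}^{n} r_j * r_{j-|h|}; symmetric in h."""
    r = np.asarray(returns, dtype=float)
    lag = abs(h)
    if lag >= r.size:
        return 0.0  # no overlapping pairs of returns at this lag
    return float(np.dot(r[lag:], r[: r.size - lag]))


def realized_kernel(prices, H, kernel):
    """RK_n = sum_{h=-H}^{H} kernel(h / (H + 1)) * gamma_hat_h.

    `prices` holds the n + 1 observations Y_{t_0}, ..., Y_{t_n};
    `kernel` is any callable weight function with kernel(0) = 1.
    """
    r = np.diff(np.asarray(prices, dtype=float))
    rk = kernel(0.0) * realized_autocovariance(r, 0)
    for h in range(1, H + 1):
        # gamma_hat_h equals gamma_hat_{-h}, so each positive lag counts twice.
        rk += 2.0 * kernel(h / (H + 1)) * realized_autocovariance(r, h)
    return rk


# Toy usage with a placeholder triangular weight (purely illustrative).
toy_prices = [100.0, 101.0, 99.0, 102.0]
print(realized_kernel(toy_prices, H=1, kernel=lambda x: 1.0 - abs(x)))
```

With `H = 0` the loop is empty and the function returns the plain sum of squared returns, which is a convenient sanity check that the estimator nests naive realized variance.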

*Part (ii): Consistency conditions*

Under regularity conditions on $X$ and the noise $\epsilon$, the realized kernel estimator is consistent for the integrated variance:

$RK_n \xrightarrow{p} IV \quad \text{as } n \to \infty, \; H \to \infty, \; H/n \to 0$

For a smooth kernel with $k'(0) = 0$, such as the Parzen kernel below, the bandwidth must also grow faster than $\sqrt{n}$, i.e. $n/H^2 \to 0$, so that the residual noise bias vanishes.

The error of $RK_n$ has two leading sources: (a) a noise-induced bias of order $O(n\omega^2/H^2)$, which shrinks as more lags are included, and (b) a sampling variance of order $O(H/n)$ coming from the efficient price, which grows with the number of lags. Balancing the squared bias against the variance determines the optimal $H$.

*Part (iii): Optimal bandwidth for the Parzen kernel*

The Parzen kernel is:

$k(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & \text{if } |x| \leq 1/2 \\ 2(1 - |x|)^3 & \text{if } 1/2 < |x| \leq 1 \\ 0 & \text{if } |x| > 1 \end{cases}$

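This kernel is smooth at the origin ($k'(0) = 0$, $k''(0) = -12$) rather than flat-top, and it is non-negative definite, so $RK_n \geq 0$ is guaranteed. The price of guaranteed non-negativity is a slower convergence rate: flat-top kernels can attain $n^{-1/4}$ but may produce negative estimates, whereas the Parzen realized kernel attains at best $n^{-1/5}$.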
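The piecewise formula is easy to get wrong at the breakpoints, so a small numerical check can help. The sketch below is one possible vectorized implementation (the function name `parzen` is illustrative); it confirms that the two branches agree at $|x| = 1/2$ and estimates the curvature at the origin by a central finite difference.

```python
import numpy as np


def parzen(x):
    """Parzen kernel, evaluated elementwise on a scalar or array."""
    a = np.abs(np.asarray(x, dtype=float))
    inner = 1.0 - 6.0 * a**2 + 6.0 * a**3   # branch for |x| <= 1/2
    outer = 2.0 * (1.0 - a) ** 3            # branch for 1/2 < |x| <= 1
    return np.where(a <= 0.5, inner, np.where(a <= 1.0, outer, 0.0))


# Both branches give the same value at the breakpoint |x| = 1/2.
left_branch = 1.0 - 6.0 * 0.5**2 + 6.0 * 0.5**3
right_branch = 2.0 * (1.0 - 0.5) ** 3
print(left_branch, right_branch)

# Central finite-difference estimate of k''(0); step size is an arbitrary choice.
step = 1e-3
curvature = float((parzen(step) - 2.0 * parzen(0.0) + parzen(-step)) / step**2)
print(curvature)
```

The finite-difference value should sit close to $-12$, the second derivative of $1 - 6x^2 + 6|x|^3$ at zero, which is the kernel constant that drives the residual noise bias.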

The MSE-optimal bandwidth is:

$H^{*} = c \cdot n^{3/5}$

where $c$ is a constant depending on $\omega^2$, the integrated quarticity, and kernel constants. With this choice, the realized kernel achieves the convergence rate:

$RK_n - IV = O_p(n^{-1/5})$

This is slower than the $O_p(n^{-1/2})$ rate achievable in the no-noise case, reflecting the fundamental cost of microstructure noise.
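A heuristic sketch of where the exponent $3/5$ comes from, keeping only the leading terms. The constants $a$ and $b$ are placeholders introduced here: $a$ collects $(k''(0))^2 \omega^4$ from the squared noise bias, and $b$ is proportional to the integrated quarticity and a kernel constant from the variance. Lower-order terms and end effects are ignored.

$\text{MSE}(H) \approx a \, \frac{n^2}{H^4} + b \, \frac{H}{n}$

$\frac{d\,\text{MSE}}{dH} = -4a \, \frac{n^2}{H^5} + \frac{b}{n} = 0 \quad \Rightarrow \quad H^{*} = \left(\frac{4a}{b}\right)^{1/5} n^{3/5}$

$\text{MSE}(H^{*}) \propto n^{-2/5} \quad \Rightarrow \quad RK_n - IV = O_p(n^{-1/5})$

At $H \propto n^{3/5}$ both terms are of order $n^{-2/5}$, so neither the bias nor the variance dominates, which is the usual signature of an MSE-optimal bandwidth.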

Why RK is robust but naive RV is not:

Naive realized variance uses only the zero-lag autocovariance:

$RV_n = \sum_{j=1}^{n} r_j^2 = \hat{\gamma}_0$

With noise, the return is $r_j = (X_{t_j} - X_{t_{j-1}}) + (\epsilon_{t_j} - \epsilon_{t_{j-1}})$. The noise adds $2n\omega^2$ to $RV_n$ on average, so:

$E[RV_n] = IV + 2n\omega^2 \to \infty \quad \text{as } n \to \infty$
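The blow-up is easy to see in a small simulation. The sketch below is illustrative only and rests on several assumptions not in the problem: constant volatility, Gaussian noise, no drift, $T = 1$, and arbitrary parameter values (`sigma = 0.2`, `omega = 0.001`). The helper names are hypothetical.

```python
import numpy as np


def simulate_noisy_prices(n, sigma=0.2, omega=0.001, T=1.0, seed=0):
    """Return Y_{t_0..t_n}: Brownian efficient log price plus i.i.d. noise."""
    rng = np.random.default_rng(seed)
    dx = sigma * np.sqrt(T / n) * rng.standard_normal(n)   # efficient returns
    x = np.concatenate(([0.0], np.cumsum(dx)))             # efficient price X
    eps = omega * rng.standard_normal(n + 1)                # microstructure noise
    return x + eps


def realized_variance(y):
    """Naive RV: sum of squared observed returns."""
    return float(np.sum(np.diff(y) ** 2))


sigma, omega, T = 0.2, 0.001, 1.0
iv = sigma**2 * T  # integrated variance under constant volatility
for n in (390, 3900, 23400):
    rv = realized_variance(simulate_noisy_prices(n, sigma, omega, T))
    # Compare the simulated RV with the theoretical mean IV + 2 n omega^2.
    print(n, rv, iv + 2 * n * omega**2)
```

As `n` grows, the simulated RV should track $IV + 2n\omega^2$ rather than $IV$, so finer sampling makes the naive estimate worse, not better.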

The realized kernel fixes this because the noise is i.i.d.: the noise returns $\epsilon_{t_j} - \epsilon_{t_{j-1}}$ have variance $2\omega^2$, lag-1 autocovariance $-\omega^2$, and zero autocovariance at lags $|h| \geq 2$. Summed over the sample, the two lag-$\pm 1$ terms contribute about $-2n\omega^2$, which offsets the $2n\omega^2$ in the zero-lag term once the lag-1 weight $k(1/(H+1))$ is close to 1, i.e. once $H$ is large. The Parzen kernel's positive semi-definiteness keeps the estimator non-negative, and its smooth decay keeps the weights on higher lags from making it erratic.
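A back-of-the-envelope calculation makes the size of the cancellation concrete. Here $\hat{\gamma}_h^{\epsilon}$ denotes the realized autocovariance of the noise returns alone (notation introduced for this sketch). End effects and signal-noise cross terms are ignored, and the last step uses a second-order Taylor expansion of $k$ at zero, so this is a heuristic rather than a proof.

$E[\hat{\gamma}_0^{\epsilon}] \approx 2n\omega^2, \quad E[\hat{\gamma}_{\pm 1}^{\epsilon}] \approx -n\omega^2, \quad E[\hat{\gamma}_h^{\epsilon}] = 0 \text{ for } |h| \geq 2$

$E\!\left[\sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right) \hat{\gamma}_h^{\epsilon}\right] \approx 2n\omega^2 \left(1 - k\!\left(\frac{1}{H+1}\right)\right) \approx \frac{|k''(0)| \, n\omega^2}{(H+1)^2}$

For the Parzen kernel $|k''(0)| = 12$, so the leftover noise bias is roughly $12 n\omega^2/(H+1)^2$. Naive RV corresponds to $H = 0$, where $k(1) = 0$ and the full $2n\omega^2$ survives.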

Answer: The realized kernel $RK_n = \sum_{h=-H}^{H} k(h/(H+1))\hat{\gamma}_h$ is consistent for $IV$ when $H \to \infty$ and $H/n \to 0$ (with $n/H^2 \to 0$ as well for a smooth kernel such as Parzen). For the Parzen kernel, the optimal bandwidth is $H^{*} \sim c \cdot n^{3/5}$, yielding a convergence rate of $n^{-1/5}$. The key to noise robustness is that lagged autocovariances cancel the $2n\omega^2$ bias that contaminates naive realized variance.

Intuition

The realized kernel is one of the most important tools in high-frequency econometrics, and the intuition behind it is surprisingly simple. Naive realized variance fails at high frequency because market microstructure noise (bid-ask bounce, price discreteness, latency) adds spurious variation that grows linearly with the number of observations. The genius of the realized kernel is recognizing that the noise is essentially uncorrelated across time, so its fingerprint shows up only at lag 0 and lag 1 in the autocovariance function of returns. By constructing a weighted sum of autocovariances, giving full weight at lag 0 and smoothly decaying weights at higher lags, you can cancel almost all of the noise contribution while retaining the signal.

The bandwidth $H$ controls this trade-off: too small and the lag-1 weight is not close enough to 1 to cancel the noise; too large and the estimator sums many noisy autocovariance estimates, which inflates its variance. The optimal $H \sim n^{3/5}$ is a bias-variance sweet spot. In practice, this is why quant desks do not just "sample at 5-minute bars" to avoid noise: they use kernel or pre-averaging estimators on tick data and extract far more information. The $n^{-1/5}$ convergence rate is slower than both the classical $n^{-1/2}$ and the $n^{-1/4}$ rate that is the best attainable under noise (reached, for example, by flat-top kernels and pre-averaging); it is the price paid for an estimate that is guaranteed non-negative. Estimators of this family are the theoretical foundation behind production volatility estimation at firms that trade at high frequency.
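To see the trade-off end to end, the sketch below compares naive RV with a Parzen realized kernel on simulated data. Everything about the setup is an assumption made for illustration: constant volatility, Gaussian noise, $T = 1$, the parameter values, and the helper names. The bandwidth uses a plug-in rule of the form $H = c^{*} \xi^{4/5} n^{3/5}$ with $\xi^2 = \omega^2 / \sqrt{T \int \sigma^4}$ and $c^{*} \approx 3.51$, the constant usually quoted for the Parzen kernel; it is fed the true $\omega$ and $\sigma$, which would have to be estimated in practice.

```python
import numpy as np


def parzen(x):
    """Parzen kernel weight for a scalar argument."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a**2 + 6.0 * a**3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def realized_kernel_parzen(y, H):
    """RK_n = gamma_0 + 2 * sum_{h=1}^{H} k(h / (H + 1)) * gamma_h."""
    r = np.diff(np.asarray(y, dtype=float))
    rk = float(np.dot(r, r))                       # lag 0 is naive RV
    for h in range(1, H + 1):
        gamma_h = float(np.dot(r[h:], r[:-h]))     # realized autocovariance
        rk += 2.0 * parzen(h / (H + 1)) * gamma_h
    return rk


def simulate_noisy_prices(n, sigma, omega, T=1.0, seed=0):
    """Brownian efficient log price plus i.i.d. Gaussian noise."""
    rng = np.random.default_rng(seed)
    dx = sigma * np.sqrt(T / n) * rng.standard_normal(n)
    x = np.concatenate(([0.0], np.cumsum(dx)))
    return x + omega * rng.standard_normal(n + 1)


# Illustrative parameters (assumptions, not taken from the problem).
sigma, omega, T, n = 0.2, 0.001, 1.0, 23400
iv = sigma**2 * T          # integrated variance
iq = sigma**4 * T          # integrated quarticity under constant volatility

# Plug-in bandwidth using the true omega and sigma (infeasible in practice).
xi2 = omega**2 / np.sqrt(T * iq)
H = int(np.ceil(3.5134 * xi2**0.4 * n**0.6))

y = simulate_noisy_prices(n, sigma, omega, T)
rv = float(np.sum(np.diff(y) ** 2))
rk = realized_kernel_parzen(y, H)
print("H =", H, " IV =", iv, " RV =", rv, " RK =", rk)
```

Under these assumptions RV should land far above $IV$ while RK stays close to it. Rerunning with a much smaller or much larger `H` is a quick way to watch the residual bias or the variance take over.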
