Online Covariance and Rolling Correlation

Linear Algebra · Hard · Free problem

You are receiving a stream of $p$-dimensional vectors $x_1, x_2, x_3, \ldots$ one at a time. You need to maintain running statistics in a single pass.

  1. Give numerically stable, one-pass recursions (Welford/Chan style) to update the mean vector $\mu_t$ and covariance matrix $\Sigma_t$ as each new $x_t$ arrives. Your update should be $O(p^2)$ per step.
  1. Extend this to a rolling-window covariance $\Sigma_t(W)$ that uses only the most recent $W$ observations. Explain how to handle the removal of the oldest observation.
  1. Provide an exponentially weighted moving average (EWMA) version with decay factor $\lambda \in (0, 1)$, so that older observations receive geometrically decreasing weight. Give the recursive update formulas.

Hints

  1. The key to numerical stability is tracking deviations from the running mean, not raw sums of squares. Look up Welford's online algorithm.
  2. For the rolling window, you need to "undo" the oldest observation. Think of it as the reverse of the Welford update -- subtract the old point's contribution to $C$ using deviations from the adjusted mean.
  3. For EWMA, the recursion is $\mu_t = \lambda \mu_{t-1} + (1-\lambda)x_t$ and the covariance uses two deviations: one from the old mean, one from the new. This mirrors the Welford trick in the exponentially weighted setting.

Worked Solution

How to Think About It: The naive approach to computing covariance is to store all data and recompute from scratch each time. That is $O(tp^2)$ per step and requires unbounded memory -- both unacceptable in a live trading system processing thousands of ticks per second. The fix is to maintain a small set of sufficient statistics that can be updated incrementally. Welford's algorithm (1962) is the gold standard because it avoids the catastrophic cancellation that plagues the textbook formula $\text{Var}(X) = E[X^2] - (E[X])^2$ when the mean is large relative to the standard deviation.

Key Insight: Instead of tracking $\sum x_i$ and $\sum x_i x_i^T$ separately (which causes numerical instability), track the sum of squared deviations from the current running mean. This quantity can be updated in $O(p^2)$ per step with a simple recursion.

The Method:

Part 1: Online mean and covariance (Welford/Chan)

Maintain two quantities: - $\mu_t$: the running mean after $t$ observations - $C_t$: the sum of outer products of deviations from the running mean, so $\Sigma_t = C_t / (t-1)$

Initialize: $\mu_0 = 0$, $C_0 = 0$ (the $p \times p$ zero matrix).

When $x_t$ arrives:

$\delta_t = x_t - \mu_{t-1}$

$\mu_t = \mu_{t-1} + \frac{1}{t} \delta_t$

$\delta_t' = x_t - \mu_t$

$C_t = C_{t-1} + \delta_t \, \delta_t'^T$

The covariance matrix estimate is $\Sigma_t = C_t / (t - 1)$ (unbiased) or $C_t / t$ (MLE).

  • Why this is stable: $\delta_t$ and $\delta_t'$ are deviations from the current mean, so they stay small. The naive formula accumulates $\sum x_i x_i^T$ which can be enormous when the mean is far from zero, leading to catastrophic cancellation when you subtract $t \mu_t \mu_t^T$.
  • Cost: $O(p^2)$ per step for the outer product. $O(p^2)$ storage for $C_t$ plus $O(p)$ for $\mu_t$.

Part 2: Rolling-window covariance $\Sigma_t(W)$

Maintain a circular buffer of the last $W$ observations. When the window is full and a new $x_t$ arrives, you must both add $x_t$ and remove $x_{t-W}$.

Use Chan's parallel/merge formula. If you have the statistics $(t_A, \mu_A, C_A)$ for a batch $A$ and $(t_B, \mu_B, C_B)$ for batch $B$, the combined statistics are:

$t_{AB} = t_A + t_B$ $\delta = \mu_B - \mu_A$ $\mu_{AB} = \mu_A + \frac{t_B}{t_{AB}} \delta$ $C_{AB} = C_A + C_B + \frac{t_A t_B}{t_{AB}} \delta \, \delta^T$

For the rolling window, a practical approach:

  • Maintain $(W, \mu_t^{(W)}, C_t^{(W)})$ for the current window.
  • When $x_t$ arrives and $x_{t-W}$ leaves, first "subtract" $x_{t-W}$ (reverse Welford), then "add" $x_t$ (forward Welford).

Reverse update to remove $x_{t-W}$ from a window of size $n$:

$\delta = x_{t-W} - \mu_{\text{old}}$ $\mu_{\text{new}} = \mu_{\text{old}} + \frac{\delta}{n - 1} \cdot (-1) \cdot \frac{n-1}{n-1}$

More precisely: $\mu_{\text{new}} = \frac{n \cdot \mu_{\text{old}} - x_{t-W}}{n - 1}$ $\delta' = x_{t-W} - \mu_{\text{new}}$ $C_{\text{new}} = C_{\text{old}} - \delta \, \delta'^T$

Then add $x_t$ to the window of size $n-1$ using the standard forward Welford update. This keeps the operation $O(p^2)$ per step. The circular buffer costs $O(Wp)$ memory.

Caveat: Repeated add/remove operations can accumulate floating-point drift over long runs. A common practice is to periodically recompute from the buffer (e.g., every $W$ steps) to reset the error.

Part 3: Exponentially weighted (EWMA) version

With decay factor $\lambda \in (0, 1)$, we want recent observations to have weight $\lambda^0 = 1$ (most recent), $\lambda^1$, $\lambda^2$, etc.

Recursive update for the EWMA mean:

$\mu_t = \lambda \, \mu_{t-1} + (1 - \lambda) \, x_t$

Recursive update for the EWMA covariance:

$\Sigma_t = \lambda \, \Sigma_{t-1} + (1 - \lambda)(x_t - \mu_t)(x_t - \mu_t)^T$

Or equivalently, using the previous mean for the deviation (which is sometimes preferred for symmetry):

$\Sigma_t = \lambda \, \Sigma_{t-1} + (1 - \lambda)(x_t - \mu_{t-1})(x_t - \mu_t)^T$

This second form is the EWMA analogue of Welford's trick: the two deviation vectors use the old and new means, respectively, which provides better numerical stability.

  • Cost: $O(p^2)$ per step, $O(p^2)$ storage. No buffer needed since old observations decay automatically.
  • Effective window: The EWMA has an effective window of approximately
    /(1-\lambda)$ observations. Setting $\lambda = 1 - 1/W$ gives roughly the same emphasis as a flat window of size $W$.
  • Initialization: A common approach is to use the first $W_0$ observations to compute the initial $\mu_0$ and $\Sigma_0$, then switch to the EWMA recursion.

Practical Considerations:

  • Correlation from covariance: Given $\Sigma_t$, the correlation matrix is $R_t = D_t^{-1} \Sigma_t D_t^{-1}$ where $D_t = \text{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{pp}})$. This is $O(p^2)$.
  • Numerical stability: The EWMA form is inherently stable because old errors decay. The rolling-window form can drift and benefits from periodic recomputation.
  • Choice of $\lambda$: In practice, RiskMetrics uses $\lambda = 0.94$ for daily data ($\approx 17$-day half-life) and $\lambda = 0.97$ for monthly data.

Answer: The Welford update for online covariance is: compute $\delta_t = x_t - \mu_{t-1}$, update $\mu_t = \mu_{t-1} + \delta_t/t$, compute $\delta_t' = x_t - \mu_t$, update $C_t = C_{t-1} + \delta_t \delta_t'^T$. Rolling-window uses a circular buffer with reverse-Welford removal. EWMA uses $\mu_t = \lambda\mu_{t-1} + (1-\lambda)x_t$ and $\Sigma_t = \lambda\Sigma_{t-1} + (1-\lambda)(x_t - \mu_{t-1})(x_t - \mu_t)^T$. All updates are $O(p^2)$ per step.

Intuition

The fundamental insight behind Welford's algorithm is that computing variance as $E[X^2] - (E[X])^2$ is mathematically correct but numerically disastrous. When the mean is large (say, stock prices around $\$3000$), $E[X^2]$ and $(E[X])^2$ are both huge, and their difference (the variance) is tiny by comparison. Subtracting two large nearly-equal numbers destroys significant digits. Welford sidesteps this by accumulating $\sum(x_i - \bar{x})^2$ directly, where the deviations are small.

In practice, every real-time trading system uses some variant of these algorithms. The EWMA version is particularly popular because it adapts to regime changes (recent data gets more weight) and requires no buffer. The RiskMetrics model, which underpins billions of dollars of risk management, is essentially just the EWMA covariance update applied to asset returns. Understanding these recursions is not just an interview topic -- it is core infrastructure for any quantitative trading operation.

Open the full interactive solver →