Rauch-Tung-Striebel Smoother in a Local-Level Model

Stochastic Processes · Hard

Consider the simplest Gaussian state-space model, the local-level (random walk plus noise) model:

$x_t = x_{t-1} + w_t, \qquad y_t = x_t + v_t$

where $w_t \sim N(0, Q)$ and $v_t \sim N(0, R)$ are independent noise sequences, and you observe $y_1, y_2, \ldots, y_T$.
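To experiment with the recursions that follow, it helps to generate data from this model. Below is a minimal Python sketch. It assumes a known starting state `x0`, since the problem does not specify an initial condition. The name `simulate_local_level` and its parameters are illustrative, not taken from any library.

```python
import math
import random


def simulate_local_level(T, Q, R, x0=0.0, seed=None):
    """Draw states x_1..x_T and observations y_1..y_T from
    x_t = x_{t-1} + w_t,  y_t = x_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R) independent.
    The starting state x0 is treated as known (an assumption)."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = x + rng.gauss(0.0, math.sqrt(Q))  # random-walk step; gauss takes a std dev
        y = x + rng.gauss(0.0, math.sqrt(R))  # noisy observation of the state
        xs.append(x)
        ys.append(y)
    return xs, ys


states, obs = simulate_local_level(T=200, Q=1.0, R=4.0, seed=0)
```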

Write out the Kalman filter forward recursions: the prediction step (prior mean and variance at time $t$ given data through $t-1$) and the update step (posterior mean and variance at time $t$ given data through $t$). Identify the Kalman gain $K_t$.

Derive the Rauch-Tung-Striebel (RTS) backward smoothing pass. Starting from the final filtered estimate, show how to compute the smoothed mean $x_{t|T}$ and smoothed variance $P_{t|T}$ for each $t = T-1, T-2, \ldots, 1$. Identify the smoother gain $L_t$.

In steady state ($t \to \infty$), the filtered variance $P_{t|t}$ converges to a constant $P$. Find the steady-state Kalman gain $K$ and steady-state smoother gain $L$ in closed form as functions of $Q$ and $R$.

Hints

Start with the standard Bayesian update for Gaussians: combining a prior $N(\mu_1, \sigma_1^2)$ with a likelihood $N(\mu_2, \sigma_2^2)$ gives a posterior whose precision is the sum of precisions. The Kalman filter is just this applied sequentially.
For the RTS backward pass, use the joint distribution of $(x_t, x_{t+1})$ conditional on $y_{1:t}$. Since $x_{t+1} = x_t + w_{t+1}$, the covariance between $x_t$ and $x_{t+1}$ given $y_{1:t}$ is simply $P_{t|t}$. Apply the Gaussian conditioning formula.
For steady state, set $P_{t|t} = P_{t-1|t-1} = P$ and substitute into the Riccati recursion. You will get a quadratic in $\Pi = P + Q$ of the form $\Pi^2 - Q\Pi - QR = 0$.
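The first hint's rule for combining Gaussians fits in a few lines. In the minimal Python sketch below, the name `combine_gaussians` is illustrative. It also returns the posterior mean, which is the precision-weighted average of the two means.

```python
def combine_gaussians(mu1, var1, mu2, var2):
    """Combine N(mu1, var1) and N(mu2, var2) as prior and likelihood:
    precisions add, and the mean is the precision-weighted average."""
    prec = 1.0 / var1 + 1.0 / var2
    mean = (mu1 / var1 + mu2 / var2) / prec
    return mean, 1.0 / prec


# Prior N(0, 2) combined with an observation y = 1 of noise variance 1.
mean, var = combine_gaussians(0.0, 2.0, 1.0, 1.0)
```

Here both the mean and the variance come out as $2/3$. This matches a single Kalman update with $P_{t|t-1} = 2$, $R = 1$ and gain $K_t = 2/3$.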

Worked Solution

How to Think About It: The Kalman filter is sequential Bayesian updating for linear-Gaussian models. At each step you have a Gaussian prior (the prediction), you see a noisy observation, and you compute a Gaussian posterior (the update). The RTS smoother then makes a backward pass: once you have seen ALL the data, you go back and revise each estimate using future information. Think of filtering as "best guess using data so far" and smoothing as "best guess using all data." Because conditioning on more data never increases a Gaussian posterior variance, the smoothed variance is never larger than the filtered one. Future data can only help.

For the steady-state gains, the key is that the filtered variance satisfies a Riccati equation. In the scalar case it reduces to a quadratic you can solve explicitly.

Quick Estimate: Before diving in, note the signal-to-noise ratio $q = Q/R$ controls everything. When $q$ is small (state barely moves relative to observation noise), the Kalman gain $K$ should be small -- you trust the prediction more than the observation. The steady-state filtered variance should be between 0 and $R$ (you can never do worse than just using the observation). For $Q = 1, R = 4$ (so $q = 0.25$), we expect $K$ to be modest. Let us check after the derivation.

Formal Solution:

Part 1: Kalman Filter Forward Pass

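Let $x_{t|s} = E[x_t | y_1, \ldots, y_s]$ and $P_{t|s} = \text{Var}(x_t | y_1, \ldots, y_s)$. The recursion starts from a prior $x_0 \sim N(x_{0|0}, P_{0|0})$ on the initial state, which the problem leaves unspecified.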

*Prediction step:* Given the filtered estimate at $t-1$:

$x_{t|t-1} = x_{t-1|t-1}$

$P_{t|t-1} = P_{t-1|t-1} + Q$

This follows directly from the state equation $x_t = x_{t-1} + w_t$ -- the conditional mean is unchanged (the noise has mean zero) and the variance increases by $Q$.

*Update step:* When observation $y_t$ arrives, the innovation is $e_t = y_t - x_{t|t-1}$ with variance $S_t = P_{t|t-1} + R$. The Kalman gain is:

$K_t = \frac{P_{t|t-1}}{P_{t|t-1} + R}$

The filtered estimates are:

$x_{t|t} = x_{t|t-1} + K_t (y_t - x_{t|t-1})$

$P_{t|t} = (1 - K_t) P_{t|t-1} = \frac{P_{t|t-1} \, R}{P_{t|t-1} + R}$

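Equivalently, $1/P_{t|t} = 1/P_{t|t-1} + 1/R$, so the precisions add. Likewise, $x_{t|t} = (1 - K_t)\,x_{t|t-1} + K_t\,y_t$ is the precision-weighted average of the prediction and the observation. This is exactly the combination of two Gaussian estimates described in the first hint.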
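The forward pass translates directly into code. Below is a minimal Python sketch. The function name `kalman_filter_local_level` is illustrative, and the prior mean and variance `x0`, `P0` for the initial state are inputs you must supply.

```python
def kalman_filter_local_level(y, Q, R, x0, P0):
    """Forward pass for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    (x0, P0) is the mean and variance of the prior on x_0 (an assumed input).
    Returns lists of predicted means/variances, filtered means/variances,
    and Kalman gains, one entry per observation.
    """
    x_pred, P_pred, x_filt, P_filt, gains = [], [], [], [], []
    x, P = x0, P0
    for yt in y:
        # Prediction: the mean carries over, the variance grows by Q.
        xp, Pp = x, P + Q
        # Update: gain = prior variance / innovation variance (Pp + R).
        K = Pp / (Pp + R)
        x = xp + K * (yt - xp)
        P = (1.0 - K) * Pp
        x_pred.append(xp)
        P_pred.append(Pp)
        x_filt.append(x)
        P_filt.append(P)
        gains.append(K)
    return x_pred, P_pred, x_filt, P_filt, gains


x_pred, P_pred, x_filt, P_filt, gains = kalman_filter_local_level(
    [1.0, 0.0], Q=1.0, R=1.0, x0=0.0, P0=1.0
)
```

With these inputs the first step gives $K_1 = 2/3$ and $x_{1|1} = P_{1|1} = 2/3$. The second step gives $K_2 = 5/8$, $x_{2|2} = 1/4$ and $P_{2|2} = 5/8$, all of which can be checked by hand from the formulas above.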

Part 2: RTS Backward Smoothing Pass

The smoother starts at $t = T$ with $x_{T|T}$ and $P_{T|T}$ from the filter, then works backward. The key identity comes from the joint Gaussian distribution of $(x_t, x_{t+1})$ given $y_1, \ldots, y_t$:

$x_t | y_{1:t} \sim N(x_{t|t}, P_{t|t})$
$x_{t+1} | y_{1:t} \sim N(x_{t+1|t}, P_{t+1|t})$
$\text{Cov}(x_t, x_{t+1} | y_{1:t}) = P_{t|t}$ (since $x_{t+1} = x_t + w_{t+1}$ and $w_{t+1}$ is independent)

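Applying the Gaussian conditioning formula to this joint distribution gives

$E[x_t | x_{t+1}, y_{1:t}] = x_{t|t} + L_t (x_{t+1} - x_{t+1|t})$
$\text{Var}(x_t | x_{t+1}, y_{1:t}) = P_{t|t} - L_t^2 P_{t+1|t}$

with smoother gain

$L_t = \frac{P_{t|t}}{P_{t+1|t}}$

Because the state is Markov, $x_t$ is conditionally independent of the future observations $y_{t+1:T}$ given $x_{t+1}$. The same conditional mean and variance therefore hold given all the data $y_{1:T}$. Averaging over $x_{t+1} | y_{1:T} \sim N(x_{t+1|T}, P_{t+1|T})$ with the laws of total expectation and total variance gives the smoothed estimates: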

$x_{t|T} = x_{t|t} + L_t (x_{t+1|T} - x_{t+1|t})$

$P_{t|T} = P_{t|t} + L_t^2 (P_{t+1|T} - P_{t+1|t})$

The intuition is clean. The smoother corrects the filtered estimate in proportion to the revision $x_{t+1|T} - x_{t+1|t}$, which is how far the estimate of the next state moved once future data were incorporated. Since $P_{t+1|T} \leq P_{t+1|t}$ (conditioning on more data cannot increase a Gaussian posterior variance), the variance correction is non-positive and $P_{t|T} \leq P_{t|t}$. Future data can only help.
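The backward pass can be written as a minimal Python sketch. It assumes the forward filter has already produced the predicted and filtered moments; the name `rts_smoother` and the list layout are illustrative choices. The usage example feeds in the two-step filter output for $y = (1, 0)$, $Q = R = 1$ and an assumed prior $x_0 \sim N(0, 1)$, with those filter values computed by hand. The result $x_{1|2} = P_{1|2} = 1/2$ agrees with a direct calculation. Here $x_1$ has prior $N(0, 2)$, $y_1$ contributes precision $1$, and $y_2 = x_1 + w_2 + v_2$ contributes precision $1/2$, for a total precision of $2$.

```python
def rts_smoother(x_pred, P_pred, x_filt, P_filt):
    """Backward RTS pass for the local-level model.

    Inputs are 0-based lists over t = 1..T from a forward filter:
    x_pred[i], P_pred[i] hold x_{t|t-1}, P_{t|t-1} and
    x_filt[i], P_filt[i] hold x_{t|t}, P_{t|t} for t = i + 1.
    Returns smoothed means x_{t|T}, variances P_{t|T}, and gains L_t.
    """
    T = len(x_filt)
    x_sm = list(x_filt)      # at t = T, smoothed equals filtered
    P_sm = list(P_filt)
    gains = [None] * T       # no smoother gain is defined at t = T
    for i in range(T - 2, -1, -1):
        L = P_filt[i] / P_pred[i + 1]  # L_t = P_{t|t} / P_{t+1|t}
        x_sm[i] = x_filt[i] + L * (x_sm[i + 1] - x_pred[i + 1])
        P_sm[i] = P_filt[i] + L * L * (P_sm[i + 1] - P_pred[i + 1])
        gains[i] = L
    return x_sm, P_sm, gains


# Hand-computed forward-filter output for y = (1, 0), Q = R = 1,
# and an assumed prior x_0 ~ N(0, 1).
x_sm, P_sm, L_gains = rts_smoother(
    x_pred=[0.0, 2 / 3], P_pred=[2.0, 5 / 3],
    x_filt=[2 / 3, 1 / 4], P_filt=[2 / 3, 5 / 8],
)
```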

Part 3: Steady-State Gains

In steady state, $P_{t|t} \to P$ and $P_{t+1|t} \to P + Q$. The Kalman gain becomes constant:

$K = \frac{P + Q}{P + Q + R}$

and the filtered variance update $P = (1 - K)(P + Q)$ gives:

$P = \frac{(P + Q) R}{P + Q + R}$

Let $\Pi = P + Q$ (the steady-state predicted variance). Then $P = \Pi R / (\Pi + R)$ and $\Pi = P + Q = \Pi R/(\Pi + R) + Q$. Multiplying through by $(\Pi + R)$:

$\Pi(\Pi + R) = \Pi R + Q(\Pi + R)$

$\Pi^2 = Q \Pi + Q R$

$\Pi^2 - Q \Pi - Q R = 0$

By the quadratic formula, taking the positive root (the product of the two roots is $-QR < 0$ for $Q, R > 0$, so exactly one root is positive):

$\Pi = \frac{Q + \sqrt{Q^2 + 4QR}}{2}$

Then:

$K = \frac{\Pi}{\Pi + R} = \frac{Q + \sqrt{Q^2 + 4QR}}{Q + 2R + \sqrt{Q^2 + 4QR}}$

$P = \Pi - Q = \frac{-Q + \sqrt{Q^2 + 4QR}}{2}$

The steady-state smoother gain is:

$L = \frac{P}{P + Q} = \frac{P}{\Pi} = \frac{-Q + \sqrt{Q^2 + 4QR}}{Q + \sqrt{Q^2 + 4QR}}$
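The closed forms are easy to cross-check numerically. In the Python sketch below, the helper names are illustrative. `steady_state_gains` evaluates the formulas above, and `iterate_riccati` simply reapplies the filtered-variance update $P \leftarrow (P + Q)R/(P + Q + R)$ until it stops changing, so the two should agree.

```python
import math


def steady_state_gains(Q, R):
    """Closed-form steady state of the local-level filter, assuming Q, R > 0.

    Returns (Pi, P, K, L): predicted variance, filtered variance,
    Kalman gain, and RTS smoother gain.
    """
    s = math.sqrt(Q * Q + 4.0 * Q * R)
    Pi = (Q + s) / 2.0    # positive root of Pi^2 - Q*Pi - Q*R = 0
    P = Pi - Q            # steady-state filtered variance
    K = Pi / (Pi + R)
    L = P / Pi
    return Pi, P, K, L


def iterate_riccati(Q, R, P0=1.0, n_iter=500):
    """Apply P <- (P + Q) R / (P + Q + R) repeatedly as a numerical check."""
    P = P0
    for _ in range(n_iter):
        P = (P + Q) * R / (P + Q + R)
    return P


Pi, P, K, L = steady_state_gains(Q=1.0, R=4.0)
assert abs(P - iterate_riccati(Q=1.0, R=4.0)) < 1e-12
```

The iteration converges geometrically. It is slow when $q = Q/R$ is very small, so `n_iter` may need to be raised in that regime.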

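*Sanity check with $Q = 1, R = 4$:* We get $\Pi = (1 + \sqrt{1 + 16})/2 = (1 + \sqrt{17})/2 \approx 2.56$. Then $K = 2.56/6.56 \approx 0.39$, $P \approx 1.56$, and $L = 1.56/2.56 \approx 0.61$. A Kalman gain of 0.39 means the update weights the observation about 39% and the prediction about 61%, which makes sense for a moderate signal-to-noise ratio. As the quick estimate required, $P \approx 1.56$ lies well below $R = 4$. That $K + L = 1$ is no coincidence: substituting $P = \Pi R/(\Pi + R)$ gives $L = P/\Pi = R/(\Pi + R) = 1 - K$ in steady state. The smoother gain of 0.61 means the backward pass carries 61% of each revision of $x_{t+1}$ back to $x_t$.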
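The sanity-check numbers can also be confirmed without the closed form. Run the variance recursion of Part 1 for $Q = 1$, $R = 4$ and watch the time-varying gains settle. The variance recursion never touches the observations, so no data are needed. Below is a Python sketch; the name `gain_paths` is illustrative, and it assumes an arbitrary starting variance $P_{0|0} = 10$. For $Q, R > 0$, any non-negative start converges to the same steady state.

```python
def gain_paths(Q, R, P0, steps):
    """Time-varying Kalman gains K_t and smoother gains L_t,
    computed from the variance recursions alone (no data needed)."""
    K_path, L_path = [], []
    P_filt = P0
    for _ in range(steps):
        P_pred = P_filt + Q                    # P_{t|t-1}
        K = P_pred / (P_pred + R)              # K_t
        P_filt = (1.0 - K) * P_pred            # P_{t|t}
        K_path.append(K)
        L_path.append(P_filt / (P_filt + Q))   # L_t = P_{t|t} / P_{t+1|t}
    return K_path, L_path


K_path, L_path = gain_paths(Q=1.0, R=4.0, P0=10.0, steps=30)
for t in (1, 2, 5, 30):
    print(t, round(K_path[t - 1], 4), round(L_path[t - 1], 4))
```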

Answer: The Kalman filter alternates prediction ($x_{t|t-1} = x_{t-1|t-1}$, $P_{t|t-1} = P_{t-1|t-1} + Q$) and update ($K_t = P_{t|t-1}/(P_{t|t-1} + R)$, $x_{t|t} = x_{t|t-1} + K_t(y_t - x_{t|t-1})$, $P_{t|t} = (1-K_t)P_{t|t-1}$). The RTS smoother runs backward from $x_{T|T}, P_{T|T}$ with gain $L_t = P_{t|t}/P_{t+1|t}$, computing $x_{t|T} = x_{t|t} + L_t(x_{t+1|T} - x_{t+1|t})$ and $P_{t|T} = P_{t|t} + L_t^2(P_{t+1|T} - P_{t+1|t})$. In steady state, the predicted variance satisfies $\Pi^2 - Q\Pi - QR = 0$, giving $\Pi = (Q + \sqrt{Q^2 + 4QR})/2$, from which $K = \Pi/(\Pi + R)$ and $L = (\Pi - Q)/\Pi$.

Intuition

The Kalman filter and RTS smoother together illustrate one of the deepest ideas in estimation: the value of future information. The filter gives you the best estimate using only past and current data, but when you have a complete dataset, you can do better by also using future observations. The RTS smoother quantifies exactly how much better. The smoother gain $L_t = P_{t|t}/P_{t+1|t} = P_{t|t}/(P_{t|t} + Q)$ is the share of the predicted variance of $x_{t+1}$ that comes from uncertainty about $x_t$ itself rather than from the new shock $w_{t+1}$. When the state moves little between steps (small $Q$ relative to $P_{t|t}$), $L_t$ is close to 1 and future data has a big impact on past estimates.

In practice, this shows up constantly in quant work: signal extraction from noisy time series, estimating latent factors, and computing smoothed volatility paths. The steady-state result is particularly useful because it tells you the long-run information content of your observations as a function of just two numbers, the state noise $Q$ and observation noise $R$. A common mistake is to use only the filtered estimates when the full sample is available for after-the-fact analysis. The smoother is never worse, has strictly smaller variance at every $t < T$, and helps most when the signal-to-noise ratio is low. The filtered estimates remain the right input for real-time decisions and backtests, where smoothed values would leak future information.
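The claim that smoothing helps most at low signal-to-noise can be made quantitative. In the interior of a long sample, $P_{t|T}$ settles at the fixed point of $P_{t|T} = P + L^2(P_{t+1|T} - \Pi)$. Since $L = P/\Pi$ implies $L^2 \Pi = LP$, that fixed point is $P/(1+L)$. Smoothing therefore cuts the steady-state variance by the factor $1/(1+L)$. The cut approaches one half when $q = Q/R$ is small ($L \to 1$) and is negligible when $q$ is large ($L \to 0$). The Python sketch below checks this by running the backward variance recursion from $P_{T|T} = P$; the name `smoothing_variance_ratio` is illustrative.

```python
import math


def smoothing_variance_ratio(Q, R, n_back=1000):
    """Interior-of-sample ratio P_{t|T} / P_{t|t} in steady state (Q, R > 0).

    Runs P_s <- P + L^2 (P_s - Pi) backward from P_s = P, its value at t = T.
    Each step shrinks the error by L^2, so very small q needs more steps.
    Returns (ratio, L).
    """
    s = math.sqrt(Q * Q + 4.0 * Q * R)
    Pi = (Q + s) / 2.0   # steady-state predicted variance
    P = Pi - Q           # steady-state filtered variance
    L = P / Pi           # steady-state smoother gain
    P_s = P
    for _ in range(n_back):
        P_s = P + L * L * (P_s - Pi)
    return P_s / P, L


for q in (0.01, 0.25, 1.0, 4.0, 100.0):
    ratio, L = smoothing_variance_ratio(Q=q, R=1.0)
    print(f"q = {q:>6}: L = {L:.3f}, smoothed/filtered = {ratio:.3f}, 1/(1+L) = {1 / (1 + L):.3f}")
```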
