Two-State Volatility HMM and the Hamilton Filter

Time Series · Hard

Returns follow a two-state regime-switching model. In state $s_t \in \{1, 2\}$, the return is drawn from a zero-mean normal:

$r_t \mid s_t = i \;\sim\; N(0,\, \sigma_i^2)$

The hidden state $s_t$ is a Markov chain with $2 \times 2$ transition matrix $P$, where $P_{ij} = P(s_{t+1} = j \mid s_t = i)$.

You observe a sequence of returns $r_1, r_2, \ldots, r_T$ but never see the states directly.

  1. Derive the Hamilton filter: the recursive formula for the filtered state probabilities $\xi_{t|t}(i) = P(s_t = i \mid r_{1:t})$, starting from a prior $\xi_{1|0}$.
  2. Show how to compute the one-step predicted state probabilities $\xi_{t+1|t}(i) = P(s_{t+1} = i \mid r_{1:t})$ from the filtered probabilities.
  3. Write down the log-likelihood $\log L(\theta; r_{1:T})$ as a byproduct of the filter, where $\theta = (\sigma_1^2, \sigma_2^2, P)$.
  4. Derive the backward (smoothing) recursion for the smoothed probabilities $\xi_{t|T}(i) = P(s_t = i \mid r_{1:T})$.
  5. State the MLE first-order conditions for $(\sigma_1^2, \sigma_2^2, P)$ in terms of the smoothed probabilities.

Hints

  1. The entire filter is just Bayes' rule applied at each time step -- predict the state forward using the transition matrix, then update using the Gaussian likelihood of the observed return.
  2. The log-likelihood comes for free: the normalizing constant in the Bayes update at each step is $f(r_t \mid r_{1:t-1})$, and the log-likelihood is the sum of log predictive densities.
  3. For the MLE conditions, think of it as an EM algorithm: the smoothed probabilities $\xi_{t|T}(i)$ act as soft assignments of each observation to a regime, and the M-step formulas are just weighted averages using those soft assignments.

Worked Solution

How to Think About It: This is the workhorse model for regime-switching volatility, introduced by Hamilton (1989). The intuition is simple: the market alternates between a calm regime (low $\sigma$) and a turbulent regime (high $\sigma$), and you want to infer which regime you are in right now, given the returns you have seen. The math is just Bayes' rule applied recursively -- predict the state forward using the transition matrix, then update using the likelihood of the observed return. Everything else (log-likelihood, smoothing, MLE) falls out as bookkeeping.

The key mental model: the filter is a forward pass that answers "given what I have seen so far, what state am I probably in?" The smoother is a backward pass that revises those answers using future data. The log-likelihood is a byproduct of the forward pass.

Quick Estimate: Before diving into formulas, think about what the filter does in limiting cases. If $\sigma_1 \ll \sigma_2$ and you see a huge return, the filter should spike toward state 2 (high-vol regime). If you see many small returns in a row, the filter should drift toward state 1 (low-vol regime). The transition matrix controls how "sticky" each regime is -- high diagonal entries mean regimes persist.
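The limiting cases above can be checked numerically with a single Bayes update. The sketch below is illustrative only: the volatilities (1% and 3%), the flat prior, and the helper names `normal_pdf` and `bayes_update` are assumptions, not part of the problem statement.

```python
import math

def normal_pdf(r, sigma):
    """Density of N(0, sigma^2) evaluated at r (sigma is a standard deviation)."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_update(prior, r, sigmas):
    """One Bayes update of the regime probabilities given a single return r."""
    joint = [p * normal_pdf(r, s) for p, s in zip(prior, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

sigmas = [0.01, 0.03]   # hypothetical calm / turbulent volatilities
prior = [0.5, 0.5]      # no initial view on the regime

after_big_move = bayes_update(prior, 0.06, sigmas)     # a 6% return
after_small_move = bayes_update(prior, 0.002, sigmas)  # a 0.2% return
print("after 6% return:  ", after_big_move)
print("after 0.2% return:", after_small_move)
```

Under these assumed numbers, the large return puts almost all of the probability on state 2, while the small return tilts the belief toward state 1 without making it certain.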

Formal Derivation:

Part 1: The Hamilton filter (forward recursion)

Define the filtered probability vector $\xi_{t|t} = (\xi_{t|t}(1),\, \xi_{t|t}(2))^\top$ where $\xi_{t|t}(i) = P(s_t = i \mid r_{1:t})$.

The recursion has two steps:

*Prediction step:* Given the filtered probabilities at time $t$, predict the state at $t+1$:

$\xi_{t+1|t}(j) = \sum_{i=1}^{2} P_{ij} \, \xi_{t|t}(i)$

In vector form: $\xi_{t+1|t} = P^\top \xi_{t|t}$.

*Update step:* When return $r_{t+1}$ arrives, apply Bayes' rule. The likelihood of $r_{t+1}$ in state $j$ is:

$\eta_{t+1}(j) = \phi(r_{t+1};\, 0,\, \sigma_j^2) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\!\left(-\frac{r_{t+1}^2}{2\sigma_j^2}\right)$

The joint (unnormalized) probability of being in state $j$ and seeing $r_{t+1}$ is $\xi_{t+1|t}(j) \cdot \eta_{t+1}(j)$. Normalizing:

$\xi_{t+1|t+1}(j) = \frac{\xi_{t+1|t}(j)\, \eta_{t+1}(j)}{\sum_{k=1}^{2} \xi_{t+1|t}(k)\, \eta_{t+1}(k)}$

The denominator is the predictive density of $r_{t+1}$:

$f(r_{t+1} \mid r_{1:t}) = \sum_{k=1}^{2} \xi_{t+1|t}(k)\, \eta_{t+1}(k)$
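The predict-update recursion above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `hamilton_filter`, the sample returns, and the parameter values are all made up for the example, and no safeguards against numerical underflow are included.

```python
import math

def normal_pdf(r, sigma2):
    """Density of N(0, sigma2) evaluated at r (sigma2 is a variance)."""
    return math.exp(-0.5 * r * r / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def hamilton_filter(returns, sigma2, P, prior):
    """Run the forward recursion.

    returns : observed returns r_1..r_T
    sigma2  : regime variances [sigma_1^2, sigma_2^2]
    P       : P[i][j] = P(s_{t+1} = j | s_t = i), rows sum to 1
    prior   : xi_{1|0}
    Output  : (predicted, filtered); with 0-based index t,
              predicted[t] is xi_{t+1|t} and filtered[t] is xi_{t+1|t+1}
              in the 1-based notation of the text.
    """
    n = len(sigma2)
    predicted, filtered = [], []
    pred = list(prior)
    for r in returns:
        # Update: weight each predicted probability by the Gaussian likelihood.
        joint = [pred[j] * normal_pdf(r, sigma2[j]) for j in range(n)]
        total = sum(joint)  # predictive density of the current return
        filt = [x / total for x in joint]
        predicted.append(pred)
        filtered.append(filt)
        # Predict: next predicted vector is P^T applied to the filtered vector.
        pred = [sum(P[i][j] * filt[i] for i in range(n)) for j in range(n)]
    return predicted, filtered

returns = [0.001, -0.002, 0.05, -0.04, 0.003]  # made-up returns
sigma2 = [0.01 ** 2, 0.03 ** 2]                # assumed regime variances
P = [[0.95, 0.05],
     [0.10, 0.90]]                             # assumed transition matrix
predicted, filtered = hamilton_filter(returns, sigma2, P, prior=[0.5, 0.5])
for t, xi in enumerate(filtered, start=1):
    print(t, [round(x, 4) for x in xi])
```

With these assumed inputs, the 5% return at $t = 3$ moves nearly all of the filtered probability to state 2. The sticky transition matrix then keeps the predicted probability of state 2 high going into the next step.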

Part 2: One-step predicted probabilities

This was already shown in the prediction step above:

$\xi_{t+1|t} = P^\top \xi_{t|t}$

Each entry $\xi_{t+1|t}(j) = \sum_i P_{ij}\, \xi_{t|t}(i)$ is a weighted average of the transition probabilities into state $j$, weighted by today's filtered beliefs.
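As a small numerical check of the prediction step, the sketch below applies $P^\top$ to a filtered vector. The transition matrix, the filtered vector, and the helper name `predict` are assumptions chosen for illustration.

```python
def predict(filtered, P):
    """Return xi_{t+1|t}, where entry j is sum_i P[i][j] * xi_{t|t}(i)."""
    n = len(filtered)
    return [sum(P[i][j] * filtered[i] for i in range(n)) for j in range(n)]

P = [[0.95, 0.05],
     [0.10, 0.90]]       # assumed transition matrix
filtered = [0.8, 0.2]    # assumed filtered beliefs at time t

one_step = predict(filtered, P)
print(one_step)          # approximately [0.78, 0.22]

# Iterating the prediction step with no new data gives multi-step forecasts.
many_steps = filtered
for _ in range(500):
    many_steps = predict(many_steps, P)
print(many_steps)        # approaches the stationary distribution of P
```

Here $\xi_{t+1|t}(1) = 0.95 \cdot 0.8 + 0.10 \cdot 0.2 = 0.78$. Repeating the step without new observations pulls the forecast toward the stationary distribution of the chain, which for this assumed $P$ is $(2/3,\, 1/3)$.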

Part 3: Log-likelihood as a filter byproduct

The predictive density $f(r_t \mid r_{1:t-1})$ drops out of the normalizing constant at each step. The log-likelihood decomposes as:

$\log L(\theta;\, r_{1:T}) = \sum_{t=1}^{T} \log f(r_t \mid r_{1:t-1})$

where the $t = 1$ term conditions on no data and uses the prior $\xi_{1|0}$, and in general

$f(r_t \mid r_{1:t-1}) = \sum_{j=1}^{2} \xi_{t|t-1}(j)\, \eta_t(j)$

This is the prediction-error decomposition. You get the log-likelihood for free as a byproduct of running the filter forward -- no separate computation needed.
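The prediction-error decomposition can be sketched by accumulating the log of each normalizing constant during the forward pass. The function name `hamilton_loglik` and all numerical inputs below are assumptions for illustration. The sanity check at the end relies on the fact that when both regimes share the same variance, the model collapses to i.i.d. normal returns.

```python
import math

def normal_pdf(r, sigma2):
    """Density of N(0, sigma2) evaluated at r (sigma2 is a variance)."""
    return math.exp(-0.5 * r * r / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def hamilton_loglik(returns, sigma2, P, prior):
    """Log-likelihood as the sum of log predictive densities from the filter."""
    n = len(sigma2)
    pred = list(prior)
    loglik = 0.0
    for r in returns:
        joint = [pred[j] * normal_pdf(r, sigma2[j]) for j in range(n)]
        f = sum(joint)            # f(r_t | r_{1:t-1}), the Bayes normalizer
        loglik += math.log(f)
        filt = [x / f for x in joint]
        pred = [sum(P[i][j] * filt[i] for i in range(n)) for j in range(n)]
    return loglik

returns = [0.001, -0.002, 0.05, -0.04, 0.003]  # made-up returns
P = [[0.95, 0.05],
     [0.10, 0.90]]                             # assumed transition matrix

ll_two_regimes = hamilton_loglik(returns, [1e-4, 9e-4], P, [0.5, 0.5])

# Sanity check: identical variances make the regimes indistinguishable,
# so the result should match the plain i.i.d. normal log-likelihood.
ll_same_var = hamilton_loglik(returns, [4e-4, 4e-4], P, [0.3, 0.7])
ll_iid = sum(math.log(normal_pdf(r, 4e-4)) for r in returns)
print(ll_two_regimes, ll_same_var, ll_iid)
```

For long samples or extreme returns, the densities can underflow. Working with log-densities and a log-sum-exp normalization is a common remedy, but it is omitted here for brevity.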

Part 4: Backward (smoothing) recursion

The smoothed probabilities $\xi_{t|T}(i) = P(s_t = i \mid r_{1:T})$ use all the data, not just data up to time $t$. The Kim (1994) smoother runs backward, for $t = T-1, T-2, \ldots, 1$:

$\xi_{t|T}(i) = \xi_{t|t}(i) \sum_{j=1}^{2} \frac{P_{ij}\, \xi_{t+1|T}(j)}{\xi_{t+1|t}(j)}$

Initialize with $\xi_{T|T}$ from the final filter step. The ratio $\xi_{t+1|T}(j) / \xi_{t+1|t}(j)$ adjusts the filtered belief at time $t$ based on what the future data tell us about state $t+1$.

Intuition: if the future data strongly suggest state $j$ at $t+1$, and the transition probability from state $i$ to state $j$ is high, then the smoother revises upward the probability of state $i$ at time $t$.
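The backward recursion can be sketched as a second pass over the stored filter output. The names `forward_pass` and `kim_smoother` and the inputs below are assumptions for illustration. The sketch also assumes every predicted probability is strictly positive, so that the ratio in the recursion is well defined.

```python
import math

def normal_pdf(r, sigma2):
    """Density of N(0, sigma2) evaluated at r (sigma2 is a variance)."""
    return math.exp(-0.5 * r * r / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def forward_pass(returns, sigma2, P, prior):
    """Hamilton filter; returns lists of predicted and filtered vectors."""
    n = len(sigma2)
    predicted, filtered, pred = [], [], list(prior)
    for r in returns:
        joint = [pred[j] * normal_pdf(r, sigma2[j]) for j in range(n)]
        total = sum(joint)
        filt = [x / total for x in joint]
        predicted.append(pred)
        filtered.append(filt)
        pred = [sum(P[i][j] * filt[i] for i in range(n)) for j in range(n)]
    return predicted, filtered

def kim_smoother(predicted, filtered, P):
    """Backward recursion; smoothed[t] holds the smoothed vector (0-based t)."""
    T, n = len(filtered), len(filtered[0])
    smoothed = [None] * T
    smoothed[T - 1] = list(filtered[T - 1])   # initialize with xi_{T|T}
    for t in range(T - 2, -1, -1):
        # Ratio of smoothed to predicted beliefs about the next state.
        ratio = [smoothed[t + 1][j] / predicted[t + 1][j] for j in range(n)]
        smoothed[t] = [
            filtered[t][i] * sum(P[i][j] * ratio[j] for j in range(n))
            for i in range(n)
        ]
    return smoothed

returns = [0.001, -0.002, 0.05, -0.04, 0.003]  # made-up returns
sigma2 = [1e-4, 9e-4]                          # assumed regime variances
P = [[0.95, 0.05],
     [0.10, 0.90]]                             # assumed transition matrix
prior = [0.5, 0.5]
predicted, filtered = forward_pass(returns, sigma2, P, prior)
smoothed = kim_smoother(predicted, filtered, P)
for t in range(len(returns)):
    print(t + 1, round(filtered[t][1], 4), round(smoothed[t][1], 4))
```

Printing the filtered and smoothed probabilities of state 2 side by side shows how the smoother revises beliefs at the early dates once the later large returns are known.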

Part 5: MLE conditions

Define the smoothed joint probability:

$\xi_{t,t+1|T}(i,j) = P(s_t = i,\, s_{t+1} = j \mid r_{1:T}) = \xi_{t|t}(i) \cdot \frac{P_{ij}\, \xi_{t+1|T}(j)}{\xi_{t+1|t}(j)}$

*Variance estimates:* The MLE for each regime variance is the weighted average of squared returns, weighted by the smoothed probability of being in that regime:

$\hat{\sigma}_i^2 = \frac{\sum_{t=1}^{T} \xi_{t|T}(i)\, r_t^2}{\sum_{t=1}^{T} \xi_{t|T}(i)}$

This makes sense: each squared return contributes to the variance estimate of state $i$ in proportion to how likely it is that the system was in state $i$ at that time.

*Transition probabilities:* The MLE for $P_{ij}$ is the expected number of $i \to j$ transitions divided by the expected number of times in state $i$:

$\hat{P}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{t,t+1|T}(i,j)}{\sum_{t=1}^{T-1} \xi_{t|T}(i)}$

Note that $\sum_j \hat{P}_{ij} = 1$ automatically, because $\sum_j \xi_{t,t+1|T}(i,j) = \xi_{t|T}(i)$ by the smoothing recursion, so the rows of $\hat{P}$ are proper probability vectors.

*EM algorithm:* In practice, these conditions define an EM algorithm. The E-step runs the filter and smoother to compute $\xi_{t|T}$ and $\xi_{t,t+1|T}$. The M-step updates $(\sigma_1^2, \sigma_2^2, P)$ using the formulas above. Iterate until convergence.
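The E-step and M-step described above can be sketched as follows. Everything here is an assumption made for illustration: the function names, the simulated data (a hand-picked regime path with volatilities of 1% and 3%), the starting values, and the fixed iteration count. The prior $\xi_{1|0}$ is held fixed rather than estimated, and there is no convergence test or underflow protection.

```python
import math
import random

def e_step(returns, sigma2, P, prior):
    """Filter and smoother; returns (loglik, smoothed, pair) where
    pair[t][i][j] is the smoothed joint probability of (s_t=i, s_{t+1}=j)."""
    T, n = len(returns), len(sigma2)
    pred, predicted, filtered, loglik = list(prior), [], [], 0.0
    for r in returns:
        joint = [pred[j] * math.exp(-0.5 * r * r / sigma2[j])
                 / math.sqrt(2.0 * math.pi * sigma2[j]) for j in range(n)]
        f = sum(joint)
        loglik += math.log(f)
        filt = [x / f for x in joint]
        predicted.append(pred)
        filtered.append(filt)
        pred = [sum(P[i][j] * filt[i] for i in range(n)) for j in range(n)]
    smoothed, pair = [None] * T, []
    smoothed[T - 1] = filtered[T - 1]
    for t in range(T - 2, -1, -1):
        xi2 = [[filtered[t][i] * P[i][j] * smoothed[t + 1][j] / predicted[t + 1][j]
                for j in range(n)] for i in range(n)]
        smoothed[t] = [sum(row) for row in xi2]  # marginalize over s_{t+1}
        pair.append(xi2)
    pair.reverse()                               # restore time order
    return loglik, smoothed, pair

def m_step(returns, smoothed, pair):
    """Weighted-average updates for the variances and transition matrix."""
    T, n = len(returns), len(smoothed[0])
    sigma2 = [sum(smoothed[t][i] * returns[t] ** 2 for t in range(T))
              / sum(smoothed[t][i] for t in range(T)) for i in range(n)]
    P = [[sum(x[i][j] for x in pair)
          / sum(smoothed[t][i] for t in range(T - 1)) for j in range(n)]
         for i in range(n)]
    return sigma2, P

def fit_em(returns, sigma2, P, prior, n_iter=30):
    """Alternate E and M steps; history records the log-likelihood path."""
    history = []
    for _ in range(n_iter):
        loglik, smoothed, pair = e_step(returns, sigma2, P, prior)
        history.append(loglik)
        sigma2, P = m_step(returns, smoothed, pair)
    return sigma2, P, history

# Simulated data: calm, then turbulent, then calm (hypothetical regime path).
random.seed(0)
true_sigma = [0.01, 0.03]
states = [0] * 150 + [1] * 100 + [0] * 150
returns = [random.gauss(0.0, true_sigma[s]) for s in states]

sigma2_hat, P_hat, history = fit_em(
    returns, sigma2=[5e-5, 2e-3], P=[[0.9, 0.1], [0.1, 0.9]], prior=[0.5, 0.5])
print(sigma2_hat, P_hat, history[0], history[-1])
```

A useful diagnostic is that the recorded log-likelihood should never decrease from one iteration to the next, which is a standard property of EM. Because the likelihood surface can have several local maxima, it is common to rerun from different starting values.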

Answer: The Hamilton filter is a two-step predict-update recursion: predict via $\xi_{t+1|t} = P^\top \xi_{t|t}$, update via Bayes' rule using the Gaussian likelihood, and read off the log-likelihood from the normalizing constants. Smoothed probabilities follow from a backward recursion that revises filtered beliefs using future data. The MLE conditions express $\hat{\sigma}_i^2$ as a smoothed-probability-weighted average of squared returns, and $\hat{P}_{ij}$ as the ratio of expected $i \to j$ transitions to expected time in state $i$. In practice, these are computed via the EM algorithm.

Intuition

The Hamilton filter is the canonical example of how Bayesian filtering works in a finite-state setting. The core idea is embarrassingly simple: you maintain a probability vector over the hidden states, predict forward using the Markov transition matrix, and then update using the likelihood of the new observation. Everything else -- the log-likelihood, the smoother, the EM algorithm -- is bookkeeping built on top of this predict-update cycle. The reason this model matters in practice is that financial volatility genuinely clusters into regimes (think 2007 vs. 2017), and the Hamilton filter gives you a principled, real-time estimate of which regime you are in.

The subtlety that trips people up is the difference between filtered and smoothed probabilities. The filter answers "what do I believe right now?" -- it is causal and can be used in real time. The smoother answers "what should I have believed, looking back with all the data?" -- it is acausal and is used for parameter estimation and historical analysis. In a trading context, you would use the filter for live regime detection and position sizing, but you would use the smoother (via EM) to calibrate the model parameters offline. Getting this distinction wrong -- for example, using smoothed probabilities for backtesting signal generation -- introduces look-ahead bias.
