Worked Solution
How to Think About It: This is the workhorse model for regime-switching volatility, introduced by Hamilton (1989). The intuition is simple: the market alternates between a calm regime (low $\sigma$) and a turbulent regime (high $\sigma$), and you want to infer which regime you are in right now, given the returns you have seen. The math is just Bayes' rule applied recursively -- predict the state forward using the transition matrix, then update using the likelihood of the observed return. Everything else (log-likelihood, smoothing, MLE) falls out as bookkeeping.
The key mental model: the filter is a forward pass that answers "given what I have seen so far, what state am I probably in?" The smoother is a backward pass that revises those answers using future data. The log-likelihood is a byproduct of the forward pass.
Quick Estimate: Before diving into formulas, think about what the filter does in limiting cases. If $\sigma_1 \ll \sigma_2$ and you see a huge return, the filter should spike toward state 2 (high-vol regime). If you see many small returns in a row, the filter should drift toward state 1 (low-vol regime). The transition matrix controls how "sticky" each regime is -- high diagonal entries mean regimes persist.
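The two limiting cases can be checked numerically with a single Bayes update. This is a minimal sketch: the volatilities `0.01` and `0.03`, the flat prior, and the helper names `gaussian_lik` and `bayes_update` are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def gaussian_lik(r, sigmas):
    """Zero-mean Gaussian density of return r under each regime's sigma."""
    sigmas = np.asarray(sigmas, dtype=float)
    return np.exp(-0.5 * (r / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))

def bayes_update(prior, r, sigmas):
    """Posterior regime probabilities after observing a single return r."""
    joint = np.asarray(prior, dtype=float) * gaussian_lik(r, sigmas)
    return joint / joint.sum()

sigmas = [0.01, 0.03]  # hypothetical calm / turbulent volatilities
prior = [0.5, 0.5]     # hypothetical flat prior over the two regimes

post_big = bayes_update(prior, 0.06, sigmas)     # a 6-sigma move for the calm regime
post_small = bayes_update(prior, 0.001, sigmas)  # a tiny return

print("after large return:", post_big)
print("after small return:", post_small)
```

A large return pushes almost all of the mass onto the high-vol state, while a small return shifts mass toward the low-vol state only moderately, because small returns are also quite plausible under the high-vol regime.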
Formal Derivation:
Part 1: The Hamilton filter (forward recursion)
Define the filtered probability vector $\xi_{t|t} = (\xi_{t|t}(1),\, \xi_{t|t}(2))^\top$ where $\xi_{t|t}(i) = P(s_t = i \mid r_{1:t})$.
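The recursion is started from an initial state distribution $\xi_{1|0}$ (a common choice is the stationary distribution of $P$) and then alternates two steps:
*Prediction step:* Given the filtered probabilities at time $t$, predict the state at $t+1$ using the transition probabilities $P_{ij} = P(s_{t+1} = j \mid s_t = i)$: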
$\xi_{t+1|t}(j) = \sum_{i=1}^{2} P_{ij} \, \xi_{t|t}(i)$
In vector form: $\xi_{t+1|t} = P^\top \xi_{t|t}$.
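As a quick sanity check on the transpose, the sum form and the vector form can be compared on made-up numbers. The transition matrix and filtered vector below are assumptions chosen for illustration, with `P[i, j]` read as the probability of moving from state $i$ to state $j$.

```python
import numpy as np

# Hypothetical transition matrix: rows are "from" states and sum to 1.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
xi_filt = np.array([0.8, 0.2])  # hypothetical filtered probabilities at time t

# Vector form of the prediction step.
xi_pred = P.T @ xi_filt

# Element-wise form: sum over "from" states i for each "to" state j.
xi_pred_loop = np.array([sum(P[i, j] * xi_filt[i] for i in range(2))
                         for j in range(2)])

print(xi_pred)
```

Using `P @ xi_filt` instead of `P.T @ xi_filt` is a common slip; with rows as "from" states, it is the transpose that maps today's beliefs to tomorrow's.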
*Update step:* When return $r_{t+1}$ arrives, apply Bayes' rule. The likelihood of $r_{t+1}$ in state $j$ is:
$\eta_{t+1}(j) = \phi(r_{t+1};\, 0,\, \sigma_j^2) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\!\left(-\frac{r_{t+1}^2}{2\sigma_j^2}\right)$
The joint (unnormalized) probability of being in state $j$ and seeing $r_{t+1}$ is $\xi_{t+1|t}(j) \cdot \eta_{t+1}(j)$. Normalizing:
$\xi_{t+1|t+1}(j) = \frac{\xi_{t+1|t}(j)\, \eta_{t+1}(j)}{\sum_{k=1}^{2} \xi_{t+1|t}(k)\, \eta_{t+1}(k)}$
The denominator is the predictive density of $r_{t+1}$:
$f(r_{t+1} \mid r_{1:t}) = \sum_{k=1}^{2} \xi_{t+1|t}(k)\, \eta_{t+1}(k)$
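The predict-update recursion above can be sketched in a few lines of NumPy. The function name `hamilton_filter`, the argument `xi0` (standing in for the initial distribution $\xi_{1|0}$), and all numerical values are assumptions made for this example.

```python
import numpy as np

def hamilton_filter(r, P, sigmas, xi0):
    """Forward recursion for a zero-mean Gaussian regime-switching model.

    Returns predicted probabilities xi_{t|t-1}, filtered probabilities
    xi_{t|t}, and the predictive densities f(r_t | r_{1:t-1}).
    """
    r = np.asarray(r, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    T, K = len(r), len(sigmas)
    xi_pred = np.empty((T, K))
    xi_filt = np.empty((T, K))
    pred_dens = np.empty(T)
    pred = np.asarray(xi0, dtype=float)
    for t in range(T):
        xi_pred[t] = pred
        # Likelihood of r_t in each state.
        eta = np.exp(-0.5 * (r[t] / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        joint = pred * eta              # unnormalized posterior
        pred_dens[t] = joint.sum()      # normalizing constant
        xi_filt[t] = joint / pred_dens[t]
        pred = P.T @ xi_filt[t]         # prediction for the next period
    return xi_pred, xi_filt, pred_dens

# Hypothetical inputs: five quiet returns followed by one large move.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
sigmas = np.array([0.01, 0.03])
xi0 = np.array([0.5, 0.5])
r = np.array([0.001, 0.001, 0.001, 0.001, 0.001, 0.08])

xi_pred, xi_filt, pred_dens = hamilton_filter(r, P, sigmas, xi0)
print(xi_filt.round(4))
```

On this toy series the filter drifts toward the low-vol state over the quiet stretch and then jumps to the high-vol state on the final observation. For long series or extreme returns, working with log densities would be more robust against underflow; the sketch keeps plain densities to mirror the formulas.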
Part 2: One-step predicted probabilities
This was already shown in the prediction step above:
$\xi_{t+1|t} = P^\top \xi_{t|t}$
Each entry $\xi_{t+1|t}(j) = \sum_i P_{ij}\, \xi_{t|t}(i)$ is a weighted average of the transition probabilities into state $j$, weighted by today's filtered beliefs.
Part 3: Log-likelihood as a filter byproduct
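The predictive density $f(r_t \mid r_{1:t-1})$ is exactly the normalizing constant of the update step at time $t$. For $t = 1$ the conditioning set is empty, so $f(r_1)$ is computed from the initial state distribution $\xi_{1|0}$. The log-likelihood decomposes as: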
$\log L(\theta;\, r_{1:T}) = \sum_{t=1}^{T} \log f(r_t \mid r_{1:t-1})$
where
$f(r_t \mid r_{1:t-1}) = \sum_{j=1}^{2} \xi_{t|t-1}(j)\, \eta_t(j)$
This is the prediction-error decomposition. You get the log-likelihood for free as a byproduct of running the filter forward -- no separate computation needed.
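A short sketch of the prediction-error decomposition: the log of each normalizing constant is accumulated while the filter runs. The function name `hamilton_loglik`, the initial distribution `xi0`, and the sample values are assumptions for illustration.

```python
import numpy as np

def hamilton_loglik(r, P, sigmas, xi0):
    """Log-likelihood accumulated from the filter's normalizing constants."""
    r = np.asarray(r, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    pred = np.asarray(xi0, dtype=float)   # xi_{1|0}
    loglik = 0.0
    for r_t in r:
        eta = np.exp(-0.5 * (r_t / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        joint = pred * eta
        dens = joint.sum()                # f(r_t | r_{1:t-1})
        loglik += np.log(dens)
        pred = P.T @ (joint / dens)       # filtered, then predicted one step ahead
    return loglik

# Hypothetical parameters and data.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
r = np.array([0.004, -0.012, 0.031, -0.045, 0.007])
loglik = hamilton_loglik(r, P, [0.01, 0.03], [0.5, 0.5])
print(loglik)
```

One useful check: when both regimes share the same $\sigma$, the regimes are indistinguishable and the result should collapse to the ordinary i.i.d. Gaussian log-likelihood, whatever $P$ and the initial distribution are.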
Part 4: Backward (smoothing) recursion
The smoothed probabilities $\xi_{t|T}(i) = P(s_t = i \mid r_{1:T})$ use all the data, not just data up to time $t$. The Kim (1994) smoother runs backward from $t = T-1$ to $t = 1$:
$\xi_{t|T}(i) = \xi_{t|t}(i) \sum_{j=1}^{2} \frac{P_{ij}\, \xi_{t+1|T}(j)}{\xi_{t+1|t}(j)}$
Initialize with $\xi_{T|T}$ from the final filter step. The ratio $\xi_{t+1|T}(j) / \xi_{t+1|t}(j)$ adjusts the filtered belief at time $t$ based on what the future data tell us about state $t+1$.
Intuition: if the future data strongly suggest state $j$ at $t+1$, and the transition probability from state $i$ to state $j$ is high, then the smoother revises upward the probability of state $i$ at time $t$.
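The backward pass can be sketched as follows, reusing the arrays a forward pass would produce. The helper names (`gaussian_lik`, `forward_pass`, `kim_smoother`), the initial distribution `xi0`, and the toy data are assumptions for this example.

```python
import numpy as np

def gaussian_lik(r_t, sigmas):
    """Zero-mean Gaussian density of r_t under each regime."""
    return np.exp(-0.5 * (r_t / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))

def forward_pass(r, P, sigmas, xi0):
    """Hamilton filter: returns predicted xi_{t|t-1} and filtered xi_{t|t}."""
    T, K = len(r), len(sigmas)
    xi_pred, xi_filt = np.empty((T, K)), np.empty((T, K))
    pred = xi0
    for t in range(T):
        xi_pred[t] = pred
        joint = pred * gaussian_lik(r[t], sigmas)
        xi_filt[t] = joint / joint.sum()
        pred = P.T @ xi_filt[t]
    return xi_pred, xi_filt

def kim_smoother(xi_pred, xi_filt, P):
    """Backward recursion for the smoothed probabilities xi_{t|T}."""
    T = xi_filt.shape[0]
    xi_smooth = np.empty_like(xi_filt)
    xi_smooth[-1] = xi_filt[-1]                  # initialize with xi_{T|T}
    for t in range(T - 2, -1, -1):
        # ratio[j] = xi_{t+1|T}(j) / xi_{t+1|t}(j)
        ratio = xi_smooth[t + 1] / xi_pred[t + 1]
        # (P @ ratio)[i] = sum_j P_ij * ratio[j]
        xi_smooth[t] = xi_filt[t] * (P @ ratio)
    return xi_smooth

# Hypothetical inputs.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
sigmas = np.array([0.01, 0.03])
xi0 = np.array([0.5, 0.5])
r = np.array([0.002, -0.004, 0.035, -0.05, 0.04, 0.003])

xi_pred, xi_filt = forward_pass(r, P, sigmas, xi0)
xi_smooth = kim_smoother(xi_pred, xi_filt, P)
print(np.column_stack([xi_filt[:, 1], xi_smooth[:, 1]]).round(4))
```

The printed columns compare filtered and smoothed high-vol probabilities; they coincide at the last date, and earlier dates are revised once the later returns are taken into account. Note that the ratio divides by $\xi_{t+1|t}(j)$, so the sketch assumes every predicted probability is strictly positive.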
Part 5: MLE conditions
Define the smoothed joint probability:
$\xi_{t,t+1|T}(i,j) = P(s_t = i,\, s_{t+1} = j \mid r_{1:T}) = \xi_{t|t}(i) \cdot \frac{P_{ij}\, \xi_{t+1|T}(j)}{\xi_{t+1|t}(j)}$
*Variance estimates:* The MLE for each regime variance is the weighted average of squared returns, weighted by the smoothed probability of being in that regime:
$\hat{\sigma}_i^2 = \frac{\sum_{t=1}^{T} \xi_{t|T}(i)\, r_t^2}{\sum_{t=1}^{T} \xi_{t|T}(i)}$
This makes sense: each squared return contributes to the variance estimate of state $i$ in proportion to how likely it is that the system was in state $i$ at that time.
*Transition probabilities:* The MLE for $P_{ij}$ is the expected number of $i \to j$ transitions divided by the expected number of times in state $i$:
$\hat{P}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{t,t+1|T}(i,j)}{\sum_{t=1}^{T-1} \xi_{t|T}(i)}$
Because $\sum_j \xi_{t,t+1|T}(i,j) = \xi_{t|T}(i)$, we get $\sum_j \hat{P}_{ij} = 1$ automatically, so the rows of $\hat{P}$ are proper probability vectors.
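To see the two update formulas in action, it helps to feed them degenerate "smoothed" probabilities that are exactly 0 or 1, in which case they should reduce to per-regime sample variances and transition frequencies. The function name `m_step`, the array layout (`xi_joint[t, i, j]`), and the toy state path are assumptions for this sketch.

```python
import numpy as np

def m_step(r, xi_smooth, xi_joint):
    """Variance and transition updates from smoothed probabilities.

    xi_smooth has shape (T, K); xi_joint has shape (T-1, K, K) with
    xi_joint[t, i, j] the smoothed probability of s_t = i, s_{t+1} = j.
    """
    r = np.asarray(r, dtype=float)
    r2 = r ** 2
    sigma2 = (xi_smooth * r2[:, None]).sum(axis=0) / xi_smooth.sum(axis=0)
    # Expected i -> j transitions over expected time in state i (t = 1..T-1).
    P_hat = xi_joint.sum(axis=0) / xi_smooth[:-1].sum(axis=0)[:, None]
    return sigma2, P_hat

# Hypothetical hard-assigned state path and returns.
states = np.array([0, 0, 1, 1, 0])
r = np.array([0.01, -0.01, 0.03, -0.05, 0.02])

xi_smooth = np.eye(2)[states]              # one-hot "smoothed" probabilities
xi_joint = np.zeros((len(r) - 1, 2, 2))
for t in range(len(r) - 1):
    xi_joint[t, states[t], states[t + 1]] = 1.0

sigma2, P_hat = m_step(r, xi_smooth, xi_joint)
print(sigma2)
print(P_hat)
```

With this path, regime 0 averages the squared returns at the first, second, and fifth dates, regime 1 averages the other two, and each state is left as often as it is kept, so every entry of `P_hat` equals 0.5.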
*EM algorithm:* In practice, these conditions define an EM algorithm. The E-step runs the filter and smoother to compute $\xi_{t|T}$ and $\xi_{t,t+1|T}$. The M-step updates $(\sigma_1^2, \sigma_2^2, P)$ using the formulas above. Iterate until convergence.
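The full E-step / M-step loop might look like the sketch below. It holds the initial distribution `xi0` fixed rather than estimating it, uses simulated data with made-up true parameters, and stops on a simple log-likelihood tolerance; all of these, along with the names `e_step` and `em_fit`, are assumptions of the example rather than part of the derivation.

```python
import numpy as np

def e_step(r, P, sigmas, xi0):
    """Filter plus Kim smoother: smoothed marginals, joints, log-likelihood."""
    T, K = len(r), len(sigmas)
    xi_pred, xi_filt = np.empty((T, K)), np.empty((T, K))
    loglik, pred = 0.0, xi0
    for t in range(T):
        xi_pred[t] = pred
        eta = np.exp(-0.5 * (r[t] / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        joint = pred * eta
        dens = joint.sum()
        loglik += np.log(dens)
        xi_filt[t] = joint / dens
        pred = P.T @ xi_filt[t]
    xi_smooth = np.empty_like(xi_filt)
    xi_smooth[-1] = xi_filt[-1]
    xi_joint = np.empty((T - 1, K, K))
    for t in range(T - 2, -1, -1):
        ratio = xi_smooth[t + 1] / xi_pred[t + 1]
        # xi_joint[t, i, j] = xi_{t|t}(i) * P_ij * ratio[j]
        xi_joint[t] = xi_filt[t][:, None] * P * ratio[None, :]
        xi_smooth[t] = xi_joint[t].sum(axis=1)
    return xi_smooth, xi_joint, loglik

def em_fit(r, P, sigmas, xi0, n_iter=50, tol=1e-8):
    """Iterate E- and M-steps; returns estimates and the log-likelihood path."""
    r = np.asarray(r, dtype=float)
    P = np.asarray(P, dtype=float).copy()
    sigmas = np.asarray(sigmas, dtype=float).copy()
    xi0 = np.asarray(xi0, dtype=float)
    history = []
    for _ in range(n_iter):
        xi_smooth, xi_joint, loglik = e_step(r, P, sigmas, xi0)
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: weighted variance and expected-transition ratios.
        sigmas = np.sqrt((xi_smooth * r[:, None] ** 2).sum(axis=0)
                         / xi_smooth.sum(axis=0))
        P = xi_joint.sum(axis=0) / xi_smooth[:-1].sum(axis=0)[:, None]
    return P, sigmas, history

# Simulate a hypothetical two-regime series.
rng = np.random.default_rng(0)
true_P = np.array([[0.97, 0.03], [0.05, 0.95]])
true_sig = np.array([0.01, 0.03])
T = 500
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(2, p=true_P[s[t - 1]])
r = true_sig[s] * rng.standard_normal(T)

P_hat, sig_hat, history = em_fit(r, [[0.9, 0.1], [0.1, 0.9]], [0.005, 0.05], [0.5, 0.5])
print(P_hat.round(3), sig_hat.round(4), len(history))
```

With `xi0` held fixed, each iteration should not decrease the log-likelihood, which makes `history` a handy diagnostic. EM finds a local maximum only, so trying several starting values is a reasonable precaution.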
Answer: The Hamilton filter is a two-step predict-update recursion: predict via $\xi_{t+1|t} = P^\top \xi_{t|t}$, update via Bayes' rule using the Gaussian likelihood, and read off the log-likelihood from the normalizing constants. Smoothed probabilities follow from a backward recursion that revises filtered beliefs using future data. The MLE conditions express $\hat{\sigma}_i^2$ as a smoothed-probability-weighted average of squared returns, and $\hat{P}_{ij}$ as the ratio of expected $i \to j$ transitions to expected time in state $i$. In practice, these are computed via the EM algorithm.