Yule-Walker Equations, PACF, and AR Order Selection

Time Series · Hard · Free problem

Suppose $r_t$ is a covariance-stationary process that follows an $\text{AR}(p)$ model with unknown order $p$:

$r_t = \phi_1 r_{t-1} + \phi_2 r_{t-2} + \cdots + \phi_p r_{t-p} + \varepsilon_t$

where $\varepsilon_t$ is white noise with variance $\sigma^2$.

Derive the Yule-Walker equations that relate the autocovariances $\gamma(h)$ to the AR coefficients $\phi_1, \ldots, \phi_p$. Then derive the Durbin-Levinson recursion, which lets you compute the partial autocorrelation function (PACF) iteratively without solving the full system at each order.

Using the PACF, propose a consistent rule for selecting the true order $p$. Discuss how information criteria (AIC, BIC) provide an alternative or complement to PACF cutoff. What are the small-sample pitfalls of each approach?

Hints

Multiply both sides of the AR equation by $r_{t-h}$ and take expectations -- what happens to the noise term when $h \geq 1$?
The Durbin-Levinson recursion builds the order-$m$ fit from the order-$(m-1)$ fit. The key quantity is the residual autocovariance at lag $m$ not explained by the shorter model, normalized by the prediction error variance.
For a true $\text{AR}(p)$, the PACF $\phi_{m,m} = 0$ for $m > p$. Under this null, $\hat{\phi}_{m,m}$ is approximately $N(0, 1/T)$, giving you a $\pm 1.96/\sqrt{T}$ significance band.

Worked Solution

How to Think About It: The Yule-Walker equations are just what you get when you multiply both sides of the AR equation by $r_{t-h}$ and take expectations -- you are exploiting the fact that the noise $\varepsilon_t$ is uncorrelated with past values. The beauty of the Durbin-Levinson recursion is that it lets you "grow" the AR fit from order 1 to order 2 to order $m$ without re-solving the entire system each time. The last coefficient you add at each step is exactly the partial autocorrelation at that lag -- it measures how much new predictive information that lag contributes after accounting for all shorter lags. For order selection, the key signature of an $\text{AR}(p)$ process is that the PACF cuts off sharply after lag $p$.

Part 1: Deriving the Yule-Walker Equations

Start from the model:

$r_t = \sum_{j=1}^{p} \phi_j r_{t-j} + \varepsilon_t$

Multiply both sides by $r_{t-h}$ (for $h \geq 1$) and take expectations. Since $r_t$ is covariance-stationary, $E[r_t r_{t-h}] = \gamma(h)$, and since $\varepsilon_t$ is uncorrelated with $r_{t-h}$ for $h \geq 1$:

$\gamma(h) = \sum_{j=1}^{p} \phi_j \gamma(h - j), \quad h = 1, 2, \ldots, p$

In matrix form, writing $\boldsymbol{\gamma} = (\gamma(1), \ldots, \gamma(p))^\top$ and $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)^\top$:

$\boldsymbol{\Gamma}_p \boldsymbol{\phi} = \boldsymbol{\gamma}$

where $\boldsymbol{\Gamma}_p$ is the $p \times p$ Toeplitz matrix with $(i,j)$-entry $\gamma(|i - j|)$:

$\boldsymbol{\Gamma}_p = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{pmatrix}$

The solution is $\boldsymbol{\phi} = \boldsymbol{\Gamma}_p^{-1} \boldsymbol{\gamma}$. The noise variance follows from the $h = 0$ equation:

$\sigma^2 = \gamma(0) - \sum_{j=1}^{p} \phi_j \gamma(j)$

Part 1 (continued): The Durbin-Levinson Recursion

The Durbin-Levinson algorithm avoids inverting $\boldsymbol{\Gamma}_m$ from scratch at each order $m$. Let $\phi_{m,j}$ denote the $j$-th AR coefficient in the best linear predictor of order $m$, and let $v_m$ be the corresponding prediction error variance.

Initialize: $v_0 = \gamma(0)$.

For $m = 1, 2, 3, \ldots$:

Compute the $m$-th partial autocorrelation:

$\phi_{m,m} = \frac{\gamma(m) - \sum_{j=1}^{m-1} \phi_{m-1,j}\, \gamma(m - j)}{v_{m-1}}$

Update the intermediate coefficients for $j = 1, \ldots, m-1$:

$\phi_{m,j} = \phi_{m-1,j} - \phi_{m,m}\, \phi_{m-1,m-j}$

Update the prediction error variance:

$v_m = v_{m-1}(1 - \phi_{m,m}^2)$

The quantity $\phi_{m,m}$ is the partial autocorrelation at lag $m$ -- it is the correlation between $r_t$ and $r_{t-m}$ after removing the linear effect of $r_{t-1}, \ldots, r_{t-m+1}$. For a true $\text{AR}(p)$ process, $\phi_{m,m} = 0$ for all $m > p$.

Why it works: At each step, the numerator of $\phi_{m,m}$ computes the residual autocovariance at lag $m$ that is not explained by the order-$(m-1)$ fit. Dividing by $v_{m-1}$ normalizes it into a correlation. The update formula for $\phi_{m,j}$ follows from the block matrix identity relating $\boldsymbol{\Gamma}_m^{-1}$ to $\boldsymbol{\Gamma}_{m-1}^{-1}$ -- it is a Schur complement argument on the Toeplitz structure.

Part 2: Order Selection

PACF cutoff rule: For a true $\text{AR}(p)$ process, the theoretical PACF is exactly zero for lags

Loading problems...

Yule-Walker Equations, PACF, and AR Order Selection

Hints

Worked Solution

Intuition