Cointegration: Engle-Granger Procedure and Error Correction

Two log-price series $p_{1,t}$ and $p_{2,t}$ are suspected to be cointegrated.

  1. Describe the Engle-Granger two-step procedure for testing cointegration. Which test do you apply to the residuals, and which critical values should the test statistic be compared against?
  2. Suppose the estimated cointegrating residual $z_t$ satisfies the AR(1) process $z_t = \rho\, z_{t-1} + \epsilon_t$ with $|\rho| < 1$. Derive the half-life of deviations from equilibrium.
  3. Write the corresponding error-correction model (ECM) for $\Delta p_{1,t}$, and explain the role of each term.

Hints

  1. The Engle-Granger procedure has two steps: first estimate the long-run relationship by OLS, then test the residuals for stationarity. The trap is using the wrong critical values for the ADF test.
  2. For the half-life, think about the expected path of the AR(1) process: $E[z_t] = \rho^t z_0$. Set this equal to $z_0/2$ and solve for $t$.
  3. The error-correction model includes a term $\alpha z_{t-1}$ that pulls $\Delta p_{1,t}$ back toward the long-run equilibrium. The sign of $\alpha$ must be negative for error correction to work.

Worked Solution

How to Think About It: Cointegration is about two prices that wander individually (each is nonstationary, like a random walk) but stay tethered to each other -- like two drunks connected by a rope. The spread $z_t = p_{1,t} - \beta p_{2,t}$ is mean-reverting even though neither price is. The Engle-Granger procedure is the simplest way to test for this: estimate the long-run relationship, check whether the residual is stationary, then build a model that captures both the short-run dynamics and the pull back toward equilibrium. If you're a pairs trader, this is the foundation of your strategy.

Key Insight: The half-life of the cointegrating residual tells you how long the spread takes to close halfway -- this directly determines your holding period and whether the strategy is tradable.

The Method:

Part (i): Engle-Granger Two-Step Procedure

*Step 1 -- Estimate the cointegrating relationship.* Run OLS:

$p_{1,t} = \alpha + \beta\, p_{2,t} + z_t$

Save the residuals $\hat{z}_t = p_{1,t} - \hat{\alpha} - \hat{\beta}\, p_{2,t}$.

*Step 2 -- Test for stationarity of $\hat{z}_t$.* Apply the Augmented Dickey-Fuller (ADF) test to $\hat{z}_t$:

$\Delta \hat{z}_t = \phi\, \hat{z}_{t-1} + \sum_{j=1}^{k} \gamma_j \Delta \hat{z}_{t-j} + \eta_t$

Test $H_0: \phi = 0$ (unit root, no cointegration) vs. $H_1: \phi < 0$ (stationary residual, cointegration exists).

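Critical point: Standard Dickey-Fuller critical values do not apply here. Because $\hat{z}_t$ is a residual from a first-stage OLS regression rather than a directly observed series, and OLS picks $\hat{\beta}$ to minimize the residual variance, the residuals look more stationary than the true spread and the null distribution of the test statistic is shifted to the left. Use the Engle-Granger / MacKinnon critical values, which are more negative (harder to reject) and depend on the number of series and on the deterministic terms in the first-stage regression. For two series with a constant, the asymptotic 5% value is about $-3.34$, compared with about $-2.86$ for a standard ADF test with a constant. Using standard ADF tables would over-reject and falsely "find" cointegration.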
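The two steps above can be sketched in plain Python. This is a minimal illustration, not a production test: the data are simulated (the `simulate_pair` helper and its parameters are hypothetical), the Dickey-Fuller regression uses $k = 0$ lagged differences for brevity, and the threshold `EG_CRIT_5PCT` is the approximate asymptotic 5% Engle-Granger value for two series with a constant.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def df_tstat(z):
    """t-statistic on phi in  dz_t = phi * z_{t-1} + eta_t  (k = 0 lags)."""
    lag = z[:-1]
    dz = [cur - prev for prev, cur in zip(z[:-1], z[1:])]
    sxx = sum(v * v for v in lag)
    phi = sum(l * d for l, d in zip(lag, dz)) / sxx
    resid = [d - phi * l for l, d in zip(lag, dz)]
    s2 = sum(e * e for e in resid) / (len(dz) - 1)  # one estimated parameter
    return phi / math.sqrt(s2 / sxx)


def simulate_pair(n=1000, beta=1.5, rho=0.9, seed=0):
    """Hypothetical data: p2 is a random walk, p1 = 0.2 + beta*p2 + AR(1) spread."""
    rng = random.Random(seed)
    p2, z = [0.0], [0.0]
    for _ in range(n - 1):
        p2.append(p2[-1] + rng.gauss(0, 0.01))
        z.append(rho * z[-1] + rng.gauss(0, 0.01))
    p1 = [0.2 + beta * b + s for b, s in zip(p2, z)]
    return p1, p2


p1, p2 = simulate_pair()

# Step 1: long-run regression of p1 on p2, keep the residuals.
a_hat, b_hat = ols_simple(p2, p1)
z_hat = [y - a_hat - b_hat * x for x, y in zip(p2, p1)]

# Step 2: unit-root test on the residuals against Engle-Granger critical values.
t_stat = df_tstat(z_hat)
EG_CRIT_5PCT = -3.34  # approximate; NOT the standard ADF value of about -2.86
cointegrated = t_stat < EG_CRIT_5PCT
print(f"beta_hat = {b_hat:.3f}, t = {t_stat:.2f}, reject unit root: {cointegrated}")
```

For real data, a library routine that selects the lag length and looks up finite-sample critical values is the safer choice; `statsmodels.tsa.stattools.coint` is one such implementation of the Engle-Granger test.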

Part (ii): Half-Life of Deviations

Given $z_t = \rho\, z_{t-1} + \epsilon_t$ with $|\rho| < 1$, the expected path of the residual from an initial deviation $z_0$ is:

$E[z_t \mid z_0] = \rho^t z_0$

The half-life $\tau$ is the time it takes for the expected deviation to decay to half its initial value:

$\rho^{\tau} = \frac{1}{2}$

$\tau = \frac{\ln(1/2)}{\ln \rho} = \frac{-\ln 2}{\ln \rho}$

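For $0 < \rho < 1$, $\ln \rho < 0$, so $\tau > 0$. (If $-1 < \rho < 0$, the deviation flips sign every period and $\ln \rho$ is undefined; the magnitude then decays at rate $|\rho|$, giving $\tau = -\ln 2 / \ln |\rho|$.)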

Example: If $\rho = 0.95$ (daily data), then $\tau = \ln 2 / (-\ln 0.95) \approx 0.693 / 0.0513 \approx 13.5$ days. A pairs trade on this spread should expect to wait roughly 13 to 14 days for half of the deviation to close, not for the full reversion. If $\rho = 0.99$, the half-life jumps to about 69 days -- much slower, and you'd need to hold through more noise.
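The formula is easy to check numerically. The sketch below uses two hypothetical helper names: `half_life` applies $\tau = -\ln 2 / \ln \rho$ for $0 < \rho < 1$, and `estimate_rho` recovers $\rho$ from a residual series by a no-intercept least-squares regression of $z_t$ on $z_{t-1}$, which is one simple estimator among several.

```python
import math


def half_life(rho):
    """Half-life (in sampling periods) of an AR(1) deviation with 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life formula requires 0 < rho < 1")
    return -math.log(2.0) / math.log(rho)


def estimate_rho(z):
    """Least-squares slope of z_t on z_{t-1} with no intercept."""
    lag, cur = z[:-1], z[1:]
    return sum(a * b for a, b in zip(lag, cur)) / sum(a * a for a in lag)


for rho in (0.95, 0.99):
    print(f"rho = {rho}: half-life = {half_life(rho):.1f} periods")
# rho = 0.95: half-life = 13.5 periods
# rho = 0.99: half-life = 69.0 periods
```

The result is in units of the sampling interval, so the same $\rho$ estimated on hourly data implies a half-life in hours, not days.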

Part (iii): Error-Correction Model

The ECM for $\Delta p_{1,t}$ is:

$\Delta p_{1,t} = \mu + \alpha\, z_{t-1} + \sum_{j=1}^{k} \gamma_j \Delta p_{1,t-j} + \sum_{j=1}^{k} \delta_j \Delta p_{2,t-j} + \eta_t$

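where $z_{t-1} = p_{1,t-1} - \hat{\alpha} - \hat{\beta}\, p_{2,t-1}$ is the lagged cointegrating residual from Step 1. Note the notation: $\hat{\alpha}$ is the first-stage intercept, whereas the unhatted $\alpha$ in the ECM is the adjustment coefficient. The two are unrelated.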

Role of each term:

  • $\alpha\, z_{t-1}$: Error-correction term. This is the pull back toward equilibrium. If $z_{t-1} > 0$ (meaning $p_1$ is "too high" relative to $p_2$), then $\alpha < 0$ causes $p_1$ to decrease. The coefficient $\alpha$ measures the speed of adjustment. This is the key term -- without it, you just have a VAR in differences that ignores the long-run relationship.
  • $\gamma_j \Delta p_{1,t-j}$: Own lagged returns. Captures short-run momentum or mean-reversion in $p_1$.
  • $\delta_j \Delta p_{2,t-j}$: Cross lagged returns. Captures short-run lead-lag effects between the two series.
  • $\mu$: Drift. Allows for a non-zero average return.
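The ECM is linear in its parameters, so it can be estimated by OLS once the lagged residual is available. The following is a rough sketch with $k = 1$ on simulated data. The helpers `solve`, `ols`, `fit_ecm` and `simulate` are hypothetical names. For brevity the spread is built from the known simulation $\beta$ instead of a first-stage estimate; with real data you would pass in the Step 1 residuals.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least squares via the normal equations X'X b = X'y (X is a list of rows)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def fit_ecm(p1, p2, z):
    """dp1_t = mu + alpha*z_{t-1} + gamma*dp1_{t-1} + delta*dp2_{t-1} + eta_t."""
    dp1 = [b - a for a, b in zip(p1[:-1], p1[1:])]  # dp1[i] is the change into i+1
    dp2 = [b - a for a, b in zip(p2[:-1], p2[1:])]
    X, y = [], []
    for t in range(2, len(p1)):
        X.append([1.0, z[t - 1], dp1[t - 2], dp2[t - 2]])
        y.append(dp1[t - 1])
    mu, alpha, gamma, delta = ols(X, y)
    return {"mu": mu, "alpha": alpha, "gamma": gamma, "delta": delta}


def simulate(n=2000, alpha=-0.1, beta=1.0, seed=1):
    """Hypothetical data: p2 is a random walk, p1 error-corrects toward beta*p2."""
    rng = random.Random(seed)
    p1, p2 = [0.0], [0.0]
    for _ in range(n - 1):
        z_prev = p1[-1] - beta * p2[-1]
        p1.append(p1[-1] + alpha * z_prev + rng.gauss(0, 0.01))
        p2.append(p2[-1] + rng.gauss(0, 0.01))
    return p1, p2


p1, p2 = simulate()
z = [a - 1.0 * b for a, b in zip(p1, p2)]  # spread using the simulation's beta
params = fit_ecm(p1, p2, z)
print({name: round(value, 4) for name, value in params.items()})
```

In this simulation the estimated `alpha` should come out negative and close to the true $-0.1$, while `gamma` and `delta` should be near zero because no short-run dynamics were built into the data.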

Practical Considerations:

  • The Engle-Granger procedure assumes a single cointegrating relationship. For more than two series, use the Johansen procedure, which can find multiple cointegrating vectors.
  • The first-stage OLS is super-consistent ($\hat{\beta}$ converges at rate $T$ rather than $\sqrt{T}$), but the standard errors from the first-stage regression are not valid for inference. Use the ECM for inference on dynamics.
  • In practice, always check that the half-life is in a tradable range. A half-life of 200 days means you're holding through so much noise that transaction costs and margin requirements can easily eat your edge.
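The adjustment speed and the half-life can be linked directly in a special case. As a sketch, assume (this is not stated in the problem) that only $p_1$ error-corrects, that $p_2$ is a pure random walk with innovation $u_t$, and that the ECM has no drift or lagged-difference terms:

$\Delta p_{1,t} = \alpha\, z_{t-1} + \eta_t, \qquad \Delta p_{2,t} = u_t$

$\Delta z_t = \Delta p_{1,t} - \hat{\beta}\, \Delta p_{2,t} = \alpha\, z_{t-1} + (\eta_t - \hat{\beta}\, u_t)$

$z_t = (1 + \alpha)\, z_{t-1} + \epsilon_t, \qquad \epsilon_t = \eta_t - \hat{\beta}\, u_t$

So $\rho = 1 + \alpha$, and for $-1 < \alpha < 0$ the half-life is $\tau = -\ln 2 / \ln(1 + \alpha)$. For instance, $\alpha = -0.05$ gives $\rho = 0.95$ and $\tau \approx 13.5$ periods. When both prices adjust, or when the short-run terms matter, this mapping is only approximate.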

Answer:

(i) Engle-Granger: regress $p_1$ on $p_2$ via OLS, then ADF-test the residuals using Engle-Granger critical values (not standard ADF tables).

(ii) Half-life: $\tau = -\ln 2 / \ln \rho$.

(iii) ECM: $\Delta p_{1,t} = \mu + \alpha z_{t-1} + \text{lagged differences} + \eta_t$, where $\alpha < 0$ is the speed of mean-reversion and $z_{t-1}$ is the lagged spread.

Intuition

Cointegration is the mathematical formalization of 'mean-reverting spread,' which is the foundation of statistical arbitrage. Two prices can each be random walks (unpredictable individually) but still maintain a stable long-run relationship. The error-correction model captures the dual nature of such systems: short-run deviations are possible (the lagged difference terms), but the spread is always being pulled back toward equilibrium (the error-correction term).

The half-life is arguably the single most important number for a pairs trader. It tells you how patient you need to be. A short half-life (5-10 days) means fast mean-reversion and frequent trading opportunities with tight stop-losses. A long half-life (50+ days) means you're exposed to a lot of interim risk for a slow convergence. A common rule of thumb is to filter for pairs with half-lives in roughly the 5-30 day range -- fast enough to trade, slow enough that transaction costs don't eat the signal.
