GLS and Cochrane-Orcutt for AR(1) Errors

Regression · Hard

You are fitting a linear regression $y_t = x_t' \beta + u_t$ where the errors follow an AR(1) process:

$u_t = \rho \, u_{t-1} + \varepsilon_t, \quad |\rho| < 1, \quad \varepsilon_t \sim N(0, \sigma^2)$

OLS ignores the serial correlation in $u_t$, so the usual OLS standard errors are invalid, and the estimator, while still unbiased with exogenous regressors, is inefficient.

Assuming $\rho$ is known, derive the GLS estimator $\hat{\beta}_{\text{GLS}}$. Write out the transformed regression explicitly.

In practice $\rho$ is unknown. Describe the Cochrane-Orcutt feasible GLS (FGLS) procedure for jointly estimating $\rho$ and $\beta$. State the asymptotic properties of the resulting estimator.

Suppose you suspect the AR(1) specification for the errors might be wrong. Outline how you would use the Newey-West HAC estimator as a robust alternative, and explain when you would prefer it over FGLS.

Hints

Think about what transformation would make the AR(1) errors i.i.d. -- what operation removes the serial correlation from $u_t = \rho u_{t-1} + \varepsilon_t$?
Quasi-differencing is the key: subtracting $\rho$ times the lagged observation gives $y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})' \beta + \varepsilon_t$. Don't forget the first observation needs special treatment.
For Cochrane-Orcutt, start with OLS residuals and estimate $\hat{\rho}$ by regressing $\hat{u}_t$ on $\hat{u}_{t-1}$, then iterate. For Newey-West, the idea is to leave the estimator as OLS but fix the covariance matrix using a kernel-weighted sum of autocovariances.

Worked Solution

How to Think About It: The core issue is that the usual OLS standard errors assume $\text{Cov}(u) = \sigma^2 I$, but AR(1) errors produce a specific correlation structure -- adjacent errors have correlation $\rho$, and errors $k$ periods apart have correlation $\rho^k$, so the correlation decays geometrically with distance. GLS exploits this structure: if you know $\rho$, you can "quasi-difference" the data to whiten the errors. This is the key move -- transform the regression so the errors become i.i.d., then run OLS on the transformed data. In practice you don't know $\rho$, so you iterate: estimate $\beta$ by OLS, use the residuals to estimate $\rho$, quasi-difference, re-estimate, repeat. That is Cochrane-Orcutt. If you are worried the errors are not actually AR(1) -- maybe there is heteroskedasticity or higher-order autocorrelation -- then Newey-West is the fallback: it does not try to model the error structure; it keeps the OLS estimate and corrects only its standard errors.

Formal Solution:

Part (i): GLS with known $\rho$

The $T \times T$ AR(1) error covariance matrix, with entries $\Sigma_{ts} = \frac{\sigma^2 \rho^{|t-s|}}{1 - \rho^2}$, is:

$\Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 & \cdots \\ \rho & 1 & \rho & \cdots \\ \rho^2 & \rho & 1 & \cdots \\ \vdots & & & \ddots \end{pmatrix}$
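A quick numerical look at this matrix helps explain why quasi-differencing works. The Python/NumPy sketch below builds $\Sigma$ from its entries $\sigma^2 \rho^{|t-s|} / (1 - \rho^2)$ and inverts it: the inverse comes out tridiagonal, so only adjacent observations interact in $\Sigma^{-1}$, and a transformation that touches only $y_t$ and $y_{t-1}$ is enough to whiten the errors. The values $T = 6$, $\rho = 0.7$, $\sigma^2 = 1$ are illustrative assumptions.

```python
import numpy as np

def ar1_covariance(T: int, rho: float, sigma2: float = 1.0) -> np.ndarray:
    """Stationary AR(1) error covariance: Sigma[t, s] = sigma2 * rho**|t-s| / (1 - rho**2)."""
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * rho ** lags / (1.0 - rho ** 2)

T, rho = 6, 0.7
Sigma = ar1_covariance(T, rho)
Sigma_inv = np.linalg.inv(Sigma)

# Mask of entries more than one step away from the main diagonal.
off_band = np.abs(np.subtract.outer(np.arange(T), np.arange(T))) > 1

print(np.round(Sigma_inv, 3))
print("max |entry| outside the tridiagonal band:", np.abs(Sigma_inv[off_band]).max())
```

With $\sigma^2 = 1$, the interior diagonal of $\Sigma^{-1}$ is $1 + \rho^2$, the two corner entries are $1$, and the first off-diagonals are $-\rho$; everything else is zero up to rounding.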

The GLS estimator is $\hat{\beta}_{\text{GLS}} = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} y$. But you do not need to invert $\Sigma$ directly. The trick is to quasi-difference. Define the transformed variables for $t \geq 2$:

$\tilde{y}_t = y_t - \rho \, y_{t-1}, \quad \tilde{x}_t = x_t - \rho \, x_{t-1}$

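Then $\tilde{y}_t = \tilde{x}_t' \beta + \varepsilon_t$ for $t \geq 2$, because $u_t - \rho \, u_{t-1} = \varepsilon_t$, and $\varepsilon_t$ is i.i.d. $N(0, \sigma^2)$. The first observation has no lag to difference against, and its error has variance $\text{Var}(u_1) = \sigma^2 / (1 - \rho^2)$. The Prais-Winsten correction rescales it so that its error variance is also $\sigma^2$: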

$\tilde{y}_1 = \sqrt{1 - \rho^2} \, y_1, \quad \tilde{x}_1 = \sqrt{1 - \rho^2} \, x_1$

OLS on the transformed data $(\tilde{y}, \tilde{X})$ gives the GLS estimator:

$\hat{\beta}_{\text{GLS}} = (\tilde{X}' \tilde{X})^{-1} \tilde{X}' \tilde{y}$
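To see why OLS on the transformed data reproduces GLS exactly, stack the whole transformation (the Prais-Winsten first row, then the quasi-differences) into one $T \times T$ matrix $P$, so that $\tilde{y} = P y$ and $\tilde{X} = P X$. The short derivation below uses only the definitions above:

$P = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\rho & 1 \end{pmatrix}$

$\text{Var}\left(\sqrt{1-\rho^2} \, u_1\right) = (1-\rho^2) \cdot \frac{\sigma^2}{1-\rho^2} = \sigma^2, \qquad (Pu)_t = u_t - \rho \, u_{t-1} = \varepsilon_t \quad (t \geq 2)$

Since $u_1$ depends only on shocks dated $t \leq 1$, it is uncorrelated with $\varepsilon_2, \dots, \varepsilon_T$, so

$\text{Cov}(Pu) = P \Sigma P' = \sigma^2 I \quad \Longrightarrow \quad \Sigma^{-1} = \frac{1}{\sigma^2} P'P$

$\hat{\beta}_{\text{GLS}} = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} y = (X' P'P X)^{-1} X' P'P y = (\tilde{X}' \tilde{X})^{-1} \tilde{X}' \tilde{y}$

where the $1/\sigma^2$ factors cancel.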

This is BLUE (best linear unbiased estimator) under the stated model by the Gauss-Markov theorem applied to the transformed regression.
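As a sanity check on this equivalence, the Python/NumPy sketch below computes the GLS estimator twice on simulated data: once by OLS on the Prais-Winsten-transformed data, and once from the textbook formula $(X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} y$ with $\Sigma_{ts} \propto \rho^{|t-s|} / (1 - \rho^2)$. The helper names `prais_winsten_transform` and `gls_direct`, and the simulated parameter values, are hypothetical; $\sigma^2$ is set to 1 because it cancels from the formula.

```python
import numpy as np

def prais_winsten_transform(y, X, rho):
    """Rescale row 1 by sqrt(1 - rho^2) and quasi-difference rows t >= 2."""
    y_t = np.empty_like(y, dtype=float)
    X_t = np.empty_like(X, dtype=float)
    scale = np.sqrt(1.0 - rho ** 2)
    y_t[0], X_t[0] = scale * y[0], scale * X[0]
    y_t[1:] = y[1:] - rho * y[:-1]
    X_t[1:] = X[1:] - rho * X[:-1]
    return y_t, X_t

def gls_direct(y, X, rho):
    """Dense GLS (X' Sigma^-1 X)^-1 X' Sigma^-1 y with sigma^2 = 1 (it cancels)."""
    T = len(y)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Sigma = rho ** lags / (1.0 - rho ** 2)
    Si_X = np.linalg.solve(Sigma, X)      # Sigma^-1 X without forming the inverse
    Si_y = np.linalg.solve(Sigma, y)      # Sigma^-1 y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)

# Simulated data with illustrative values (not from the text)
rng = np.random.default_rng(0)
T, rho, beta = 200, 0.6, np.array([1.0, 2.0])
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.empty(T)
u[0] = eps[0] / np.sqrt(1 - rho ** 2)     # draw u_1 from the stationary distribution
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]
y = X @ beta + u

y_t, X_t = prais_winsten_transform(y, X, rho)
beta_pw, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
beta_gls = gls_direct(y, X, rho)
print("Prais-Winsten OLS:", beta_pw)
print("Direct GLS:       ", beta_gls)
```

The two estimates should agree up to floating-point error; the transformed version needs only $O(T)$ work instead of a dense $T \times T$ solve.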

Part (ii): Cochrane-Orcutt FGLS

When $\rho$ is unknown, estimate it iteratively:

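1. Run OLS of $y$ on $X$ to obtain residuals $\hat{u}_t = y_t - x_t' \hat{\beta}_{\text{OLS}}$.
2. Estimate $\rho$ by regressing $\hat{u}_t$ on $\hat{u}_{t-1}$: $\hat{\rho} = \sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1} \Big/ \sum_{t=2}^{T} \hat{u}_{t-1}^2$.
3. Quasi-difference the data using $\hat{\rho}$ for $t = 2, \dots, T$: $\tilde{y}_t = y_t - \hat{\rho} \, y_{t-1}$, $\tilde{x}_t = x_t - \hat{\rho} \, x_{t-1}$. Classic Cochrane-Orcutt drops the first observation; keeping it with the Prais-Winsten rescaling by $\sqrt{1 - \hat{\rho}^2}$ gives the Prais-Winsten FGLS estimator instead.
4. Run OLS of $\tilde{y}_t$ on $\tilde{x}_t$ to get $\hat{\beta}_{\text{FGLS}}$.
5. Recompute the residuals on the original (untransformed) data, $\hat{u}_t = y_t - x_t' \hat{\beta}_{\text{FGLS}}$, and repeat steps 2-4 until $\hat{\rho}$ (and with it $\hat{\beta}_{\text{FGLS}}$) stabilizes.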
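The five steps translate almost line for line into code. Below is a minimal Python/NumPy sketch, assuming $X$ already contains a column of ones for the intercept; the function name `cochrane_orcutt`, the convergence tolerance, and the simulated parameter values are illustrative choices, not part of the original text.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=100):
    """Iterated Cochrane-Orcutt FGLS for y = X beta + u with AR(1) errors.

    Returns (beta_hat, rho_hat, n_iter). The first observation is dropped
    from the quasi-differenced regression, as in classic Cochrane-Orcutt.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                # step 1: OLS
    rho = 0.0
    for it in range(1, max_iter + 1):
        u = y - X @ beta                                       # residuals on the original data
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])         # step 2: regress u_t on u_{t-1}
        y_star = y[1:] - rho_new * y[:-1]                      # step 3: quasi-difference, t = 2..T
        X_star = X[1:] - rho_new * X[:-1]
        beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]  # step 4: OLS on transformed data
        converged = abs(rho_new - rho) < tol                   # step 5: stop once rho-hat stabilizes
        rho = rho_new
        if converged:
            break
    return beta, rho, it

# Illustrative simulated data (parameter values are assumptions)
rng = np.random.default_rng(0)
T, rho_true, beta_true = 2000, 0.7, np.array([1.0, 2.0])
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(size=T)
u = np.empty(T)
u[0] = eps[0] / np.sqrt(1 - rho_true ** 2)                     # stationary start
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + eps[t]
y = X @ beta_true + u

beta_hat, rho_hat, n_iter = cochrane_orcutt(y, X)
print(f"rho_hat = {rho_hat:.3f}, beta_hat = {np.round(beta_hat, 3)}, iterations = {n_iter}")
```

Quasi-differencing turns the intercept column into $1 - \hat{\rho}$, so the coefficient on that column still estimates the original intercept. Replacing the dropped first row with the $\sqrt{1 - \hat{\rho}^2}$-scaled row would turn this sketch into iterated Prais-Winsten.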

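Asymptotic properties (assuming the AR(1) model is correct and the regressors are strictly exogenous, i.e., $E[u_t \mid x_1, \dots, x_T] = 0$):
- $\hat{\rho} \xrightarrow{p} \rho$ (consistent), because the residuals come from a consistent first-stage estimate of $\beta$.
- $\hat{\beta}_{\text{FGLS}}$ is asymptotically equivalent to the infeasible GLS estimator: $\sqrt{T}(\hat{\beta}_{\text{FGLS}} - \hat{\beta}_{\text{GLS}}) \xrightarrow{p} 0$. In particular, $\sqrt{T}(\hat{\beta}_{\text{FGLS}} - \beta) \xrightarrow{d} N(0, V_{\text{GLS}})$ with $V_{\text{GLS}} = \left( \text{plim} \, T^{-1} X' \Sigma^{-1} X \right)^{-1}$, the same asymptotic distribution as if $\rho$ were known.
- The key result: estimating $\rho$ consistently does not affect the asymptotic efficiency of $\hat{\beta}$. The estimation error in $\hat{\rho}$ is $O_p(T^{-1/2})$, and it enters $\hat{\beta}_{\text{FGLS}} - \hat{\beta}_{\text{GLS}}$ multiplied by terms that are themselves $O_p(T^{-1/2})$, so it washes out after $\sqrt{T}$ scaling.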
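A rough large-sample check of this equivalence, as a Python/NumPy sketch: on the same simulated data it compares iterated Cochrane-Orcutt FGLS with the infeasible estimator that quasi-differences using the true $\rho$ (both drop the first observation). If the theory holds, the slope gap $\sqrt{T} \, |\hat{\beta}_{\text{FGLS}} - \hat{\beta}_{\text{GLS}}|$ should be small and tend to shrink as $T$ grows; with a single draw per $T$ the sequence is noisy. The i.i.d. regressor (strictly exogenous by construction), $\rho = 0.7$, $\beta = (1, 2)'$, the fixed 20 iterations, and the seed are all illustrative assumptions.

```python
import numpy as np

def simulate(T, rho, beta, rng):
    """y = X beta + u with an i.i.d. N(0,1) regressor and stationary AR(1) errors."""
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    eps = rng.normal(size=T)
    u = np.empty(T)
    u[0] = eps[0] / np.sqrt(1 - rho ** 2)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    return X @ beta + u, X

def quasi_diff_ols(y, X, rho):
    """OLS on Cochrane-Orcutt quasi-differenced data (observations t = 2..T)."""
    return np.linalg.lstsq(X[1:] - rho * X[:-1], y[1:] - rho * y[:-1], rcond=None)[0]

def co_fgls(y, X, n_iter=20):
    """Iterated Cochrane-Orcutt with a fixed number of iterations."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        u = y - X @ beta
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        beta = quasi_diff_ols(y, X, rho)
    return beta, rho

rng = np.random.default_rng(1)
rho_true, beta_true = 0.7, np.array([1.0, 2.0])
results = {}
for T in (200, 2000, 20000):
    y, X = simulate(T, rho_true, beta_true, rng)
    beta_fgls, rho_hat = co_fgls(y, X)
    beta_gls = quasi_diff_ols(y, X, rho_true)             # infeasible GLS: uses the true rho
    gap = np.sqrt(T) * abs(beta_fgls[1] - beta_gls[1])    # sqrt(T)-scaled slope gap
    results[T] = (rho_hat, gap)
    print(f"T={T:6d}  rho_hat={rho_hat:.3f}  sqrt(T)*|FGLS - GLS| (slope) = {gap:.4f}")
```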

Part (iii): Newey-West robust alternative

If the AR(1) assumption is suspect -- perhaps there is heteroskedasticity, or the autocorrelation structure is more complex -- use the Newey-West HAC (heteroskedasticity and autocorrelation consistent) estimator:

Run OLS to get $\hat{\beta}_{\text{OLS}}$ and residuals $\hat{u}_t$.
Compute the HAC covariance matrix:

$\hat{V}_{\text{NW}} = (X'X)^{-1} \hat{S} (X'X)^{-1}$

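where $\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{m} w_j (\hat{\Gamma}_j + \hat{\Gamma}_j')$, with Bartlett kernel weights $w_j = 1 - j/(m+1)$ and $\hat{\Gamma}_j = \sum_{t=j+1}^{T} \hat{u}_t \hat{u}_{t-j} x_t x_{t-j}'$ for $j = 0, 1, \dots, m$ (so $\hat{\Gamma}_0 = \sum_{t=1}^{T} \hat{u}_t^2 x_t x_t'$). The Bartlett weights guarantee that $\hat{S}$ is positive semidefinite.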

The bandwidth $m$ must grow with $T$; a common rule of thumb is $m = \lfloor T^{1/3} \rfloor$, and data-driven alternatives include Andrews' plug-in method and the Newey-West automatic lag selection procedure.
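A minimal Python/NumPy sketch of the Newey-West formula above, with no small-sample degrees-of-freedom correction (software packages differ on such corrections, so its numbers need not match any particular library exactly). The names `newey_west_cov` and `default_bandwidth`, and the simulated data, are illustrative. Setting $m = 0$ collapses it to White's heteroskedasticity-robust (HC0) sandwich, which makes a handy check.

```python
import numpy as np

def default_bandwidth(T):
    """Rule-of-thumb m = floor(T^(1/3)), using integer arithmetic to avoid float rounding."""
    m = 0
    while (m + 1) ** 3 <= T:
        m += 1
    return m

def newey_west_cov(X, u, m=None):
    """Newey-West HAC covariance (X'X)^{-1} S (X'X)^{-1} with Bartlett weights.

    X: (T, k) regressor matrix; u: (T,) OLS residuals; m: bandwidth.
    """
    T = X.shape[0]
    if m is None:
        m = default_bandwidth(T)
    Xu = X * u[:, None]                  # row t holds u_t * x_t'
    S = Xu.T @ Xu                        # Gamma_0 = sum_t u_t^2 x_t x_t'
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1)            # Bartlett kernel weight
        Gamma_j = Xu[j:].T @ Xu[:-j]     # sum_{t=j+1}^T u_t u_{t-j} x_t x_{t-j}'
        S += w * (Gamma_j + Gamma_j.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# Illustrative data: persistent regressor and persistent errors (assumed values)
rng = np.random.default_rng(2)
T = 500
x, u = np.zeros(T), np.zeros(T)          # zero start; a burn-in would be more careful
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    u[t] = 0.8 * u[t - 1] + rng.normal()
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + u

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
V_nw = newey_west_cov(X, resid)          # m = floor(500^(1/3)) = 7
V_naive = (resid @ resid / (T - 2)) * np.linalg.inv(X.T @ X)
print("Newey-West SE:", np.sqrt(np.diag(V_nw)))
print("Naive OLS SE: ", np.sqrt(np.diag(V_naive)))
```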

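When to prefer Newey-West over FGLS:
- When the error structure is unknown or suspected to be more complex than AR(1).
- When you care more about valid inference (correct standard errors and test sizes) than about efficiency.
- When the regressors may not be strictly exogenous (e.g., feedback from $u_t$ to future $x$): the quasi-differenced regressor $x_t - \rho \, x_{t-1}$ brings $x_{t-1}$ into the regression, which can make FGLS inconsistent, whereas OLS only needs $x_t$ to be uncorrelated with $u_t$.
- Newey-West does not improve efficiency -- OLS remains less efficient than GLS when the AR(1) model is correct -- but it gives you honest standard errors without requiring a correctly specified error model.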
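A small Monte Carlo sketch of the inference point, in Python/NumPy: the errors are AR(2), so an AR(1) error model would be misspecified, and the regressor is persistent. It compares the empirical coverage of nominal 95% confidence intervals for the slope built from naive OLS standard errors versus Newey-West standard errors with $m = \lfloor T^{1/3} \rfloor = 5$. All parameter values (AR coefficients, $T = 200$, 500 replications, the burn-in, the seed) are illustrative assumptions; the exact coverage numbers depend on them, and Newey-West intervals can still undercover with strongly persistent data in samples this small.

```python
import numpy as np

def newey_west_se_slope(X, u, m):
    """Newey-West (Bartlett) standard error of the last coefficient."""
    Xu = X * u[:, None]
    S = Xu.T @ Xu
    for j in range(1, m + 1):
        G = Xu[j:].T @ Xu[:-j]
        S += (1.0 - j / (m + 1)) * (G + G.T)
    A = np.linalg.inv(X.T @ X)
    return np.sqrt((A @ S @ A)[-1, -1])

def simulate(T, rng, burn=200):
    """x: AR(1) with coefficient 0.8; u: AR(2) errors with coefficients (0.5, 0.3)."""
    n = T + burn
    e_x, e_u = rng.normal(size=n), rng.normal(size=n)
    x, u = np.zeros(n), np.zeros(n)
    for t in range(2, n):
        x[t] = 0.8 * x[t - 1] + e_x[t]
        u[t] = 0.5 * u[t - 1] + 0.3 * u[t - 2] + e_u[t]
    x, u = x[burn:], u[burn:]            # discard the burn-in period
    X = np.column_stack([np.ones(T), x])
    return X, X @ np.array([1.0, 2.0]) + u

rng = np.random.default_rng(3)
T, reps, m, z = 200, 500, 5, 1.96
hits_naive = hits_nw = 0
for _ in range(reps):
    X, y = simulate(T, rng)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    se_naive = np.sqrt(u @ u / (T - 2) * np.linalg.inv(X.T @ X)[-1, -1])
    se_nw = newey_west_se_slope(X, u, m)
    hits_naive += int(abs(b[1] - 2.0) < z * se_naive)   # does the naive CI cover the true slope?
    hits_nw += int(abs(b[1] - 2.0) < z * se_nw)         # does the Newey-West CI cover it?
cov_naive, cov_nw = hits_naive / reps, hits_nw / reps
print(f"95% CI coverage for the slope: naive OLS {cov_naive:.2f}, Newey-West {cov_nw:.2f}")
```

The comparison, not the exact figures, is the point: the naive intervals ignore the positive autocorrelation in both $x_t$ and $u_t$ and come out too narrow, while the Newey-West intervals move coverage toward the nominal level without any model for the errors.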

Answer: The GLS estimator quasi-differences the data using $\rho$ to whiten AR(1) errors, then applies OLS to the transformed regression. Cochrane-Orcutt iterates between estimating $\rho$ from residuals and re-estimating $\beta$ via quasi-differencing, producing an estimator that is asymptotically as efficient as infeasible GLS. When the AR(1) specification is suspect, Newey-West HAC standard errors provide robust inference without modeling the error structure, at the cost of giving up the efficiency gains from GLS.

Intuition

Serial correlation in regression errors is one of the most common problems in time series econometrics, and how you handle it reveals a lot about your modeling philosophy. GLS says: if you know the error structure, exploit it for efficiency. You literally transform the data to undo the correlation, then apply OLS to the clean version. Cochrane-Orcutt makes this practical by iterating between estimating the correlation parameter and re-transforming. The deep result is that plugging in a consistent estimate of $\rho$ costs you nothing asymptotically -- you get the same efficiency as if you knew $\rho$ exactly.

But here is the tension every practitioner faces: FGLS delivers its efficiency gain, and correct standard errors, only if the error model is correctly specified. If you get the error structure wrong -- say the errors are actually GARCH, or have structural breaks -- then FGLS can be less efficient than OLS, and the standard errors it reports are no longer valid. Newey-West is the conservative play: it keeps OLS (which is consistent under exogeneity whatever the autocorrelation structure of the errors) and just fixes the standard errors. You lose efficiency but gain robustness. In practice on a trading desk, you often see both: run FGLS for point estimates when you trust the model, but always check OLS with Newey-West standard errors as a sanity check. If the OLS and FGLS point estimates differ by much more than their standard errors would suggest, or the FGLS standard errors look implausibly small next to the Newey-West ones, your error model is probably wrong.
