AR(1) with Unknown Mean: MLE and Unit Root Testing

Statistics · Hard · Free problem

You observe a sequence of returns $r_1, r_2, \ldots, r_T$ that follow an AR(1) process with unknown mean:

$r_t - \mu = \phi(r_{t-1} - \mu) + \varepsilon_t, \qquad \varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2)$

where $\mu$, $\phi$, and $\sigma^2$ are all unknown.

(i) Derive the maximum likelihood estimators $\hat{\mu}$, $\hat{\phi}$, and $\hat{\sigma}^2$ for this model.

(ii) Construct a level-$\alpha$ test for $H_0: \phi = 1 \quad \text{(unit root)} \qquad \text{vs} \qquad H_1: \phi < 1 \quad \text{(stationary)}$ and discuss the finite-sample caveats that make this test notoriously difficult in practice.

Hints

  1. For part (i), think of the model as a regression of demeaned returns on lagged demeaned returns -- conditioning on $r_0$, the log-likelihood factors into a Gaussian regression form.
  2. For part (ii), the OLS $t$-statistic for testing $\phi = 1$ does NOT follow a $t$ or normal distribution under $H_0$ -- when $\phi = 1$ the process is a random walk and standard asymptotics fail. Look up Dickey-Fuller critical values.
  3. Rewrite the model as $\Delta r_t = \alpha + \delta r_{t-1} + \varepsilon_t$ where $\delta = \phi - 1$, so testing $H_0: \phi = 1$ becomes testing $H_0: \delta = 0$ -- this is the standard ADF setup.

Worked Solution

How to Think About It: This is a two-part problem: first, standard MLE for a Gaussian AR(1); second, the unit root testing problem, which is one of the most treacherous corners of time series econometrics. For part (i), the key move is recognizing that after demeaning, the model becomes a linear regression -- so OLS on demeaned data gives the MLE. For part (ii), the problem is that the standard asymptotic theory breaks down when $\phi = 1$. The $t$-statistic for $\hat{\phi}$ no longer has a $t$ or normal distribution under $H_0$ -- it converges to a Dickey-Fuller distribution, which is non-standard and left-skewed. Using a standard normal critical value will dramatically over-reject.

Part (i): MLE for $\mu$, $\phi$, $\sigma^2$

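Conditioning on $r_0$ (or treating it as fixed; if only $r_1, \ldots, r_T$ are available, condition on $r_1$ instead and let the sums run over $t = 2, \ldots, T$, with $T - 1$ in place of $T$), the log-likelihood is: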

$\ell(\mu, \phi, \sigma^2) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\bigl[(r_t - \mu) - \phi(r_{t-1} - \mu)\bigr]^2$

Define the demeaned series $y_t = r_t - \mu$. Then the objective is:

$\ell = -\frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \phi y_{t-1})^2$

For fixed $\mu$, this is exactly an OLS regression of $y_t$ on $y_{t-1}$ with no intercept. So:

$\hat{\phi}(\mu) = \frac{\sum_{t=1}^{T} y_t y_{t-1}}{\sum_{t=1}^{T} y_{t-1}^2}$

To get $\hat{\mu}$, take the score equation $\partial \ell / \partial \mu = 0$:

$\sum_{t=1}^{T} \bigl[(r_t - \mu) - \phi(r_{t-1} - \mu)\bigr](1 - \phi) = 0$

This gives:

$\hat{\mu} = \frac{\bar{r}_T - \hat{\phi}\,\bar{r}_{T-1}}{1 - \hat{\phi}}$

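where $\bar{r}_T = T^{-1}\sum_{t=1}^{T} r_t$ and $\bar{r}_{T-1} = T^{-1}\sum_{t=1}^{T} r_{t-1}$. Note this is a system -- $\hat{\mu}$ and $\hat{\phi}$ are jointly determined. It still has a clean closed form: writing $c = \mu(1 - \phi)$, the model is $r_t = c + \phi r_{t-1} + \varepsilon_t$, so OLS of $r_t$ on a constant and $r_{t-1}$ gives $\hat{c}$ and $\hat{\phi}$ directly, and by invariance of the MLE $\hat{\mu} = \hat{c}/(1 - \hat{\phi})$, which is exactly the expression above. Equivalently, $\hat{\phi}$ is the slope after centering $r_t$ at $\bar{r}_T$ and $r_{t-1}$ at $\bar{r}_{T-1}$. For large $T$ with $|\phi| < 1$, $\bar{r}_T \approx \bar{r}_{T-1}$, so $\hat{\mu} \approx \bar{r}$. As $\hat{\phi} \to 1$ the denominator vanishes, reflecting that $\mu$ is not identified at $\phi = 1$.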

Once $\hat{\mu}$ and $\hat{\phi}$ are pinned down, the MLE for $\sigma^2$ is:

$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\bigl[(r_t - \hat{\mu}) - \hat{\phi}(r_{t-1} - \hat{\mu})\bigr]^2$

Note this is the biased (MLE) version, dividing by $T$ not $T-2$.

Answer (Part i): $\hat{\phi}$ is the OLS slope from regressing demeaned $r_t$ on demeaned $r_{t-1}$; $\hat{\mu}$ is a weighted combination of sample means solved jointly with $\hat{\phi}$; $\hat{\sigma}^2$ is the average squared residual.
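The estimators above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it uses the reparametrization $r_t = c + \phi r_{t-1} + \varepsilon_t$ with $c = \mu(1 - \phi)$, treats the first element of the list as the fixed initial value $r_0$, and the function names `ar1_conditional_mle` and `simulate_ar1` are hypothetical helpers introduced only for this example.

```python
import random

def ar1_conditional_mle(r):
    """Conditional Gaussian MLE of (mu, phi, sigma2); r[0] is treated as fixed."""
    x, y = r[:-1], r[1:]                  # lagged values r_{t-1} and current values r_t
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n   # these play the role of rbar_{T-1} and rbar_T
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    phi = sxy / sxx                       # OLS slope
    c = ybar - phi * xbar                 # OLS intercept, c = mu * (1 - phi)
    mu = c / (1.0 - phi)                  # numerically unstable when phi is close to 1
    # MLE of the variance: divide by n, not n - 2
    sigma2 = sum((b - c - phi * a) ** 2 for a, b in zip(x, y)) / n
    return mu, phi, sigma2

def simulate_ar1(mu, phi, sigma, T, seed=0):
    """Illustrative data generator: T + 1 points, started at r_0 = mu."""
    rng = random.Random(seed)
    r = [mu]
    for _ in range(T):
        r.append(mu + phi * (r[-1] - mu) + rng.gauss(0.0, sigma))
    return r

# Usage: estimates should land near the simulated (0.5, 0.6, 1.0)
r = simulate_ar1(mu=0.5, phi=0.6, sigma=1.0, T=5000)
mu_hat, phi_hat, sigma2_hat = ar1_conditional_mle(r)
print(mu_hat, phi_hat, sigma2_hat)
```

Note that `mu = c / (1 - phi)` reproduces $\hat{\mu} = (\bar{r}_T - \hat{\phi}\,\bar{r}_{T-1})/(1 - \hat{\phi})$, so no iteration is needed.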

---

Part (ii): Testing $H_0: \phi = 1$

The naive approach and why it fails:

Compute the OLS $t$-statistic for testing $\phi = 1$:

$t_{\text{DF}} = \frac{\hat{\phi} - 1}{\text{se}(\hat{\phi})}$

and compare to $-z_\alpha$ from a standard normal (one-sided, since $H_1: \phi < 1$). This is wrong. When $\phi = 1$, the process is a random walk -- non-stationary. The standard OLS theory assumes stationarity, so asymptotics break down. Specifically:

  • Under stationarity ($|\phi| < 1$): $\sqrt{T}(\hat{\phi} - \phi) \to N(0, 1 - \phi^2)$ -- the standard result.
  • Under $H_0$ ($\phi = 1$): $T(\hat{\phi} - 1) \to$ a functional of Brownian motion. The $t$-statistic converges to the Dickey-Fuller distribution, which is left-skewed and has more mass in the left tail than a normal.
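For reference, here is a sketch of what "a functional of Brownian motion" means, stated for the case with an estimated mean and iid errors. The symbols $W$ and $\bar{W}$ are introduced for this illustration only: $W$ is a standard Brownian motion on $[0, 1]$ and $\bar{W}(s) = W(s) - \int_0^1 W(u)\,du$ is its demeaned version. Under $H_0$:

$T(\hat{\phi} - 1) \Rightarrow \frac{\int_0^1 \bar{W}(s)\,dW(s)}{\int_0^1 \bar{W}(s)^2\,ds}, \qquad t_{\text{DF}} \Rightarrow \frac{\int_0^1 \bar{W}(s)\,dW(s)}{\bigl(\int_0^1 \bar{W}(s)^2\,ds\bigr)^{1/2}}$

The denominator is random rather than a constant, and the numerator is not a centered normal, so neither limit is Gaussian. The scaling by $T$ instead of $\sqrt{T}$ is why $\hat{\phi}$ is called super-consistent under the null.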

Using a normal critical value ($-z_{0.05} = -1.645$) when the true 5% critical value is around $-2.86$ (for an AR(1) with estimated mean) leads to massive over-rejection of a true null -- you declare random walks stationary far too often.
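The size distortion can be checked with a small Monte Carlo experiment. The sketch below simulates pure random walks (so $H_0$ is true), computes the $t$-statistic from a regression of $\Delta r_t$ on an intercept and $r_{t-1}$, and compares how often each cutoff rejects. The sample size, replication count, and function names (`df_tstat`, `rejection_rates`) are assumptions chosen for illustration.

```python
import random

def df_tstat(r):
    """t-statistic for delta = 0 in dr_t = alpha + delta * r_{t-1} + e_t."""
    x = r[:-1]                                   # lagged levels r_{t-1}
    y = [b - a for a, b in zip(r[:-1], r[1:])]   # first differences dr_t
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    delta = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    alpha = ybar - delta * xbar
    s2 = sum((b - alpha - delta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return delta / (s2 / sxx) ** 0.5

def rejection_rates(T=100, reps=2000, seed=0):
    """Share of random-walk samples rejected at the normal and Dickey-Fuller cutoffs."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        r = [0.0]
        for _ in range(T):
            r.append(r[-1] + rng.gauss(0.0, 1.0))   # random walk: H0 is true
        stats.append(df_tstat(r))
    normal_rate = sum(t < -1.645 for t in stats) / reps   # naive normal cutoff
    df_rate = sum(t < -2.86 for t in stats) / reps        # Dickey-Fuller cutoff
    return normal_rate, df_rate

normal_rate, df_rate = rejection_rates()
print("reject with normal cutoff:", normal_rate)
print("reject with Dickey-Fuller cutoff:", df_rate)
```

With the normal cutoff, the rejection frequency should come out far above the nominal 5%, while the Dickey-Fuller cutoff stays close to it.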

The correct procedure (Augmented Dickey-Fuller test):

  1. Rewrite the model in "Dickey-Fuller form" by subtracting $r_{t-1}$ from both sides:

$\Delta r_t = \alpha + \delta r_{t-1} + \varepsilon_t$

where $\alpha = \mu(1 - \phi)$ and $\delta = \phi - 1$. Testing $H_0: \phi = 1$ is equivalent to $H_0: \delta = 0$.

  2. If the errors may be serially correlated (common in practice), augment with lagged differences:

$\Delta r_t = \alpha + \delta r_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta r_{t-j} + \varepsilon_t$

  3. Run OLS, compute the $t$-statistic for $\delta = 0$, and compare to Dickey-Fuller critical values for the matching deterministic specification. At the 5% level with $T = 100$ these are about $-1.95$ with no intercept, $-2.89$ with an intercept ($-2.86$ asymptotically), and $-3.45$ with an intercept and linear trend. These are tabulated from simulation.
  4. Reject $H_0$ (conclude stationary) if $t_{\text{DF}} < c_\alpha$ where $c_\alpha$ is the Dickey-Fuller critical value.
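The regression steps above can be sketched without any external libraries. This is an illustrative implementation under stated assumptions: the lag order `p` is fixed by the caller rather than selected by an information criterion, the helper names `ols_fit` and `adf_tstat` are hypothetical, and the resulting statistic must still be compared against tabulated Dickey-Fuller critical values (the code does not compute p-values). In practice a library routine such as `statsmodels.tsa.stattools.adfuller` would normally be used instead.

```python
import random

def ols_fit(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan turns it into [I | (X'X)^-1].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for i in range(k):
            if i != col:
                f = A[i][col]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[col])]
    inv = [row[k:] for row in A]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)       # residual variance
    se = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    return beta, se

def adf_tstat(r, p=1):
    """t-statistic on delta in dr_t = alpha + delta*r_{t-1} + sum_j gamma_j*dr_{t-j} + e_t."""
    dr = [r[t] - r[t - 1] for t in range(1, len(r))]   # dr[i] = r[i+1] - r[i]
    X, y = [], []
    for i in range(p, len(dr)):
        # Regressors: intercept, lagged level, then p lagged differences
        X.append([1.0, r[i]] + [dr[i - j] for j in range(1, p + 1)])
        y.append(dr[i])
    beta, se = ols_fit(X, y)
    return beta[1] / se[1]

# Usage on a simulated random walk (H0 true); compare the result to about -2.89
rng = random.Random(0)
walk = [0.0]
for _ in range(200):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
print("ADF t-statistic:", adf_tstat(walk, p=2))
```

Solving the normal equations directly keeps the sketch dependency-free; for long series or many lags a QR-based least-squares routine would be the numerically safer choice.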

Finite-sample caveats:

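  • Near-unit-root bias: When $\phi$ is close to but not equal to 1 (say 0.97), OLS is biased downward in finite samples -- the Hurwicz (Kendall/Marriott-Pope) bias, roughly $-(1 + 3\phi)/T$ when the mean is estimated. $\hat{\phi}$ systematically underestimates $\phi$, so point estimates and implied half-lives look more mean-reverting than the truth. The Dickey-Fuller critical values account for this under $H_0$; normal-theory inference does not.
  • Lag selection: A poor choice of $p$ (the augmentation order) distorts the test. Too few lags leave autocorrelation in the residuals and distort size; too many lags waste power. Use AIC/BIC or the general-to-specific "t-sig" method.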
  • Trending data: If returns have a deterministic trend, you need to include a trend term in the ADF regression and use different (more negative) critical values. Using the wrong specification shifts the entire null distribution.
  • Low power: The ADF test has notoriously low power against near-unit-root alternatives, especially with small $T$. A process with $\phi = 0.95$ over $T = 100$ observations is nearly indistinguishable from a unit root by this test. This is not a weakness of the ADF per se -- it is a fundamental identification problem.
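The bias and power caveats can both be seen in a short simulation. The sketch below generates stationary AR(1) samples, records the average $\hat{\phi}$, and counts how often the Dickey-Fuller test with intercept rejects at the $T = 100$ 5% cutoff of $-2.89$. The parameter choices and the function names (`df_regression`, `near_unit_root_experiment`) are assumptions made for illustration, and no lag augmentation is used because the simulated errors are iid.

```python
import random

def df_regression(r):
    """Return (phi_hat, t-statistic for delta = 0) from dr_t = alpha + delta * r_{t-1} + e_t."""
    x = r[:-1]
    y = [b - a for a, b in zip(r[:-1], r[1:])]
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    delta = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    alpha = ybar - delta * xbar
    s2 = sum((b - alpha - delta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return 1.0 + delta, delta / (s2 / sxx) ** 0.5

def near_unit_root_experiment(phi, T=100, reps=2000, crit=-2.89, seed=0):
    """Average phi_hat and rejection frequency when the true process is a stationary AR(1)."""
    rng = random.Random(seed)
    phi_hats, rejections = [], 0
    for _ in range(reps):
        r = [0.0]                                   # true mean is 0; start at the mean
        for _ in range(T):
            r.append(phi * r[-1] + rng.gauss(0.0, 1.0))
        phi_hat, t = df_regression(r)
        phi_hats.append(phi_hat)
        rejections += int(t < crit)                 # H0 is false here, so this counts power
    return {"mean_phi_hat": sum(phi_hats) / reps, "power": rejections / reps}

print(near_unit_root_experiment(phi=0.95))   # near unit root: biased phi_hat, low power
print(near_unit_root_experiment(phi=0.50))   # far from unit root: test rejects easily
```

Comparing the two runs shows the asymmetry: the test separates $\phi = 0.5$ from a unit root easily, but at $\phi = 0.95$ it rejects only a minority of the time even though the null is false.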

Answer (Part ii): Use the Augmented Dickey-Fuller $t$-statistic for $\delta = 0$ in the regression $\Delta r_t = \alpha + \delta r_{t-1} + \ldots$ and compare to Dickey-Fuller critical values (not normal). Key caveats: OLS is biased in finite samples near the unit root, lag selection matters for size, and power is low against near-unit-root alternatives.

Intuition

The MLE part is a reminder that OLS and MLE coincide for Gaussian linear models -- once you recognize that demeaning reduces the AR(1) to a no-intercept linear regression, the estimators fall out immediately. The subtlety is that $\hat{\mu}$ and $\hat{\phi}$ are jointly determined, which makes the closed form slightly messier than a standard regression. In practice this is resolved by running OLS of $r_t$ on a constant and $r_{t-1}$ and backing out $\hat{\mu}$ from the intercept and slope, or by noting that for large $T$ the two sample means converge.

The unit root testing part is genuinely hard and has been a major research area in econometrics for decades. The core lesson: non-stationarity breaks the law-of-large-numbers and CLT arguments behind standard regression asymptotics. When $\phi = 1$, the regressor $r_{t-1}$ is a partial sum of the errors, its variance grows with $t$, and the suitably scaled partial-sum process converges to Brownian motion rather than to a constant or a normal. The asymptotic distribution of the $t$-statistic becomes a functional of that Brownian motion (the Dickey-Fuller distribution). This is left-skewed and shifted left relative to a normal, meaning you need more negative critical values to achieve the same size. Missing this in an interview -- or worse, in a production model -- leads to badly oversized tests and a tendency to incorrectly classify random walks as stationary. In quant finance this has real consequences: mean-reversion strategies built on a series that is actually a random walk will bleed money.
