LAD Estimator as MLE Under Laplace Errors
Consider the location model $y_i = \mu + \varepsilon_i$ where the errors $\varepsilon_i$ are i.i.d. $\text{Laplace}(0, b)$ with density $f(\varepsilon) = \frac{1}{2b}\exp\!\left(-\frac{|\varepsilon|}{b}\right)$.
- Show that the maximum likelihood estimator of $\mu$ is the least absolute deviations (LAD) estimator -- i.e., the value of $\mu$ that minimizes $\sum_{i=1}^n |y_i - \mu|$.
- Show that the LAD estimator equals the sample median of $y_1, \ldots, y_n$.
- Derive the asymptotic variance of the LAD estimator and compare its efficiency to OLS (the sample mean) under Laplace errors.
Hints
- Write out the log-likelihood for Laplace errors -- what loss function does maximizing it correspond to?
- To show the LAD minimizer is the median, examine the subgradient of $\sum |y_i - \mu|$ and find where it equals zero.
- For the asymptotic variance, use the standard result that the sample median has asymptotic variance $1/(4nf(0)^2)$, where $f$ is the error density, and compare to $\text{Var}(\varepsilon_i)/n$ for the mean.
Worked Solution
How to Think About It: The connection between MLE and loss functions is direct -- taking the negative log-likelihood turns a maximization into a minimization of some loss. For Gaussian errors, the negative log-likelihood is, up to an additive constant, proportional to the sum of squared residuals, giving OLS. For Laplace errors, it is, up to an additive constant, proportional to the sum of absolute residuals, giving LAD. This is why robust statisticians love the median: it is the MLE when errors are Laplace (heavier-tailed than Gaussian), while the mean is the MLE when errors are Gaussian. Under Laplace errors, the median is actually more efficient than the mean.
Key Insight: The Laplace density's absolute value in the exponent directly produces an $L^1$ loss function in the log-likelihood, making the LAD estimator the MLE.
The Method:
Part 1: LAD is the MLE.
The joint density of $y_1, \ldots, y_n$ given $\mu$ is:
$L(\mu) = \prod_{i=1}^n \frac{1}{2b} \exp\!\left(-\frac{|y_i - \mu|}{b}\right) = \frac{1}{(2b)^n} \exp\!\left(-\frac{1}{b}\sum_{i=1}^n |y_i - \mu|\right)$
The log-likelihood is:
$\ell(\mu) = -n\ln(2b) - \frac{1}{b}\sum_{i=1}^n |y_i - \mu|$
Maximizing $\ell(\mu)$ over $\mu$ is equivalent to minimizing:
$\sum_{i=1}^n |y_i - \mu|$
This is precisely the LAD objective. So the MLE $\hat{\mu}_{\text{MLE}} = \hat{\mu}_{\text{LAD}} = \arg\min_{\mu} \sum_i |y_i - \mu|$.
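As a quick numerical sanity check of this equivalence, the sketch below evaluates both the Laplace log-likelihood and the LAD loss on a grid of candidate values of $\mu$ and confirms that they are optimized at the same grid point. The data, the true location `2.0`, the scale `b = 1.5`, and the function names are all illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def laplace_loglik(mu, y, b):
    """Log-likelihood of y_i = mu + eps_i with eps_i ~ Laplace(0, b)."""
    n = len(y)
    return -n * np.log(2 * b) - np.sum(np.abs(y - mu)) / b

def lad_loss(mu, y):
    """Sum of absolute deviations of y from mu."""
    return np.sum(np.abs(y - mu))

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
b = 1.5
y = 2.0 + rng.laplace(loc=0.0, scale=b, size=101)

# Evaluate both criteria on a common grid of candidate locations.
grid = np.linspace(y.min(), y.max(), 2001)
loglik = np.array([laplace_loglik(m, y, b) for m in grid])
loss = np.array([lad_loss(m, y) for m in grid])

mu_mle = grid[np.argmax(loglik)]  # grid maximizer of the log-likelihood
mu_lad = grid[np.argmin(loss)]    # grid minimizer of the LAD loss

print("grid MLE:", mu_mle, "grid LAD:", mu_lad)
```

Because the log-likelihood is a decreasing affine function of the LAD loss for any fixed $b > 0$, the two grid optimizers coincide regardless of the value of $b$ used.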
Part 2: LAD equals the sample median.
The function $g(\mu) = \sum_{i=1}^n |y_i - \mu|$ is convex and piecewise linear, with kinks at each $y_i$. Its subdifferential at $\mu$ is:
$\partial g(\mu) = \sum_{i=1}^n \text{sign}(\mu - y_i)$
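where $\text{sign}(\mu - y_i)$ is understood as the whole interval $[-1, 1]$ when $\mu = y_i$. Writing $N_<$, $N_>$, and $N_=$ for the number of observations below, above, and equal to $\mu$, the subdifferential is the interval $[N_< - N_> - N_=,\; N_< - N_> + N_=]$. The minimum occurs where $0 \in \partial g(\mu)$, which requires:
$\bigl|\,|\{i : y_i < \mu\}| - |\{i : y_i > \mu\}|\,\bigr| \le |\{i : y_i = \mu\}|$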
This is exactly the condition that $\mu$ is a median of $y_1, \ldots, y_n$. For odd $n$, the minimizer is the middle order statistic. For even $n$, any value between the two middle order statistics minimizes $g$ (conventionally, we take the average).
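The argument can be checked on small examples. The sketch below computes the endpoints of the subdifferential of $g$ at a candidate $\mu$ (counts below minus counts above, widened by the number of ties) and evaluates $g$ directly. The two data sets and the helper names are made up for illustration.

```python
import numpy as np

def lad_loss(mu, y):
    """g(mu) = sum_i |y_i - mu|."""
    return float(np.sum(np.abs(y - mu)))

def subgradient_interval(mu, y):
    """Endpoints (lo, hi) of the subdifferential of g at mu."""
    below = int(np.sum(y < mu))
    above = int(np.sum(y > mu))
    ties = int(np.sum(y == mu))
    return below - above - ties, below - above + ties

# Odd n: the unique minimizer is the middle order statistic.
y_odd = np.array([3.0, -1.0, 7.0, 2.0, 10.0])
med_odd = float(np.median(y_odd))
lo, hi = subgradient_interval(med_odd, y_odd)
grid = np.linspace(y_odd.min(), y_odd.max(), 1101)
grid_losses = np.array([lad_loss(m, y_odd) for m in grid])

# Even n: g is flat between the two middle order statistics (4 and 6 here).
y_even = np.array([1.0, 4.0, 6.0, 9.0])
flat_values = [lad_loss(m, y_even) for m in (4.0, 5.0, 6.0)]
outside_value = lad_loss(3.5, y_even)

print("odd n: median", med_odd, "subdifferential", (lo, hi))
print("even n: loss on [4, 6]", flat_values, "loss at 3.5", outside_value)
```

For the odd-sized sample, zero lies inside the subdifferential at the median and no grid point achieves a smaller loss. For the even-sized sample, the loss is constant on the interval between the two middle observations and strictly larger outside it.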
Part 3: Asymptotic variance and efficiency comparison.
For the LAD estimator (sample median), the asymptotic distribution is:
$\sqrt{n}(\hat{\mu}_{\text{LAD}} - \mu) \xrightarrow{d} N\!\left(0, \frac{1}{4f(0)^2}\right)$
where $f(0) = \frac{1}{2b}$ is the density of $\varepsilon_i$ at zero. Substituting:
$\text{Var}_{\text{asy}}(\hat{\mu}_{\text{LAD}}) = \frac{1}{n} \cdot \frac{1}{4 \cdot (1/(2b))^2} = \frac{b^2}{n}$
For OLS (sample mean $\bar{y}$):
$\text{Var}(\bar{y}) = \frac{\text{Var}(\varepsilon_i)}{n} = \frac{2b^2}{n}$
since $\text{Var}(\text{Laplace}(0,b)) = 2b^2$.
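The value $2b^2$ follows from a short computation: the Laplace$(0, b)$ density is symmetric about zero, so the mean is zero and the variance equals the second moment:
$\text{Var}(\varepsilon_i) = E[\varepsilon_i^2] = 2\int_0^\infty x^2 \cdot \frac{1}{2b} e^{-x/b}\,dx = \frac{1}{b}\int_0^\infty x^2 e^{-x/b}\,dx = \frac{1}{b} \cdot 2b^3 = 2b^2$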
Efficiency comparison: The asymptotic relative efficiency of LAD to OLS is:
$\text{ARE}(\text{LAD}, \text{OLS}) = \frac{\text{Var}(\bar{y})}{\text{Var}(\hat{\mu}_{\text{LAD}})} = \frac{2b^2/n}{b^2/n} = 2$
So under Laplace errors, the sample median is twice as efficient as the sample mean. The median has asymptotic variance $b^2/n$, while the mean has variance $2b^2/n$.
Practical Considerations: Under Gaussian errors, the roles reverse: the ARE of OLS relative to LAD is $\pi/2 \approx 1.57$. The Laplace distribution, however, has heavier tails than the Gaussian, and in that setting the median's insensitivity to extreme observations is what gives it the edge. In financial data, where return distributions are typically leptokurtic (heavier tails than Gaussian), the median and other robust estimators often outperform the mean.
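A small Monte Carlo experiment can illustrate both efficiency statements. The sketch below estimates the ratio $\text{Var}(\bar{y}) / \text{Var}(\text{median})$ under Laplace and under standard Gaussian errors. The sample size, replication count, seed, and the helper `simulate_are` are arbitrary choices made for this illustration.

```python
import numpy as np

def simulate_are(sampler, n=201, reps=20000, seed=0):
    """Monte Carlo estimate of Var(sample mean) / Var(sample median).

    `sampler(rng, size)` must return an array of i.i.d. errors of the
    requested shape; each row is treated as one sample of size n.
    """
    rng = np.random.default_rng(seed)
    samples = sampler(rng, (reps, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)
    return means.var() / medians.var()

b = 1.0  # assumed Laplace scale; the ratio does not depend on it

# Laplace errors: asymptotic theory gives a ratio of 2.
are_laplace = simulate_are(lambda rng, size: rng.laplace(0.0, b, size))

# Gaussian errors: asymptotic theory gives a ratio of 2/pi (about 0.64).
are_gauss = simulate_are(lambda rng, size: rng.normal(0.0, 1.0, size))

print("Laplace  Var(mean)/Var(median):", are_laplace)
print("Gaussian Var(mean)/Var(median):", are_gauss)
```

The asymptotic values are limits as $n \to \infty$. At a moderate sample size such as the one assumed here, the Laplace ratio tends to sit somewhat below 2, because the median's finite-sample variance exceeds $b^2/n$.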
Answer:
- The MLE under Laplace errors minimizes $\sum |y_i - \mu|$, which is the LAD objective.
- The minimizer of $\sum |y_i - \mu|$ is the sample median.
- The asymptotic variance of the median is $b^2/n$, while the mean's is $2b^2/n$, so LAD is twice as efficient as OLS under Laplace errors.