LAD Estimator as MLE Under Laplace Errors

Statistics · Medium · Free problem

Consider the location model $y_i = \mu + \varepsilon_i$ where the errors $\varepsilon_i$ are i.i.d. $\text{Laplace}(0, b)$ with density $f(\varepsilon) = \frac{1}{2b}\exp\!\left(-\frac{|\varepsilon|}{b}\right)$.

  1. Show that the maximum likelihood estimator of $\mu$ is the least absolute deviations (LAD) estimator -- i.e., the value of $\mu$ that minimizes $\sum_{i=1}^n |y_i - \mu|$.
  2. Show that the LAD estimator equals the sample median of $y_1, \ldots, y_n$.
  3. Derive the asymptotic variance of the LAD estimator and compare its efficiency to OLS (the sample mean) under Laplace errors.

Hints

  1. Write out the log-likelihood for Laplace errors -- what loss function does maximizing it correspond to?
  2. To show the LAD minimizer is the median, examine the subgradient of $\sum |y_i - \mu|$ and find where it equals zero.
  3. For the asymptotic variance, use the standard result that the sample median has asymptotic variance $1/(4nf(0)^2)$, where $f$ is the error density, and compare to $\text{Var}(\varepsilon_i)/n$ for the mean.

Worked Solution

How to Think About It: The connection between MLE and loss functions is direct -- taking the negative log-likelihood turns a maximization into a minimization of some loss. For Gaussian errors, the negative log-likelihood is, up to an additive constant, proportional to the sum of squared residuals, giving OLS. For Laplace errors, it is, up to an additive constant, proportional to the sum of absolute residuals, giving LAD. This is why robust statisticians love the median: it is the MLE when errors follow a Laplace distribution, whose tails are heavier than the Gaussian's, while the mean is the MLE when errors are Gaussian. Under Laplace errors, the median is asymptotically more efficient than the mean.

Key Insight: The Laplace density's absolute value in the exponent directly produces an $L^1$ loss function in the log-likelihood, making the LAD estimator the MLE.

The Method:

Part 1: LAD is the MLE.

The joint density of $y_1, \ldots, y_n$ given $\mu$ is:

$L(\mu) = \prod_{i=1}^n \frac{1}{2b} \exp\!\left(-\frac{|y_i - \mu|}{b}\right) = \frac{1}{(2b)^n} \exp\!\left(-\frac{1}{b}\sum_{i=1}^n |y_i - \mu|\right)$

The log-likelihood is:

$\ell(\mu) = -n\ln(2b) - \frac{1}{b}\sum_{i=1}^n |y_i - \mu|$

Maximizing $\ell(\mu)$ over $\mu$ is equivalent to minimizing:

$\sum_{i=1}^n |y_i - \mu|$

This is precisely the LAD objective. So the MLE $\hat{\mu}_{\text{MLE}} = \hat{\mu}_{\text{LAD}} = \arg\min_{\mu} \sum_i |y_i - \mu|$.
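The equivalence can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated sample; the seed, the scale `b = 1.5`, the true location `2.0`, and the grid resolution are arbitrary illustrative choices, not part of the problem.

```python
import numpy as np

def laplace_loglik(mu, y, b):
    """Log-likelihood of mu for i.i.d. observations y = mu + Laplace(0, b) noise."""
    n = len(y)
    return -n * np.log(2 * b) - np.sum(np.abs(y - mu)) / b

def lad_objective(mu, y):
    """Sum of absolute deviations of y from mu."""
    return np.sum(np.abs(y - mu))

# Simulated data (illustrative assumptions: seed, scale, location, sample size).
rng = np.random.default_rng(0)
b = 1.5
y = rng.laplace(loc=2.0, scale=b, size=101)

# Evaluate both objectives on the same grid of candidate values for mu.
grid = np.linspace(y.min(), y.max(), 2001)
loglik = np.array([laplace_loglik(m, y, b) for m in grid])
lad = np.array([lad_objective(m, y) for m in grid])

mu_mle = grid[np.argmax(loglik)]  # grid maximizer of the log-likelihood
mu_lad = grid[np.argmin(lad)]     # grid minimizer of the LAD objective
print(mu_mle, mu_lad)
```

Because the log-likelihood is a constant minus the LAD objective divided by $b$, the two grid points should coincide for any $b > 0$.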

Part 2: LAD equals the sample median.

The function $g(\mu) = \sum_{i=1}^n |y_i - \mu|$ is convex and piecewise linear, with kinks at each $y_i$. Its subdifferential at $\mu$ is:

$\partial g(\mu) = \sum_{i=1}^n \text{sign}(\mu - y_i)$

where $\text{sign}(\mu - y_i) \in [-1, 1]$ when $\mu = y_i$. The minimum occurs where $0 \in \partial g(\mu)$, which requires:

$\bigl|\,|\{i : y_i < \mu\}| - |\{i : y_i > \mu\}|\,\bigr| \le |\{i : y_i = \mu\}|$

This is exactly the condition that $\mu$ is a median of $y_1, \ldots, y_n$. For odd $n$, the minimizer is the middle order statistic. For even $n$, any value between the two middle order statistics minimizes $g$ (conventionally, we take the average).
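Both cases can be illustrated on small hand-picked samples. This is a sketch with made-up data, assuming NumPy; the helper name `g` mirrors the function defined above.

```python
import numpy as np

def g(mu, y):
    """LAD objective: sum of absolute deviations of y from mu."""
    return np.sum(np.abs(y - mu))

# Odd n: g is convex and piecewise linear, so its minimum is attained at a
# data point. Searching over the data points therefore finds the minimizer.
y_odd = np.array([3.0, -1.0, 7.0, 2.0, 10.0])
med_odd = np.median(y_odd)
best_odd = min(y_odd, key=lambda m: g(m, y_odd))

# Even n: g is constant on the interval between the two middle order statistics.
y_even = np.array([3.0, -1.0, 7.0, 2.0])
lo, hi = np.sort(y_even)[1:3]
flat_values = [g(m, y_even) for m in np.linspace(lo, hi, 5)]

print(best_odd, med_odd)
print(flat_values)
```

For the even-sized sample, every point in the interval `[lo, hi]` gives the same objective value, and moving outside that interval increases it.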

Part 3: Asymptotic variance and efficiency comparison.

For the LAD estimator (sample median), the asymptotic distribution is:

$\sqrt{n}(\hat{\mu}_{\text{LAD}} - \mu) \xrightarrow{d} N\!\left(0, \frac{1}{4f(0)^2}\right)$

where $f(0) = \frac{1}{2b}$ is the density of $\varepsilon_i$ at zero. Substituting:

$\text{Var}_{\text{asy}}(\hat{\mu}_{\text{LAD}}) = \frac{1}{n} \cdot \frac{1}{4 \cdot (1/(2b))^2} = \frac{b^2}{n}$

For OLS (sample mean $\bar{y}$):

$\text{Var}(\bar{y}) = \frac{\text{Var}(\varepsilon_i)}{n} = \frac{2b^2}{n}$

since $\text{Var}(\text{Laplace}(0,b)) = 2b^2$.
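For completeness, the variance follows from a short calculation. Since the density is symmetric about zero, $E[\varepsilon_i] = 0$, and with the substitution $t = \varepsilon/b$:

$\text{Var}(\varepsilon_i) = E[\varepsilon_i^2] = \int_{-\infty}^{\infty} \varepsilon^2 \, \frac{1}{2b} e^{-|\varepsilon|/b} \, d\varepsilon = \frac{1}{b} \int_0^{\infty} \varepsilon^2 e^{-\varepsilon/b} \, d\varepsilon = b^2 \int_0^{\infty} t^2 e^{-t} \, dt = 2b^2$

where the last step uses $\int_0^{\infty} t^2 e^{-t} \, dt = \Gamma(3) = 2$.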

Efficiency comparison: The asymptotic relative efficiency of LAD to OLS is:

$\text{ARE}(\text{LAD}, \text{OLS}) = \frac{\text{Var}(\bar{y})}{\text{Var}(\hat{\mu}_{\text{LAD}})} = \frac{2b^2/n}{b^2/n} = 2$

So under Laplace errors, the sample median is twice as efficient as the sample mean. The median has asymptotic variance $b^2/n$, while the mean has variance $2b^2/n$.
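A quick Monte Carlo experiment gives a sanity check on the factor of two. This is a rough sketch, assuming NumPy; the seed, $b = 1$, the sample size `n`, and the number of replications `reps` are arbitrary choices. The factor of two is an asymptotic statement, so the simulated ratio at a finite $n$ need not equal 2 exactly.

```python
import numpy as np

# Illustrative simulation settings (assumptions, not part of the problem).
rng = np.random.default_rng(42)
b = 1.0
n = 401       # observations per sample
reps = 5000   # number of simulated samples

# Each row is one sample of n i.i.d. Laplace(0, b) errors (true mu = 0).
samples = rng.laplace(loc=0.0, scale=b, size=(reps, n))

# Sampling variance of each estimator across replications.
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

ratio = var_mean / var_median  # empirical counterpart of ARE(LAD, OLS)
print(f"n * Var(mean)   = {n * var_mean:.3f}   (theory: 2 b^2 = {2 * b**2:.3f})")
print(f"n * Var(median) = {n * var_median:.3f}   (asymptotic theory: b^2 = {b**2:.3f})")
print(f"variance ratio  = {ratio:.3f}   (asymptotic theory: 2)")
```

Increasing `n` should move the median's scaled variance closer to $b^2$, and increasing `reps` reduces the simulation noise in both estimates.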

Practical Considerations: Under Gaussian errors, the roles reverse: OLS has ARE $= \pi/2 \approx 1.57$ over LAD. But the Laplace distribution has heavier tails than the Gaussian, and the median's robustness to outliers gives it a decisive advantage. In financial data, where return distributions are typically leptokurtic (heavier tails than Gaussian), the median and other robust estimators often outperform the mean.

Answer:

  1. The MLE under Laplace errors minimizes $\sum |y_i - \mu|$, which is the LAD objective.
  2. The minimizer of $\sum |y_i - \mu|$ is the sample median.
  3. The asymptotic variance of the median is $b^2/n$, while the mean's is $2b^2/n$. The median is twice as efficient as the mean under Laplace errors.

Intuition

This problem illustrates a deep connection in statistics: every error distribution has a "natural" estimator that is its MLE, and that estimator minimizes a corresponding loss function. Gaussian errors give squared loss (the mean), Laplace errors give absolute loss (the median), and more generally the exponential power family interpolates between them. The practical takeaway is that the "best" location estimator depends on the tails of your error distribution.

In quantitative finance, this matters because financial returns are heavy-tailed. If you estimate a location parameter (like expected return or a regression intercept) using the mean, you are implicitly assuming Gaussian errors and giving outsized influence to extreme observations. The median -- or equivalently the LAD estimator -- down-weights outliers and can be dramatically more efficient when tails are heavier than Gaussian. The factor-of-two efficiency gain under Laplace errors is a concrete illustration of why robust methods deserve a place in every quant's toolkit.
