MLE for Uniform Distribution

Statistics · Medium · Free problem

You draw $n$ i.i.d. samples $X_1, X_2, \ldots, X_n$ from a $\text{Uniform}(0, \theta)$ distribution, where $\theta > 0$ is unknown.

  1. Write down the likelihood function $L(\theta)$ and identify for which values of $\theta$ it is nonzero.
  2. Derive the Maximum Likelihood Estimator (MLE) $\hat{\theta}_{\text{MLE}}$.
  3. Is $\hat{\theta}_{\text{MLE}}$ unbiased? If not, construct an unbiased estimator based on the MLE and prove that it is unbiased.

Hints

  1. The likelihood involves indicator functions -- for which values of $\theta$ is it nonzero?
  2. On the region where $L(\theta) > 0$, notice that $1/\theta^n$ is monotonically decreasing. Where does the maximum occur?
  3. To check bias, derive the CDF of $X_{(n)} = \max_i X_i$ by noting $P(X_{(n)} \le x) = P(X_1 \le x)^n$, then compute $E[X_{(n)}]$.

Worked Solution

How to Think About It: This is one of the cleanest examples of how MLE can behave oddly compared to, say, the normal distribution case. The uniform likelihood is not a smooth bell -- it is a step function that is only nonzero when $\theta$ is at least as large as every sample. So you are not setting a derivative to zero. Instead you are asking: what is the smallest $\theta$ that is consistent with the data? That is the max of the sample. Before touching any math, your gut should say: the MLE will be the largest observation, and it will systematically underestimate $\theta$ because the sample max can never overshoot.

Quick Estimate: Suppose $\theta = 100$ and $n = 10$. The sample max should cluster near $100$ but always below it. On average $E[X_{(n)}] = \frac{n}{n+1} \theta = \frac{10}{11} \times 100 \approx 90.9$, so the MLE is biased about $9\%$ low with only 10 observations. With $n = 100$, the bias shrinks to about $1\%$. The correction factor $(n+1)/n$ fixes this exactly.
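The quick estimate can be sanity-checked with a small simulation. The sketch below is illustrative only: the function name `mean_sample_max`, the seed, and the trial count are assumptions, not part of the problem. It averages the sample max over many simulated datasets and compares the result with $\frac{n}{n+1}\theta$.

```python
import random

def mean_sample_max(theta, n, trials, seed=0):
    """Average of max(X_1, ..., X_n) over many simulated Uniform(0, theta) samples."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, theta) for _ in range(n))
    return total / trials

theta = 100.0
for n in (10, 100):
    simulated = mean_sample_max(theta, n, trials=20_000)
    predicted = n / (n + 1) * theta  # E[X_(n)] from the formula above
    print(f"n={n}: simulated {simulated:.2f}, predicted {predicted:.2f}")
```

The simulated averages should land close to the predicted values, with the gap to $\theta$ shrinking as $n$ grows.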

Approach: Write the likelihood, observe it is decreasing on its support, then derive the order statistic distribution to check bias.

Formal Solution:

*Part 1 -- Likelihood:*

Each $X_i$ has density $f(x \mid \theta) = \frac{1}{\theta} \cdot \mathbf{1}(0 \le x \le \theta)$. The joint likelihood is:

$L(\theta) = \prod_{i=1}^n \frac{1}{\theta} \cdot \mathbf{1}(0 \le X_i \le \theta) = \frac{1}{\theta^n} \cdot \mathbf{1}(\theta \ge X_{(n)})$

where $X_{(n)} = \max(X_1, \ldots, X_n)$. The indicator collapses all the individual constraints into one: $\theta$ must be at least as large as the biggest observation, otherwise the likelihood is zero.

*Part 2 -- MLE:*

On the region $\theta \ge X_{(n)}$, the likelihood is $L(\theta) = 1/\theta^n$, which is strictly decreasing in $\theta$. So the maximum is achieved at the smallest allowable value:

$\hat{\theta}_{\text{MLE}} = X_{(n)} = \max(X_1, \ldots, X_n)$
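To see the boundary maximum concretely, one can evaluate the likelihood on a grid of candidate $\theta$ values. This is a minimal sketch under stated assumptions: the sample `xs`, the grid spacing, and the helper name `likelihood` are all hypothetical, chosen only for illustration.

```python
def likelihood(theta, xs):
    """L(theta) = theta**(-n) if theta >= max(xs), else 0 (assumes all xs >= 0)."""
    if theta <= 0 or theta < max(xs):
        return 0.0
    return theta ** (-len(xs))

xs = [2.1, 0.4, 3.7, 1.9, 2.8]            # hypothetical sample
grid = [i / 100 for i in range(1, 1001)]  # candidate theta values 0.01, 0.02, ..., 10.00

# The grid maximizer should coincide with the sample maximum.
best = max(grid, key=lambda t: likelihood(t, xs))
print(best, max(xs))
```

A grid search is used here instead of a derivative-based optimizer because the likelihood jumps at $\theta = X_{(n)}$ and has no interior critical point.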

*Part 3 -- Bias:*

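The maximum is at most $x$ exactly when every sample is at most $x$, so by independence the CDF of the sample maximum is $F_{X_{(n)}}(x) = P(X_1 \le x)^n = \left(\frac{x}{\theta}\right)^n$ for $0 \le x \le \theta$. Differentiating gives the density: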

$f_{X_{(n)}}(x) = \frac{n x^{n-1}}{\theta^n}, \quad 0 \le x \le \theta$

The expected value is:

$E[X_{(n)}] = \int_0^{\theta} x \cdot \frac{n x^{n-1}}{\theta^n} \, dx = \frac{n}{\theta^n} \int_0^{\theta} x^n \, dx = \frac{n}{\theta^n} \cdot \frac{\theta^{n+1}}{n+1} = \frac{n}{n+1} \theta$

Since $E[\hat{\theta}_{\text{MLE}}] = \frac{n}{n+1} \theta < \theta$, the MLE is biased downward.

To fix the bias, define:

$\hat{\theta}_{\text{unbiased}} = \frac{n+1}{n} X_{(n)}$

Then $E[\hat{\theta}_{\text{unbiased}}] = \frac{n+1}{n} \cdot \frac{n}{n+1} \theta = \theta$, confirming unbiasedness.
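As a numerical cross-check of the proof, the sketch below draws $X_{(n)}$ directly by inverting its CDF $F(x) = (x/\theta)^n$, which gives $x = \theta U^{1/n}$ for $U \sim \text{Uniform}(0, 1)$, and then averages both estimators. The function names, seed, and parameter values are assumptions made for illustration.

```python
import random

def sample_max(theta, n, rng):
    """Draw X_(n) by inverse transform: solve (x/theta)**n = u for x."""
    return theta * rng.random() ** (1.0 / n)

def estimator_means(theta, n, trials, seed=1):
    """Monte Carlo averages of the MLE and of the bias-corrected estimator."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    mle_sum = 0.0
    corrected_sum = 0.0
    for _ in range(trials):
        x_max = sample_max(theta, n, rng)
        mle_sum += x_max
        corrected_sum += (n + 1) / n * x_max
    return mle_sum / trials, corrected_sum / trials

mle_mean, corrected_mean = estimator_means(theta=50.0, n=5, trials=100_000)
print(f"MLE mean: {mle_mean:.2f}, corrected mean: {corrected_mean:.2f}")
```

With these hypothetical settings, the MLE average should sit near $\frac{5}{6} \times 50 \approx 41.7$ while the corrected average should sit near $50$.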

Answer: The MLE is $\hat{\theta}_{\text{MLE}} = X_{(n)} = \max_i X_i$. It is biased: $E[X_{(n)}] = \frac{n}{n+1}\theta$. The unbiased correction is $\hat{\theta}_{\text{unbiased}} = \frac{n+1}{n} X_{(n)}$.

Intuition

This problem is the textbook example of why MLE is not always unbiased and why you should not blindly trust derivatives-equal-zero for optimization. The uniform distribution's likelihood has a hard boundary -- it jumps from zero to positive at $\theta = X_{(n)}$ -- so the maximizer is at the boundary, not at an interior critical point. This is a fundamentally different optimization landscape than, say, MLE for the normal mean.

In practice, this pattern shows up whenever you are estimating the support of a distribution (think: estimating the maximum possible loss, the largest order size, or the capacity of a queue). The sample max is always an underestimate, and the correction factor $(n+1)/n$ is a version of the "add-one" smoothing idea: you have seen $n$ observations partitioning the interval into $n+1$ gaps, each of expected length $\theta/(n+1)$, and the true endpoint is roughly one gap beyond the max. This is also a good interview moment to mention that unbiased does not mean minimum mean squared error. For $n \ge 2$ the unbiased estimator $\frac{n+1}{n} X_{(n)}$ does beat the MLE, with MSE $\frac{\theta^2}{n(n+2)}$ versus $\frac{2\theta^2}{(n+1)(n+2)}$, but the slightly biased estimator $\frac{n+2}{n+1} X_{(n)}$ has lower MSE still, $\frac{\theta^2}{(n+1)^2}$ -- a classic bias-variance tradeoff.
