MLE for Uniform Distribution
You draw $n$ i.i.d. samples $X_1, X_2, \ldots, X_n$ from a $\text{Uniform}(0, \theta)$ distribution, where $\theta > 0$ is unknown.
- Write down the likelihood function $L(\theta)$ and identify for which values of $\theta$ it is nonzero.
- Derive the Maximum Likelihood Estimator (MLE) $\hat{\theta}_{\text{MLE}}$.
- Is $\hat{\theta}_{\text{MLE}}$ unbiased? If not, construct an unbiased estimator based on the MLE and prove that it is unbiased.
Hints
- The likelihood involves indicator functions -- for which values of $\theta$ is it nonzero?
- On the region where $L(\theta) > 0$, notice that $1/\theta^n$ is monotonically decreasing. Where does the maximum occur?
- To check bias, derive the CDF of $X_{(n)} = \max_i X_i$ by noting $P(X_{(n)} \le x) = P(X_1 \le x)^n$, then compute $E[X_{(n)}]$.
Worked Solution
How to Think About It: This is one of the cleanest examples of how MLE can behave oddly compared to, say, the normal distribution case. The uniform likelihood is not a smooth bell: it is zero until $\theta$ is at least as large as every sample, jumps to a positive value there, and then decays like $1/\theta^n$. So you are not setting a derivative to zero. Instead you are asking: what is the smallest $\theta$ that is consistent with the data? That is the max of the sample. Before touching any math, your gut should say: the MLE will be the largest observation, and it will systematically underestimate $\theta$ because the sample max can never overshoot.
Quick Estimate: Suppose $\theta = 100$ and $n = 10$. The sample max should cluster near $100$ but always below it. On average $E[X_{(n)}] = \frac{n}{n+1} \theta = \frac{10}{11} \times 100 \approx 90.9$, so the MLE is biased about $9\%$ low with only 10 observations. With $n = 100$, the bias shrinks to about $1\%$. The correction factor $(n+1)/n$ fixes this exactly.
Approach: Write the likelihood, observe it is decreasing on its support, then derive the order statistic distribution to check bias.
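The quick estimate can be sanity-checked by simulation. The following is a minimal sketch, not part of the original solution: the function name `mean_sample_max`, the trial count, and the seed are illustrative assumptions.

```python
import random

def mean_sample_max(theta, n, trials=10000, seed=0):
    """Monte Carlo estimate of E[max(X_1..X_n)] for X_i ~ Uniform(0, theta)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0, theta) for _ in range(n))
    return total / trials

theta = 100.0
# Simulated mean of the sample max for the two sample sizes in the estimate
results = {n: mean_sample_max(theta, n) for n in (10, 100)}
for n, simulated in results.items():
    exact = n / (n + 1) * theta  # closed form n/(n+1) * theta
    print(f"n={n:3d}  simulated={simulated:.2f}  exact={exact:.2f}")
```

The simulated averages should land close to $\frac{n}{n+1}\theta$, up to Monte Carlo noise.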
Formal Solution:
*Part 1 -- Likelihood:*
Each $X_i$ has density $f(x \mid \theta) = \frac{1}{\theta} \cdot \mathbf{1}(0 \le x \le \theta)$. The joint likelihood is:
$L(\theta) = \prod_{i=1}^n \frac{1}{\theta} \cdot \mathbf{1}(0 \le X_i \le \theta) = \frac{1}{\theta^n} \cdot \mathbf{1}(\theta \ge X_{(n)})$
where $X_{(n)} = \max(X_1, \ldots, X_n)$. The indicator collapses all the individual constraints into one: $\theta$ must be at least as large as the biggest observation, otherwise the likelihood is zero.
*Part 2 -- MLE:*
On the region $\theta \ge X_{(n)}$, the likelihood is $L(\theta) = 1/\theta^n$, which is strictly decreasing in $\theta$. So the maximum is achieved at the smallest allowable value:
$\hat{\theta}_{\text{MLE}} = X_{(n)} = \max(X_1, \ldots, X_n)$
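To see the boundary maximum concretely, one can evaluate the likelihood on a grid of candidate $\theta$ values. This is a rough sketch under stated assumptions: the sample `xs` is made up for illustration, and the grid spacing of 0.01 is arbitrary.

```python
def likelihood(theta, xs):
    """L(theta) = theta**(-n) if theta >= max(xs), else 0 (assumes all xs >= 0)."""
    if theta <= 0 or theta < max(xs):
        return 0.0
    return theta ** (-len(xs))

xs = [2.1, 0.7, 3.4, 1.9, 2.8]           # hypothetical sample, illustration only
grid = [i / 100 for i in range(1, 601)]  # candidate theta values 0.01 .. 6.00

# The grid point with the largest likelihood
best = max(grid, key=lambda t: likelihood(t, xs))
print("grid maximizer:", best, " sample max:", max(xs))
```

Because the likelihood is zero below the sample max and strictly decreasing above it, the grid maximizer coincides with the sample max whenever that value lies on the grid, as it does here.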
*Part 3 -- Bias:*
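The maximum is at most $x$ exactly when every sample is at most $x$, so by independence $P(X_{(n)} \le x) = P(X_1 \le x)^n$. The CDF of the sample maximum is therefore $F_{X_{(n)}}(x) = \left(\frac{x}{\theta}\right)^n$ for $0 \le x \le \theta$. Differentiating gives the density: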
$f_{X_{(n)}}(x) = \frac{n x^{n-1}}{\theta^n}, \quad 0 \le x \le \theta$
The expected value is:
$E[X_{(n)}] = \int_0^{\theta} x \cdot \frac{n x^{n-1}}{\theta^n} \, dx = \frac{n}{\theta^n} \int_0^{\theta} x^n \, dx = \frac{n}{\theta^n} \cdot \frac{\theta^{n+1}}{n+1} = \frac{n}{n+1} \theta$
Since $E[\hat{\theta}_{\text{MLE}}] = \frac{n}{n+1} \theta < \theta$, the MLE is biased downward.
To fix the bias, define:
$\hat{\theta}_{\text{unbiased}} = \frac{n+1}{n} X_{(n)}$
Then $E[\hat{\theta}_{\text{unbiased}}] = \frac{n+1}{n} \cdot \frac{n}{n+1} \theta = \theta$, confirming unbiasedness.
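The derivation can also be checked numerically by drawing $X_{(n)}$ directly from its CDF: inverting $F(x) = (x/\theta)^n$ gives $x = \theta\, u^{1/n}$ for $u \sim \text{Uniform}(0,1)$. The sketch below assumes illustrative values $\theta = 5$ and $n = 8$; the function names are hypothetical.

```python
import random

def sample_max_via_cdf(theta, n, rng):
    """Draw X_(n) by inverse transform: F(x) = (x/theta)**n  =>  x = theta * u**(1/n)."""
    return theta * rng.random() ** (1.0 / n)

def estimator_means(theta, n, trials=100000, seed=1):
    """Return the simulated means of the MLE and of the (n+1)/n-corrected estimator."""
    rng = random.Random(seed)
    draws = [sample_max_via_cdf(theta, n, rng) for _ in range(trials)]
    mle_mean = sum(draws) / trials
    corrected_mean = (n + 1) / n * mle_mean  # linearity of expectation
    return mle_mean, corrected_mean

mle_mean, corrected_mean = estimator_means(theta=5.0, n=8)
print(f"mean of MLE:       {mle_mean:.3f}  (theory {8 / 9 * 5.0:.3f})")
print(f"mean of corrected: {corrected_mean:.3f}  (theory 5.000)")
```

The first mean should sit near $\frac{n}{n+1}\theta$ and the second near $\theta$, up to simulation noise.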
Answer: The MLE is $\hat{\theta}_{\text{MLE}} = X_{(n)} = \max_i X_i$. It is biased: $E[X_{(n)}] = \frac{n}{n+1}\theta$. The unbiased correction is $\hat{\theta}_{\text{unbiased}} = \frac{n+1}{n} X_{(n)}$.
Intuition
This problem is the textbook example of why MLE is not always unbiased and why you should not blindly trust derivatives-equal-zero for optimization. The uniform distribution's likelihood has a hard boundary -- it jumps from zero to positive at $\theta = X_{(n)}$ -- so the maximizer is at the boundary, not at an interior critical point. This is a fundamentally different optimization landscape than, say, MLE for the normal mean.
In practice, this pattern shows up whenever you are estimating the support of a distribution (think: estimating the maximum possible loss, the largest order size, or the capacity of a queue). The sample max is always an underestimate, and the correction factor $(n+1)/n$ is a version of the "add-one" smoothing idea: you have seen $n$ observations partitioning the interval into $n+1$ gaps, and the true endpoint is roughly one gap beyond the max. This is also a good interview moment to mention mean squared error: the MLE has $\text{MSE} = \frac{2\theta^2}{(n+1)(n+2)}$, while the unbiased estimator $\frac{n+1}{n} X_{(n)}$ has $\text{MSE} = \frac{\theta^2}{n(n+2)}$, which is smaller for every $n > 1$. Unbiasedness is still not optimal, though: among estimators of the form $c X_{(n)}$, the slightly biased $\frac{n+2}{n+1} X_{(n)}$ minimizes MSE at $\frac{\theta^2}{(n+1)^2}$ -- a classic bias-variance tradeoff.
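The mean squared errors of estimators of the form $c X_{(n)}$ can be compared by simulation. This is a minimal sketch, assuming $\theta = 1$ and $n = 10$; the closed forms in `closed_form` follow from $E[X_{(n)}] = \frac{n}{n+1}\theta$ and $E[X_{(n)}^2] = \frac{n}{n+2}\theta^2$, and the estimator labels are illustrative names.

```python
import random

def simulate_mse(theta, n, trials=50000, seed=2):
    """Monte Carlo MSE of c * X_(n) for three choices of the scale factor c."""
    rng = random.Random(seed)
    factors = {
        "mle": 1.0,                     # X_(n) itself
        "unbiased": (n + 1) / n,        # bias-corrected estimator
        "min_mse": (n + 2) / (n + 1),   # MSE-minimizing scale factor
    }
    sq_err = {name: 0.0 for name in factors}
    for _ in range(trials):
        m = max(rng.uniform(0, theta) for _ in range(n))
        for name, c in factors.items():
            sq_err[name] += (c * m - theta) ** 2
    return {name: total / trials for name, total in sq_err.items()}

theta, n = 1.0, 10
mse = simulate_mse(theta, n)
closed_form = {
    "mle": 2 * theta**2 / ((n + 1) * (n + 2)),
    "unbiased": theta**2 / (n * (n + 2)),
    "min_mse": theta**2 / (n + 1) ** 2,
}
for name in mse:
    print(f"{name:9s} simulated={mse[name]:.5f}  closed form={closed_form[name]:.5f}")
```

With these settings the MLE has roughly twice the MSE of the other two. The unbiased and MSE-minimizing estimators are very close at $n = 10$, so their simulated ordering may be sensitive to noise.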