Linear MMSE Estimator for a Conditional Exponential

Expectation · Medium

Let $X \sim U(1, 2)$. Conditional on $X = x$, let $Y \sim \text{Exp}(1/x)$, i.e. exponential with rate $1/x$, so that $E[Y|X=x] = x$ and $\text{Var}(Y|X=x) = x^2$ are both functions of $x$.

Work through the following steps:

1. Compute $E[X]$, $\text{Var}(X)$, $E[Y]$, $\text{Var}(Y)$, and $\text{Cov}(X, Y)$.
2. Write down the linear MMSE estimator $\hat{X}_L$ of $X$ given $Y$.
3. Compute the MSE of $\hat{X}_L$.

Hints

Use the tower property $E[Y] = E[E[Y|X]]$ and the law of total variance $\text{Var}(Y) = E[\text{Var}(Y|X)] + \text{Var}(E[Y|X])$, both applied to the known conditional moments of an $\text{Exp}(1/x)$ distribution.
For $\text{Cov}(X, Y)$, use $E[XY] = E[X \cdot E[Y|X]]$ via the tower property, which avoids computing the joint density entirely.
The linear MMSE estimator is $\hat{X}_L = E[X] + \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}(Y - E[Y])$, and its MSE equals $(1 - \rho^2)\text{Var}(X)$ where $\rho = \text{Cov}(X,Y) / \sqrt{\text{Var}(X)\text{Var}(Y)}$.

Worked Solution

How to Think About It: This is a structured moment computation followed by a plug-in formula. The law of total expectation and law of total variance are your main tools -- they let you sidestep the joint density entirely. Once you have the five moments in part 1, the linear MMSE estimator is just a formula: it is the regression of $X$ on $Y$, so you need $\text{Cov}(X, Y) / \text{Var}(Y)$ as the slope and the means as the anchor. The MSE is then $(1 - \rho^2) \text{Var}(X)$, where $\rho$ is the correlation. Before computing anything, note that $E[Y|X=x] = x$ and $\text{Var}(Y|X=x) = x^2$ -- both are simple functions of $X$, which makes the tower property applications clean.

Quick Estimate: $X$ is uniform on $[1, 2]$, so $E[X] = 1.5$ and $\text{Var}(X) = 1/12 \approx 0.083$. Since $E[Y|X] = X$, we expect $E[Y] = 1.5$ as well, and $\text{Cov}(X, Y) = \text{Var}(X)$, so $\rho^2 = \text{Var}(X)/\text{Var}(Y)$. The correlation should be positive but small: $Y$ is noisy around $X$, with $\text{Var}(Y|X) = X^2 \approx 1.5^2 = 2.25$ dwarfing $\text{Var}(X)$, which puts $\text{Var}(Y)$ somewhere around $2.3$ to $2.4$. A rough guess is therefore $\rho^2 \approx 0.083/2.4 \approx 1/30$, so MSE $\approx (29/30)(1/12) \approx 0.08$, barely below $\text{Var}(X)$. The linear estimator barely helps here, a sign that $Y$ is a very noisy signal for $X$.

Approach: Apply the tower property for each moment, then plug into the linear MMSE formula.

Formal Solution:

Part 1: Moments

Moments of $X$:

$E[X] = \frac{1+2}{2} = \frac{3}{2}$

$E[X^2] = \int_1^2 x^2 \, dx = \frac{x^3}{3}\Big|_1^2 = \frac{8-1}{3} = \frac{7}{3}$

$\text{Var}(X) = E[X^2] - (E[X])^2 = \frac{7}{3} - \frac{9}{4} = \frac{28 - 27}{12} = \frac{1}{12}$

Conditional moments of $Y$: For $Y | X = x \sim \text{Exp}(1/x)$, the mean is $x$ and the variance is $x^2$:

$E[Y|X=x] = x, \quad \text{Var}(Y|X=x) = x^2$

Moments of $Y$:

$E[Y] = E[E[Y|X]] = E[X] = \frac{3}{2}$

For $E[Y^2]$, use the decomposition $E[Y^2] = E[\text{Var}(Y|X)] + E[(E[Y|X])^2]$:

$E[Y^2] = E[X^2] + E[X^2] = 2 \cdot \frac{7}{3} = \frac{14}{3}$

$\text{Var}(Y) = \frac{14}{3} - \left(\frac{3}{2}\right)^2 = \frac{14}{3} - \frac{9}{4} = \frac{56 - 27}{12} = \frac{29}{12}$
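As a cross-check, the law of total variance mentioned in the hints gives the same value directly, using only $\text{Var}(Y|X) = X^2$ and $E[Y|X] = X$:

$\text{Var}(Y) = E[\text{Var}(Y|X)] + \text{Var}(E[Y|X]) = E[X^2] + \text{Var}(X) = \frac{7}{3} + \frac{1}{12} = \frac{28 + 1}{12} = \frac{29}{12}$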

Covariance:

$E[XY] = E[X \cdot E[Y|X]] = E[X \cdot X] = E[X^2] = \frac{7}{3}$

$\text{Cov}(X, Y) = E[XY] - E[X]E[Y] = \frac{7}{3} - \frac{9}{4} = \frac{1}{12}$
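The moment bookkeeping above can be reproduced in exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the helper name `uniform_raw_moment` is a hypothetical name chosen for illustration, not something from the problem statement.

```python
from fractions import Fraction

def uniform_raw_moment(k, a=1, b=2):
    """E[X^k] for X ~ U(a, b): (b^(k+1) - a^(k+1)) / ((k+1)(b-a))."""
    return Fraction(b ** (k + 1) - a ** (k + 1), (k + 1) * (b - a))

EX = uniform_raw_moment(1)      # 3/2
EX2 = uniform_raw_moment(2)     # 7/3
var_x = EX2 - EX ** 2           # 1/12

# Y | X = x is exponential with rate 1/x: mean x, variance x^2.
EY = EX                         # tower property: E[E[Y|X]] = E[X]
EY2 = EX2 + EX2                 # E[Var(Y|X)] + E[(E[Y|X])^2]
var_y = EY2 - EY ** 2           # 29/12
EXY = EX2                       # E[X * E[Y|X]] = E[X^2]
cov_xy = EXY - EX * EY          # 1/12

print(EX, var_x, EY, var_y, cov_xy)   # 3/2 1/12 3/2 29/12 1/12
```

Working with `Fraction` rather than floats keeps every intermediate value comparable to the hand computation term by term.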

Part 2: Linear MMSE Estimator

The linear MMSE estimator has the form $\hat{X}_L = a + bY$, with:

$b = \frac{\text{Cov}(X,Y)}{\text{Var}(Y)} = \frac{1/12}{29/12} = \frac{1}{29}$

$a = E[X] - b \cdot E[Y] = \frac{3}{2} - \frac{1}{29} \cdot \frac{3}{2} = \frac{3}{2} \cdot \frac{28}{29} = \frac{42}{29}$

So:

$\hat{X}_L = \frac{3}{2} + \frac{1}{29}\left(Y - \frac{3}{2}\right) = \frac{42}{29} + \frac{1}{29}Y$
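As a small illustration of the plug-in step, the sketch below builds the slope and intercept from the Part 1 moments and wraps them in a function. The name `x_hat_linear` is hypothetical, and the moments are hard-coded from the solution above rather than recomputed.

```python
from fractions import Fraction

# Moments taken from Part 1 of the worked solution.
EX, EY = Fraction(3, 2), Fraction(3, 2)
var_y, cov_xy = Fraction(29, 12), Fraction(1, 12)

b = cov_xy / var_y      # slope: Cov(X, Y) / Var(Y)
a = EX - b * EY         # intercept: forces the line through (E[Y], E[X])

def x_hat_linear(y):
    """Linear MMSE estimate of X from an observation y."""
    return a + b * y

print(a, b)                             # 42/29 1/29
print(x_hat_linear(Fraction(3, 2)))     # 3/2
```

Evaluating at $y = E[Y] = 3/2$ returns $E[X] = 3/2$, which is the "anchored at the means" property of the estimator.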

Part 3: MSE

$\rho^2 = \frac{\text{Cov}(X,Y)^2}{\text{Var}(X) \cdot \text{Var}(Y)} = \frac{(1/12)^2}{(1/12)(29/12)} = \frac{1}{29}$

$\text{MSE}(\hat{X}_L) = (1 - \rho^2)\text{Var}(X) = \left(1 - \frac{1}{29}\right) \cdot \frac{1}{12} = \frac{28}{29} \cdot \frac{1}{12} = \frac{7}{87}$
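A Monte Carlo run is one way to sanity-check the closed-form MSE. The sketch below is illustrative only: the function name `simulate_linear_mse`, the sample size, and the seed are arbitrary choices, and the simulated value should land close to, but not exactly on, $7/87$.

```python
import random

def simulate_linear_mse(n=200_000, seed=0):
    """Estimate E[(X - X_hat_L)^2] by simulating the hierarchical model."""
    rng = random.Random(seed)
    a, b = 42 / 29, 1 / 29                  # coefficients from Part 2
    total = 0.0
    for _ in range(n):
        x = rng.uniform(1.0, 2.0)           # X ~ U(1, 2)
        y = rng.expovariate(1.0 / x)        # argument is the rate, so E[Y | X = x] = x
        total += (x - (a + b * y)) ** 2
    return total / n

mse_hat = simulate_linear_mse()
print(mse_hat, 7 / 87)   # simulated vs. exact MSE
```

Note that `random.expovariate` is parameterized by the rate, matching the $\text{Exp}(1/x)$ convention used in the problem.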

Answer:

$E[X] = 3/2$, $\text{Var}(X) = 1/12$
$E[Y] = 3/2$, $\text{Var}(Y) = 29/12$
$\text{Cov}(X, Y) = 1/12$
$\hat{X}_L = \dfrac{3}{2} + \dfrac{1}{29}\left(Y - \dfrac{3}{2}\right)$
$\text{MSE} = 7/87 \approx 0.0805$

Intuition

The tower property is a workhorse for hierarchical models: whenever $Y$ is specified through its conditional distribution given $X$, you can compute all marginal moments of $Y$ without ever writing down the joint density. Here, $E[Y|X] = X$ makes $Y$ an unbiased but noisy version of $X$ -- the noise variance $\text{Var}(Y|X) = X^2$ is itself random and large. That large conditional variance is why $\text{Var}(Y) = 29/12$ swamps $\text{Var}(X) = 1/12$, keeping $\rho^2 = 1/29$ tiny.

The practical lesson is that the linear MMSE estimator is just the OLS regression formula in disguise: slope = $\text{Cov}(X,Y)/\text{Var}(Y)$, intercept anchored at the means. The MSE formula $(1 - \rho^2)\text{Var}(X)$ tells you at a glance how useful $Y$ is as a signal for $X$. Here it is almost useless, since $\rho^2 \approx 3.4\%$. In signal processing and Kalman filtering, this same structure appears constantly: a latent state $X$ drives a noisy observation $Y$, and you want the best linear filter. The key subtlety is that "best linear" is not the same as "best": the true conditional expectation $E[X|Y]$ requires integrating $x$ against the posterior density, which is proportional to $x^{-1} e^{-y/x}$ on $[1, 2]$ (an inverse-gamma-type kernel truncated to $[1, 2]$, not a Beta density), and its MSE can be no larger than that of this linear estimator.
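To make the "best linear" versus "best" comparison concrete, the sketch below evaluates $E[X|Y=y]$ numerically. It assumes the posterior density is proportional to $x^{-1} e^{-y/x}$ on $[1, 2]$ (flat prior times the exponential likelihood) and uses a plain midpoint rule; the function names, grid sizes, and the truncation point `y_max` are illustrative choices, not part of the original solution.

```python
import math

def posterior_mean(y, n=200):
    """E[X | Y = y] via the midpoint rule on [1, 2]."""
    h = 1.0 / n
    num = den = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) * h
        w = math.exp(-y / x) / x        # unnormalized posterior density at x
        num += x * w
        den += w
    return num / den

def linear_estimate(y):
    """Linear MMSE estimator from Part 2."""
    return 42 / 29 + y / 29

def conditional_mean_mse(y_max=60.0, dy=0.01, n=100):
    """MSE of E[X|Y], computed as E[X^2] - E[(E[X|Y])^2] on a grid."""
    h = 1.0 / n
    xs = [1.0 + (i + 0.5) * h for i in range(n)]
    second_moment = 0.0
    for j in range(int(y_max / dy)):
        y = (j + 0.5) * dy
        ws = [math.exp(-y / x) / x for x in xs]
        f_y = h * sum(ws)                                   # marginal density of Y at y
        m = h * sum(x * w for x, w in zip(xs, ws)) / f_y    # E[X | Y = y]
        second_moment += m * m * f_y * dy
    return 7 / 3 - second_moment                            # E[X^2] = 7/3

for y in (0.0, 1.5, 5.0):
    print(y, posterior_mean(y), linear_estimate(y))

mmse = conditional_mean_mse()
print(mmse, 7 / 87)   # conditional-mean MSE vs. linear MSE
```

The posterior mean always stays inside $[1, 2]$, whereas the linear estimate leaves that interval for large enough $y$, which is one visible way the linear form is suboptimal.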
