
Conditional Expectation and Variance of Bivariate Normal

Expectation · Medium

Let $(X, Y)$ be jointly normal with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$.

Derive $E[Y \mid X = x]$.

Derive $\text{Var}(Y \mid X = x)$.

Explain why the conditional variance does not depend on the observed value $x$.

Hints

Decompose $Y$ into a part that depends on $X$ plus independent noise. What coefficient on $X$ makes the noise uncorrelated with $X$?
Write $Y = \mu_Y + \beta(X - \mu_X) + \varepsilon$ and choose $\beta$ so that $\text{Cov}(\varepsilon, X) = 0$. Since $\varepsilon$ and $X$ are jointly normal, uncorrelated implies independent.
The coefficient is $\beta = \rho \sigma_Y / \sigma_X$. Once you have $\varepsilon \perp X$, conditioning on $X = x$ turns the $X$ term into a known constant, so the only remaining randomness is $\varepsilon$ and the conditional variance is just $\text{Var}(\varepsilon)$.

Worked Solution

How to Think About It: This is one of the most important results in quantitative finance -- it is the foundation of linear regression, CAPM, hedging, and basically anything that involves predicting one variable given another. The key idea is that for jointly normal random variables, conditioning on $X$ is the same thing as projecting $Y$ onto $X$ and looking at the residual. The projection gives you the conditional mean (a linear function of $x$), and the residual gives you the conditional variance (a constant that does not depend on $x$). If someone asks you this in an interview, start by saying: "I will decompose $Y$ into its projection onto $X$ plus independent noise."

Quick Estimate: Before doing any algebra, think about what the answer should look like. The conditional mean $E[Y \mid X = x]$ should be linear in $x$ (for jointly normal variables the regression function is linear), pass through $(\mu_X, \mu_Y)$, and have slope proportional to $\rho$. When $\rho = 0$, knowing $X$ tells you nothing, so $E[Y \mid X = x] = \mu_Y$. When $|\rho| = 1$, $Y$ is a deterministic linear function of $X$, so its conditional variance is zero. The conditional variance should therefore be $\sigma_Y^2$ when $\rho = 0$ (knowing $X$ does not help) and $0$ when $|\rho| = 1$ (knowing $X$ pins down $Y$ exactly), and it should depend on $\rho$ only through its magnitude, since a negative correlation is as informative as a positive one. So we expect something like $\sigma_Y^2(1 - \rho^2)$.

Approach: Decompose $Y$ into a linear function of $X$ plus independent Gaussian noise. Choose the coefficient so that the noise is uncorrelated with (and hence independent of) $X$.
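The coefficient does not have to be guessed. Treat $\beta$ as unknown, write the residual, and set its covariance with $X$ to zero. Constant terms drop out of the covariance, and $\text{Cov}(Y, X) = \rho\sigma_X\sigma_Y$ by the definition of correlation:

```latex
\begin{aligned}
\varepsilon &= Y - \mu_Y - \beta\,(X - \mu_X) \\
\text{Cov}(\varepsilon, X) &= \text{Cov}(Y, X) - \beta\,\text{Var}(X) = \rho\,\sigma_X\sigma_Y - \beta\,\sigma_X^2 \\
\text{Cov}(\varepsilon, X) = 0 \;&\Longrightarrow\; \beta = \frac{\rho\,\sigma_X\sigma_Y}{\sigma_X^2} = \rho\,\frac{\sigma_Y}{\sigma_X}
\end{aligned}
```

This is the same coefficient that appears in the decomposition used in the formal solution.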

Formal Solution:

Write $Y$ as:

$Y = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \varepsilon$

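where $\varepsilon$ is the residual, defined by this equation. It has mean zero, since $E[\varepsilon] = E[Y] - \mu_Y - \rho \frac{\sigma_Y}{\sigma_X}(E[X] - \mu_X) = 0$. We still need to verify two things: (a) $\varepsilon$ is independent of $X$, and (b) $\varepsilon$ is normal with the right variance.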

Verification that $\varepsilon \perp X$:

$\text{Cov}(\varepsilon, X) = \text{Cov}\!\left(Y - \mu_Y - \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X),\, X\right)$

$= \text{Cov}(Y, X) - \rho \frac{\sigma_Y}{\sigma_X} \text{Var}(X) = \rho \sigma_X \sigma_Y - \rho \frac{\sigma_Y}{\sigma_X} \cdot \sigma_X^2 = 0$

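Since $(X, Y)$ is jointly normal, every linear combination of $X$ and $Y$ is normal. Both $\varepsilon$ and $X$ are such combinations (plus constants), so the pair $(\varepsilon, X)$ is itself jointly normal. For a jointly normal pair, zero covariance implies independence, so $\varepsilon \perp X$. Marginal normality of $\varepsilon$ alone would not be enough: it is the joint normality of $(\varepsilon, X)$ that lets "uncorrelated" upgrade to "independent".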

Variance of $\varepsilon$:

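Because $\varepsilon$ is uncorrelated with $X$, the variances of the two random terms in the decomposition add: $\text{Var}(Y) = \rho^2 \frac{\sigma_Y^2}{\sigma_X^2} \text{Var}(X) + \text{Var}(\varepsilon)$. Solving for $\text{Var}(\varepsilon)$:

$\text{Var}(\varepsilon) = \text{Var}(Y) - \rho^2 \frac{\sigma_Y^2}{\sigma_X^2} \text{Var}(X) = \sigma_Y^2 - \rho^2 \sigma_Y^2 = \sigma_Y^2(1 - \rho^2)$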

So $\varepsilon \sim N(0, \sigma_Y^2(1 - \rho^2))$ and $\varepsilon \perp X$.

Part 1 -- Conditional Expectation:

Condition on $X = x$:

$E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X) + E[\varepsilon \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$

because $\varepsilon \perp X$ gives $E[\varepsilon \mid X = x] = E[\varepsilon] = 0$.

This is linear in $x$ with slope $\beta = \rho \sigma_Y / \sigma_X = \text{Cov}(X, Y)/\text{Var}(X)$, which is exactly the population regression coefficient of $Y$ on $X$. The OLS slope is its sample estimate.

Part 2 -- Conditional Variance:

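$\text{Var}(Y \mid X = x) = \text{Var}(\varepsilon \mid X = x) = \text{Var}(\varepsilon) = \sigma_Y^2(1 - \rho^2)$

The first equality holds because the $X$ term is a constant once $X = x$ is fixed; the second holds because $\varepsilon \perp X$.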

Part 3 -- Why the conditional variance is constant:

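The decomposition $Y = (\text{linear function of } X) + \varepsilon$ with $\varepsilon \perp X$ means that once you fix $X = x$, the only remaining randomness comes from $\varepsilon$, whose distribution does not depend on $x$ at all. For the jointly normal family this is automatic: a linear conditional mean and a constant conditional variance (homoscedasticity) come for free. The Gaussian is not the only case with this property, because any model of the form $Y = a + bX + \varepsilon$ with $\varepsilon$ independent of $X$ is also homoscedastic. For general joint distributions there is no such guarantee, and the conditional variance can and often does depend on $x$. For example, if $Y = XZ$ with $Z \sim N(0, 1)$ independent of $X$, then $\text{Var}(Y \mid X = x) = x^2$.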
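The contrast can be checked by simulation. The sketch below bins $X$ and computes the variance of the residual $Y - E[Y \mid X]$ within each bin. Subtracting the conditional mean first keeps the slope of $E[Y \mid X]$ across a bin from inflating the spread. For simplicity it assumes standard normal margins ($\mu = 0$, $\sigma = 1$) and $\rho = 0.6$. The bin edges and the product model $Y = XZ$ are illustrative choices.

```python
import math
import random
import statistics

def binned_residual_variance(xs, residuals, edges):
    """Sample variance of the residual Y - E[Y | X] within each X-bin [lo, hi).

    Each value estimates the average of Var(Y | X = x) over the x-values
    that fall in that bin.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [r for x, r in zip(xs, residuals) if lo <= x < hi]
        out.append(statistics.variance(in_bin))
    return out

rng = random.Random(1)
n, rho = 200_000, 0.6
xs, gauss_resid, product_resid = [], [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    xs.append(x)
    # Jointly normal with standard margins: Y = rho*X + sqrt(1 - rho^2)*Z,
    # so E[Y | X] = rho*X and Var(Y | X = x) = 1 - rho^2 = 0.64 for every x.
    y_gauss = rho * x + math.sqrt(1.0 - rho**2) * z
    gauss_resid.append(y_gauss - rho * x)
    # Non-Gaussian joint: Y = X*Z has E[Y | X] = 0 and Var(Y | X = x) = x^2.
    product_resid.append(x * z)

edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
gauss_vars = binned_residual_variance(xs, gauss_resid, edges)
product_vars = binned_residual_variance(xs, product_resid, edges)
print(gauss_vars)    # each entry near 0.64
print(product_vars)  # large in the outer bins, small in the inner bins
```

In the Gaussian case every bin gives roughly the same value. In the product model the outer bins (where $|x|$ is between 1 and 2) show a much larger variance than the inner bins, as $\text{Var}(Y \mid X = x) = x^2$ predicts.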

Answer:

$E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$

$\text{Var}(Y \mid X = x) = \sigma_Y^2(1 - \rho^2)$

The conditional distribution is $Y \mid X = x \sim N\!\left(\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2(1 - \rho^2)\right)$.
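The closed-form answer fits in a small function. This is a minimal sketch: the name `conditional_normal_params` and its argument order are hypothetical, and the parameter check reflects the standing assumptions $\sigma_X, \sigma_Y > 0$ and $-1 \le \rho \le 1$.

```python
def conditional_normal_params(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Y | X = x for a bivariate normal (X, Y)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1.0 <= rho <= 1.0:
        raise ValueError("need sigma_x, sigma_y > 0 and -1 <= rho <= 1")
    beta = rho * sigma_y / sigma_x       # regression slope of Y on X
    mean = mu_y + beta * (x - mu_x)      # linear in x, passes through (mu_x, mu_y)
    var = sigma_y**2 * (1.0 - rho**2)    # does not involve x at all
    return mean, var

# With mu_X = 1, mu_Y = 2, sigma_X = 2, sigma_Y = 3, rho = 0.6 and x = 3:
# mean = 2 + 0.9 * 2 = 3.8 and variance = 9 * 0.64 = 5.76 (up to rounding).
print(conditional_normal_params(3.0, 1.0, 2.0, 2.0, 3.0, 0.6))
```

The limiting cases from the Quick Estimate come out directly: $\rho = 0$ returns $(\mu_Y, \sigma_Y^2)$, and $|\rho| = 1$ returns a variance of zero.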

Intuition

This result is the backbone of linear regression and shows up everywhere in quantitative finance. The conditional mean $E[Y \mid X = x]$ is always the best predictor of $Y$ given $X$ in the mean-squared-error sense. For jointly normal variables it also happens to be linear, so the best linear predictor and the best predictor coincide, and no nonlinear function of $x$ can do better. The slope $\rho \sigma_Y / \sigma_X$ is exactly the population regression coefficient, and the conditional variance $\sigma_Y^2(1 - \rho^2)$ is the unexplained variance: the fraction $1 - \rho^2$ of $Y$'s total variance that $X$ cannot account for. This is where $R^2 = \rho^2$ comes from in a regression on a single variable.

In practice, this shows up whenever you hedge one asset with another. If $X$ is a hedge instrument and $Y$ is your position, the minimum-variance hedge ratio is $\rho \sigma_Y / \sigma_X$, and the residual risk (standard deviation) after hedging is $\sigma_Y \sqrt{1 - \rho^2}$. A correlation of 0.9 eliminates only 81% of the variance. The remaining 19% of the variance corresponds to about 44% of the original volatility, because $\sqrt{0.19} \approx 0.436$. People routinely overestimate how much a high correlation reduces risk because they confuse $\rho$ with $\rho^2$. That mistake can be expensive.
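The hedging arithmetic is sketched below. It assumes $X$ and $Y$ are per-unit P&Ls and that "hedge" means shorting $h$ units of $X$ to minimize $\text{Var}(Y - hX)$. The function name `hedge_summary` is hypothetical.

```python
import math

def hedge_summary(rho, sigma_x, sigma_y):
    """Minimum-variance hedge of Y with X under the bivariate-normal model."""
    hedge_ratio = rho * sigma_y / sigma_x             # units of X to short per unit of Y
    residual_vol = sigma_y * math.sqrt(1.0 - rho**2)  # std dev of Y - hedge_ratio * X
    variance_removed = rho**2                         # fraction of Var(Y) eliminated
    vol_remaining = residual_vol / sigma_y            # fraction of sigma_Y left over
    return hedge_ratio, residual_vol, variance_removed, vol_remaining

for rho in (0.5, 0.9, 0.99):
    _, _, removed, vol_left = hedge_summary(rho, sigma_x=1.0, sigma_y=1.0)
    print(f"rho={rho}: variance removed {removed:.0%}, volatility left {vol_left:.1%}")
```

For $\rho = 0.9$ this reports 81% of the variance removed but about 43.6% of the volatility still in place, which makes the gap between $\rho$ and $\rho^2$ concrete.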
