Conditional Expectation and Variance of Bivariate Normal
Let $(X, Y)$ be jointly normal with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$.
- Derive $E[Y \mid X = x]$.
- Derive $\text{Var}(Y \mid X = x)$.
- Explain why the conditional variance does not depend on the observed value $x$.
Hints
- Decompose $Y$ into a part that depends on $X$ plus independent noise. What coefficient on $X$ makes the noise uncorrelated with $X$?
- Write $Y = \mu_Y + \beta(X - \mu_X) + \varepsilon$ and choose $\beta$ so that $\text{Cov}(\varepsilon, X) = 0$. For jointly normal variables, uncorrelated implies independent.
- The coefficient is $\beta = \rho \sigma_Y / \sigma_X$. Once you have $\varepsilon \perp X$, conditioning on $X = x$ turns the $X$ term into a constant: it shifts the conditional mean but contributes nothing to the variance, leaving only $\text{Var}(\varepsilon)$.
Worked Solution
How to Think About It: This is one of the most important results in quantitative finance -- it underlies linear regression, CAPM, hedging, and most problems that involve predicting one variable from another. The key idea is that for jointly normal random variables, conditioning on $X$ amounts to projecting $Y$ onto $X$ and looking at the residual. The projection gives you the conditional mean (a linear function of $x$), and the residual gives you the conditional variance (a constant that does not depend on $x$). If someone asks you this in an interview, start by saying: "I will decompose $Y$ into its projection onto $X$ plus independent noise."
Quick Estimate: Before doing any algebra, think about what the answer should look like. The conditional mean $E[Y \mid X = x]$ should be linear in $x$ (a consequence of joint normality), pass through $(\mu_X, \mu_Y)$, and have slope proportional to $\rho$. When $\rho = 0$, knowing $X$ tells you nothing, so $E[Y \mid X = x] = \mu_Y$ and the conditional variance is the full $\sigma_Y^2$. When $|\rho| = 1$, $Y$ is a deterministic linear function of $X$, so knowing $X$ pins down $Y$ exactly and the conditional variance is $0$. So we expect something like $\sigma_Y^2(1 - \rho^2)$.
Approach: Decompose $Y$ into a linear function of $X$ plus independent Gaussian noise. Choose the coefficient so that the noise is uncorrelated with (and hence independent of) $X$.
Formal Solution:
Write $Y$ as:
$Y = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \varepsilon$
where $\varepsilon$ is the residual. We need to verify two things: (a) $\varepsilon$ is normal with mean zero and variance $\sigma_Y^2(1 - \rho^2)$, and (b) $\varepsilon$ is independent of $X$.
Verification that $\varepsilon \perp X$:
$\text{Cov}(\varepsilon, X) = \text{Cov}\!\left(Y - \mu_Y - \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X),\, X\right)$
$= \text{Cov}(Y, X) - \rho \frac{\sigma_Y}{\sigma_X} \text{Var}(X) = \rho \sigma_X \sigma_Y - \rho \frac{\sigma_Y}{\sigma_X} \cdot \sigma_X^2 = 0$
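Since $(X, Y)$ is jointly normal and $(\varepsilon, X)$ is a linear transformation of $(X, Y)$, the pair $(\varepsilon, X)$ is itself jointly normal. For jointly normal variables, zero covariance implies independence, so $\varepsilon$ is independent of $X$. (Marginal normality of $\varepsilon$ alone would not be enough; joint normality of the pair is what matters.)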
Variance of $\varepsilon$:
$\text{Var}(\varepsilon) = \text{Var}(Y) - 2\rho \frac{\sigma_Y}{\sigma_X} \text{Cov}(X, Y) + \rho^2 \frac{\sigma_Y^2}{\sigma_X^2} \text{Var}(X) = \sigma_Y^2 - 2\rho^2 \sigma_Y^2 + \rho^2 \sigma_Y^2 = \sigma_Y^2(1 - \rho^2)$
Taking expectations on both sides of the decomposition gives $E[\varepsilon] = 0$. So $\varepsilon \sim N(0, \sigma_Y^2(1 - \rho^2))$ and $\varepsilon \perp X$.
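The two properties of the residual can be sanity-checked by simulation. The sketch below is only an illustration: the parameter values, the sample size, the seed, and the helper name `simulate_residual` are all assumptions chosen for the example, not part of the problem.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch, not from the problem).
MU_X, MU_Y = 1.0, 2.0
SIGMA_X, SIGMA_Y = 2.0, 3.0
RHO = 0.6


def simulate_residual(n=200_000, seed=0):
    """Draw (X, Y) from the bivariate normal and return X and the residual eps."""
    rng = np.random.default_rng(seed)
    cov_xy = RHO * SIGMA_X * SIGMA_Y
    cov = [[SIGMA_X**2, cov_xy], [cov_xy, SIGMA_Y**2]]
    xy = rng.multivariate_normal([MU_X, MU_Y], cov, size=n)
    x, y = xy[:, 0], xy[:, 1]
    beta = RHO * SIGMA_Y / SIGMA_X
    # Residual from the decomposition Y = mu_Y + beta (X - mu_X) + eps.
    eps = y - MU_Y - beta * (x - MU_X)
    return x, eps


x, eps = simulate_residual()
cov_eps_x = np.cov(eps, x)[0, 1]        # sample covariance, expected near 0
var_eps = eps.var()                     # sample variance of the residual
var_theory = SIGMA_Y**2 * (1 - RHO**2)  # sigma_Y^2 (1 - rho^2)
print(cov_eps_x, var_eps, var_theory)
```

With these assumed parameters the theoretical residual variance is about 5.76. The sample covariance should sit near zero and the sample variance near the theoretical value, up to Monte Carlo error.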
Part 1 -- Conditional Expectation:
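Condition on $X = x$. Because $\varepsilon \perp X$, we have $E[\varepsilon \mid X = x] = E[\varepsilon] = 0$, so:
$E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X) + E[\varepsilon \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$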
This is linear in $x$ with slope $\beta = \rho \sigma_Y / \sigma_X = \text{Cov}(X, Y) / \text{Var}(X)$. Note that this is exactly the population OLS regression coefficient of $Y$ on $X$.
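The link between the conditional-mean slope and the regression coefficient can be illustrated numerically. The following is a minimal sketch under assumed parameter values. It fits a least-squares line with `np.polyfit` to simulated draws and compares the fitted slope and intercept with $\rho \sigma_Y / \sigma_X$ and $\mu_Y - \beta \mu_X$.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch).
mu_x, mu_y = 1.0, 2.0
sigma_x, sigma_y = 2.0, 3.0
rho = 0.6

rng = np.random.default_rng(1)
cov_xy = rho * sigma_x * sigma_y
cov = [[sigma_x**2, cov_xy], [cov_xy, sigma_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

# Degree-1 least-squares fit; coefficients come back highest power first.
slope_hat, intercept_hat = np.polyfit(x, y, deg=1)

slope_theory = rho * sigma_y / sigma_x          # beta from the derivation
intercept_theory = mu_y - slope_theory * mu_x   # line passes through (mu_X, mu_Y)
print(slope_hat, slope_theory)
print(intercept_hat, intercept_theory)
```

The fitted values should agree with the theoretical ones up to sampling noise, which shrinks as the sample size grows.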
Part 2 -- Conditional Variance:
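Given $X = x$, the term $\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ is a constant, and independence means conditioning does not change the distribution of $\varepsilon$:
$\text{Var}(Y \mid X = x) = \text{Var}(\varepsilon \mid X = x) = \text{Var}(\varepsilon) = \sigma_Y^2(1 - \rho^2)$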
Part 3 -- Why the conditional variance is constant:
The decomposition $Y = (\text{linear function of } X) + \varepsilon$ with $\varepsilon \perp X$ means that once you fix $X = x$, the only remaining randomness comes from $\varepsilon$, whose distribution does not depend on $x$ at all. Joint normality guarantees this constant conditional variance (homoscedasticity), because it is what upgrades "uncorrelated" to "independent". The property is not unique to Gaussians: any model of the form $Y = g(X) + \varepsilon$ with $\varepsilon$ independent of $X$ has it too. For a general non-Gaussian joint distribution, however, the residual of the linear projection is only uncorrelated with $X$, not independent of it, so the conditional variance can and often does depend on $x$.
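The constancy of the conditional variance can be checked empirically by approximating "conditioning on $X = x$" with a narrow window around $x$. This is a rough sketch: the parameters, the window half-width, and the helper `local_variance` are assumptions made for illustration.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch).
mu_x, mu_y = 1.0, 2.0
sigma_x, sigma_y = 2.0, 3.0
rho = 0.6

rng = np.random.default_rng(2)
cov_xy = rho * sigma_x * sigma_y
cov = [[sigma_x**2, cov_xy], [cov_xy, sigma_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000)
x, y = xy[:, 0], xy[:, 1]

HALF_WIDTH = 0.05  # narrow window standing in for the event X = x0


def local_variance(x0):
    """Sample variance of Y over draws whose X lies within HALF_WIDTH of x0."""
    mask = np.abs(x - x0) < HALF_WIDTH
    return y[mask].var()


# Windows one standard deviation below, at, and above the mean of X.
local_vars = {x0: local_variance(x0) for x0 in (-1.0, 1.0, 3.0)}
var_theory = sigma_y**2 * (1 - rho**2)
print(local_vars, var_theory)
```

All three local variances should land close to $\sigma_Y^2(1 - \rho^2)$ regardless of where the window is centred, with some noise because each window contains only a small share of the draws.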
Answer:
$E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$
$\text{Var}(Y \mid X = x) = \sigma_Y^2(1 - \rho^2)$
The conditional distribution is $Y \mid X = x \sim N\!\left(\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2(1 - \rho^2)\right)$.
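The final formulas translate directly into a small helper. The function name `conditional_normal` and the example inputs are hypothetical, introduced here only to illustrate the result and the limiting cases $\rho = 0$ and $\rho = 1$.

```python
def conditional_normal(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Return (mean, variance) of Y | X = x for a bivariate normal (X, Y)."""
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    beta = rho * sigma_y / sigma_x
    mean = mu_y + beta * (x - mu_x)      # linear in x
    var = sigma_y**2 * (1.0 - rho**2)    # does not depend on x
    return mean, var


# Example inputs (assumed values): observe x = 3 when mu_X = 1, mu_Y = 2.
mean, var = conditional_normal(3.0, 1.0, 2.0, 2.0, 3.0, 0.6)

# Limiting cases: rho = 0 leaves the marginal of Y unchanged,
# rho = 1 makes Y a deterministic function of X.
mean0, var0 = conditional_normal(3.0, 1.0, 2.0, 2.0, 3.0, 0.0)
mean1, var1 = conditional_normal(3.0, 1.0, 2.0, 2.0, 3.0, 1.0)
print(mean, var, mean0, var0, mean1, var1)
```

For the first call the conditional mean is approximately 3.8 and the conditional variance approximately 5.76. The two limiting cases recover $(\mu_Y, \sigma_Y^2)$ and a zero-variance point on the regression line, respectively.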
Intuition
This result is the backbone of linear regression and shows up everywhere in quantitative finance. The conditional mean $E[Y \mid X = x]$ is the best linear predictor of $Y$ given $X$, and for Gaussians it is also the best predictor overall in the mean-squared-error sense (no nonlinear function of $x$ can do better). The slope $\rho \sigma_Y / \sigma_X$ is exactly the population OLS regression coefficient, and the conditional variance $\sigma_Y^2(1 - \rho^2)$ is the unexplained variance -- the fraction