VaR and Expected Shortfall Under Student-t Returns
Daily returns $X$ follow a location-scale Student-$t$ distribution:
$X = \mu + \sigma \cdot T$
where $T \sim t_\nu$ is a standard Student-$t$ with $\nu > 2$ degrees of freedom, $\mu$ is the location (mean), and $\sigma > 0$ is the scale parameter.
Let $t_\nu(\alpha)$ denote the $\alpha$-quantile of the standard $t_\nu$ distribution, and let $f_\nu(x)$ denote its density.
- Express the one-day $\text{VaR}_\alpha$ (the loss threshold at confidence level $1 - \alpha$) in terms of $\mu$, $\sigma$, and $t_\nu(\alpha)$.
- Derive a closed-form expression for the one-day $\text{ES}_\alpha$ (Expected Shortfall, i.e., the expected loss given that the loss exceeds $\text{VaR}$) in terms of $\mu$, $\sigma$, $\nu$, $t_\nu(\alpha)$, and $f_\nu(t_\nu(\alpha))$.
- Analyze the gap $\text{ES}_\alpha - \text{VaR}_\alpha$. How does this separation depend on $\nu$? What happens as $\nu \to \infty$, and what happens for small $\nu$ (heavy tails)?
Hints
- Since $X = \mu + \sigma T$, the VaR is just a linear transformation of the standard $t$-quantile. Start there.
- For ES, you need $E[T \mid T \leq q]$. This requires the integral $\int_{-\infty}^{q} t\, f_\nu(t)\, dt$. Try differentiating the density kernel $(1 + t^2/\nu)^{-(\nu+1)/2}$ to find a useful antiderivative.
- The key identity is $\int_{-\infty}^{q} t\, f_\nu(t)\, dt = -\frac{\nu + q^2}{\nu - 1} f_\nu(q)$. Use this to get a closed-form ES, then examine the $\frac{\nu + q^2}{\nu - 1}$ factor: the $1/(\nu-1)$ blows up at $\nu = 1$ (where the mean ceases to exist), not at $\nu = 2$, so the gap stays finite down to $\nu \to 2^+$.
Worked Solution
How to Think About It: VaR and ES are the bread and butter of risk management. VaR tells you the threshold loss at a given confidence level; ES tells you the average loss in the tail beyond that threshold. Under Gaussian returns, the gap between ES and VaR is modest. The whole point of using Student-$t$ is to capture heavy tails -- and that is exactly where the ES-VaR gap widens sharply. Before writing any formulas, you should expect: (1) VaR is just a linear transformation of the $t$-quantile, (2) ES involves integrating the $t$-density in the tail, and (3) as $\nu$ shrinks (fatter tails), ES pulls further away from VaR because the conditional expectation in the tail grows.
Quick Sanity Checks:
- As $\nu \to \infty$, the Student-$t$ becomes Gaussian, so both VaR and ES should reduce to their Gaussian forms.
- For small $\nu$ (say $\nu = 3$), the tail is much fatter than Normal, so ES should be substantially larger than VaR.
- ES $\geq$ VaR always (by definition, the conditional mean beyond the threshold exceeds the threshold).
- The ES-VaR gap should widen as $\nu$ decreases.
Derivation:
Part 1: VaR
We define $\text{VaR}_\alpha$ as the loss (negative return) that is exceeded with probability $\alpha$. Writing $X = \mu + \sigma T$ with $T \sim t_\nu$:
$P(X \leq -\text{VaR}_\alpha) = \alpha$
$P\left(T \leq \frac{-\text{VaR}_\alpha - \mu}{\sigma}\right) = \alpha$
So $\frac{-\text{VaR}_\alpha - \mu}{\sigma} = t_\nu(\alpha)$, giving:
$\boxed{\text{VaR}_\alpha = -\mu - \sigma \, t_\nu(\alpha)}$
Since $\alpha$ is small (e.g., 0.01 or 0.05), $t_\nu(\alpha) < 0$, so $\text{VaR}_\alpha > 0$ when $\mu$ is near zero -- as expected.
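The boxed VaR formula can be evaluated directly. The following is a minimal sketch, assuming SciPy is available; the function name `t_var` and the sample inputs ($\nu = 4$, a 1.5% daily scale) are illustrative choices, not part of the problem statement.

```python
from scipy.stats import t as student_t


def t_var(alpha: float, nu: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """One-day VaR at tail probability alpha: VaR = -mu - sigma * t_nu(alpha)."""
    q = student_t.ppf(alpha, df=nu)  # standard t quantile; negative for alpha < 0.5
    return -mu - sigma * q


# Hypothetical inputs: 1% tail, nu = 4, zero mean, 1.5% daily scale.
var_99 = t_var(alpha=0.01, nu=4, mu=0.0, sigma=0.015)
print(f"VaR at alpha = 0.01: {var_99:.4f}")
```

Note that `sigma` here is the scale parameter of the problem, not the standard deviation; for $\nu > 2$ the standard deviation is $\sigma\sqrt{\nu/(\nu-2)}$.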
Part 2: Expected Shortfall
ES is the conditional expected loss given that the loss exceeds VaR:
$\text{ES}_\alpha = -E[X \mid X \leq -\text{VaR}_\alpha]$
Substituting $X = \mu + \sigma T$:
$\text{ES}_\alpha = -\mu - \sigma \, E[T \mid T \leq t_\nu(\alpha)]$
We need $E[T \mid T \leq q]$ where $q = t_\nu(\alpha)$. By definition:
$E[T \mid T \leq q] = \frac{1}{\alpha} \int_{-\infty}^{q} t \, f_\nu(t) \, dt$
The key integral identity for the standard $t_\nu$ distribution is:
$\int_{-\infty}^{q} t \, f_\nu(t) \, dt = -\frac{\nu + q^2}{\nu - 1} \cdot f_\nu(q)$
This follows from writing $f_\nu(t) = \frac{1}{\sqrt{\nu}B(\nu/2, 1/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$ and integrating by substitution $u = 1 + t^2/\nu$. The derivative of the density kernel gives us back a density-like term, producing the closed form above.
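For readers who want the intermediate steps, one way to fill them in is as follows, writing $c_\nu = \frac{1}{\sqrt{\nu}B(\nu/2, 1/2)}$ as shorthand for the normalizing constant (a symbol introduced here for convenience). Differentiating the kernel with a reduced exponent gives:
$\frac{d}{dt}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu-1)/2} = -\frac{\nu - 1}{\nu} \, t \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$
Hence $t \, f_\nu(t)$ has the antiderivative $-\frac{\nu}{\nu - 1} \, c_\nu \left(1 + \frac{t^2}{\nu}\right)^{-(\nu-1)/2}$, which vanishes as $t \to -\infty$ provided $\nu > 1$. Evaluating at $q$:
$\int_{-\infty}^{q} t \, f_\nu(t) \, dt = -\frac{\nu}{\nu - 1} \, c_\nu \left(1 + \frac{q^2}{\nu}\right)^{-(\nu-1)/2} = -\frac{\nu}{\nu - 1}\left(1 + \frac{q^2}{\nu}\right) f_\nu(q) = -\frac{\nu + q^2}{\nu - 1} \, f_\nu(q)$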
Therefore:
$E[T \mid T \leq q] = -\frac{1}{\alpha} \cdot \frac{\nu + q^2}{\nu - 1} \cdot f_\nu(q)$
Plugging back in (with $q = t_\nu(\alpha)$):
$\boxed{\text{ES}_\alpha = -\mu + \frac{\sigma}{\alpha} \cdot \frac{\nu + [t_\nu(\alpha)]^2}{\nu - 1} \cdot f_\nu(t_\nu(\alpha))}$
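As a numerical check on the boxed ES formula, the sketch below compares the closed form against direct quadrature of the tail integral $\frac{1}{\alpha}\int_{-\infty}^{q} t \, f_\nu(t) \, dt$. It assumes NumPy and SciPy are available; the names `t_es` and `t_es_numeric` are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import t as student_t


def t_es(alpha: float, nu: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Closed-form ES: -mu + (sigma/alpha) * (nu + q^2)/(nu - 1) * f_nu(q)."""
    q = student_t.ppf(alpha, df=nu)
    tail_factor = (nu + q**2) / (nu - 1.0)
    return -mu + sigma / alpha * tail_factor * student_t.pdf(q, df=nu)


def t_es_numeric(alpha: float, nu: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """ES from numerical integration of t * f_nu(t) over (-inf, q]."""
    q = student_t.ppf(alpha, df=nu)
    integral, _abs_err = quad(lambda x: x * student_t.pdf(x, df=nu), -np.inf, q)
    return -mu - sigma * integral / alpha


for nu in (3, 4, 10):
    closed = t_es(0.01, nu)
    numeric = t_es_numeric(0.01, nu)
    print(f"nu = {nu:>2}: closed form = {closed:.6f}, quadrature = {numeric:.6f}")
```

The two columns should agree to within the quadrature tolerance for any $\nu > 1$.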
Part 3: ES-VaR Separation
The gap is:
$\text{ES}_\alpha - \text{VaR}_\alpha = \frac{\sigma}{\alpha} \cdot \frac{\nu + [t_\nu(\alpha)]^2}{\nu - 1} \cdot f_\nu(t_\nu(\alpha)) + \sigma \, t_\nu(\alpha)$
Defining $q = t_\nu(\alpha)$:
$\text{ES}_\alpha - \text{VaR}_\alpha = \sigma \left[\frac{1}{\alpha} \cdot \frac{\nu + q^2}{\nu - 1} \cdot f_\nu(q) + q\right]$
The dependence on $\nu$ enters through three channels:
- The quantile $q = t_\nu(\alpha)$: As $\nu$ decreases, the quantile moves further into the left tail (more negative), making VaR larger.
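- The density $f_\nu(q)$: At any fixed point deep in the tail, the $t$-density is larger for smaller $\nu$. Evaluated at its own $\alpha$-quantile, however, $f_\nu(q)$ is smaller for smaller $\nu$, because $q$ moves outward as the tails fatten (at $\alpha = 0.01$, roughly $0.0267$ in the Gaussian limit versus about $0.0059$ for $\nu = 3$). Taken alone this channel would shrink the ES term; it is more than offset by the growth of the amplification factor below.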
- The factor $\frac{\nu + q^2}{\nu - 1}$: This is the tail-amplification factor. The $\frac{1}{\nu - 1}$ term diverges as $\nu \to 1^+$, not at $\nu = 2$ -- and $\nu = 1$ is precisely the boundary where the mean (hence ES itself) ceases to exist. As $\nu \to 2^+$ (the variance boundary, but mean and ES are still finite), the gap does NOT blow up: it converges to a finite limit. In standardized form ($\mu = 0$, $\sigma = 1$) that limit is $\frac{1}{\sqrt{2\alpha(1-\alpha)}}$ -- e.g. $\approx 7.107$ at $\alpha = 0.01$ and $\approx 3.244$ at $\alpha = 0.05$. So heavier tails widen the gap monotonically, but the widening is bounded all the way down to $\nu = 2$, and only becomes unbounded as $\nu \to 1^+$.
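The three channels and the $\nu \to 2^+$ limit can be tabulated numerically. This is a rough sketch in standardized form ($\mu = 0$, $\sigma = 1$), assuming SciPy is available; the helper `gap_channels` and the grid of $\nu$ values are illustrative choices.

```python
import math

from scipy.stats import t as student_t


def gap_channels(alpha: float, nu: float) -> tuple[float, float, float, float]:
    """Return (q, f_nu(q), tail factor, ES - VaR) for mu = 0, sigma = 1."""
    q = student_t.ppf(alpha, df=nu)           # channel 1: the quantile
    density = student_t.pdf(q, df=nu)         # channel 2: density at the quantile
    factor = (nu + q**2) / (nu - 1.0)         # channel 3: amplification factor
    gap = factor * density / alpha + q        # ES - VaR in standardized form
    return q, density, factor, gap


alpha = 0.01
# Closed-form nu -> 2+ limit quoted in the text.
limit_at_two = 1.0 / math.sqrt(2.0 * alpha * (1.0 - alpha))

print(f"{'nu':>8} {'q':>9} {'f(q)':>9} {'factor':>9} {'gap':>9}")
for nu in (2.0001, 2.5, 3, 4, 6, 10, 30, 1000):
    q, density, factor, gap = gap_channels(alpha, nu)
    print(f"{nu:>8} {q:>9.4f} {density:>9.5f} {factor:>9.4f} {gap:>9.4f}")
print(f"limit as nu -> 2+: {limit_at_two:.4f}")
```

Reading down the `gap` column shows how the separation changes with $\nu$, and the first row can be compared against the closed-form limit printed on the last line.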
Limiting behavior:
- $\nu \to \infty$ (Gaussian limit): $t_\nu(\alpha) \to z_\alpha$ (standard Normal quantile), $f_\nu(q) \to \phi(z_\alpha)$, and $\frac{\nu + q^2}{\nu - 1} \to 1$. So ES reduces to the Gaussian formula: $\text{ES}_\alpha = -\mu + \frac{\sigma}{\alpha}\phi(z_\alpha)$.
- Small $\nu$ (heavy tails): The ratio $\text{ES}_\alpha / \text{VaR}_\alpha$ grows significantly. For example, at $\alpha = 0.01$ with $\mu = 0$, the Gaussian ES/VaR ratio is about 1.15, but with $\nu = 4$ it is roughly 1.39, and with $\nu = 3$ it is about 1.54. The tail conditional expectation is much worse than the threshold.
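The ES/VaR ratios in the limiting cases can be recomputed with a few lines. The sketch below takes $\mu = 0$ (so $\sigma$ cancels in the ratio) and assumes SciPy is available; the function names are illustrative.

```python
from scipy.stats import norm, t as student_t


def es_var_ratio_t(alpha: float, nu: float) -> float:
    """ES/VaR under Student-t with mu = 0; the scale sigma cancels."""
    q = student_t.ppf(alpha, df=nu)
    es = (nu + q**2) / (nu - 1.0) * student_t.pdf(q, df=nu) / alpha
    return es / (-q)


def es_var_ratio_normal(alpha: float) -> float:
    """ES/VaR in the Gaussian limit with mu = 0: phi(z_alpha) / (alpha * |z_alpha|)."""
    z = norm.ppf(alpha)
    return (norm.pdf(z) / alpha) / (-z)


alpha = 0.01
print(f"Gaussian: {es_var_ratio_normal(alpha):.3f}")
for nu in (30, 10, 6, 4, 3):
    print(f"nu = {nu:>2}: {es_var_ratio_t(alpha, nu):.3f}")
```

With a nonzero $\mu$ the ratio is no longer scale-free, so the comparison is cleanest in this standardized form.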
Practical Interpretation: This is why regulators moved from VaR to ES: the Basel Committee's FRTB market-risk framework replaced 99% VaR with 97.5% ES as the core internal-model measure. VaR tells you the door to the tail; ES tells you what is behind it. Under fat-tailed models, VaR can substantially understate the risk in the tail. The $\frac{\nu + q^2}{\nu - 1}$ factor is the precise mechanism: it scales the tail term in ES and grows as $\nu$ falls, pushing the average tail loss further above the threshold loss.
Answer: The one-day VaR is $\text{VaR}_\alpha = -\mu - \sigma\, t_\nu(\alpha)$. The one-day ES is $\text{ES}_\alpha = -\mu + \frac{\sigma}{\alpha} \cdot \frac{\nu + [t_\nu(\alpha)]^2}{\nu - 1} \cdot f_\nu(t_\nu(\alpha))$. The ES-VaR gap grows as $\nu$ decreases (heavier tails), driven by the amplification factor $\frac{\nu + q^2}{\nu - 1}$, and collapses to the Gaussian gap as $\nu \to \infty$. The gap stays finite all the way down to the variance boundary $\nu \to 2^+$ (standardized limit $\frac{1}{\sqrt{2\alpha(1-\alpha)}}$, e.g. $\approx 7.107$ at $\alpha = 0.01$) and diverges only as $\nu \to 1^+$, where the mean and ES cease to exist.
Intuition
The core lesson here is that VaR and ES measure fundamentally different things about the tail, and the gap between them is a direct measure of tail heaviness. Under a Gaussian model, the tail dies exponentially fast, so knowing the 1% threshold tells you almost everything about the 1% conditional expectation -- they are close together. Under Student-$t$, the tail decays polynomially, so the conditional expectation in the tail can be dramatically worse than the threshold. The factor $\frac{\nu + q^2}{\nu - 1}$ quantifies exactly this: it is the tail amplification that comes from fat tails.
In practice, this is why model choice matters enormously for risk. A desk that quotes VaR under Gaussian assumptions and a desk that quotes VaR under $t_4$ assumptions might report similar-looking VaR numbers when both are calibrated to the same volatility (since the quantiles then do not differ that much), but their ES numbers will diverge more sharply. With $\mu = 0$, the ES under $t_4$ is about 39% above VaR at the 1% level, versus about 15% for the Gaussian. This is the quantitative argument for why the industry moved from VaR to ES as the primary regulatory risk measure -- ES actually penalizes the shape of the tail, not just the location of a single quantile.