Prediction Intervals vs. Confidence Intervals in Regression

Statistics · Medium · Free problem

In linear regression, you often see two kinds of intervals around a fitted value: a confidence interval and a prediction interval.

  1. What is each one estimating, and why does it matter which one you use?
  2. Write out the formulas for both intervals at a new point $x_0$. Explain each term and why the prediction interval is always wider.
  3. A colleague builds a regression model and reports the 95% confidence interval for $\hat{y}$ at a particular $x_0$. They then claim: "There is a 95% chance the next observation at $x_0$ falls in this interval." Is this correct? Explain.

Hints

  1. One interval targets a fixed unknown quantity (the true mean), the other targets a random future observation. What extra source of variability does the second one carry?
  2. Compare the variance formulas: $\text{Var}(\hat{y}_0)$ vs. $\text{Var}(Y_0 - \hat{y}_0)$. Where does the extra $\sigma^2$ term come from?
  3. Think about what happens as $n \to \infty$. One interval shrinks to zero width, the other does not. Which is which, and why?

Worked Solution

How to Think About It: The core distinction is deceptively simple: a confidence interval targets the *mean* response $E[Y \mid x_0]$, while a prediction interval targets a *single future observation* $Y_0$ at $x_0$. The mean response is a fixed (unknown) quantity -- your uncertainty about it comes only from estimating the regression coefficients. A future observation, on the other hand, has two sources of uncertainty: you don't know the true mean *and* the observation itself has irreducible noise $\varepsilon$ around that mean. This is why prediction intervals are always wider. In practice, if someone asks "where will the next data point land?" they want a prediction interval. If they ask "what is the true average response?" they want a confidence interval.

Key Insight: The prediction interval includes the residual variance $\sigma^2$ as an additive term under the square root, which the confidence interval does not. No amount of data can shrink that term away -- it reflects the inherent noise in individual observations.

The Method:

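Consider a standard linear regression $Y = X\beta + \varepsilon$, where $X$ is the $n \times p$ design matrix and $\varepsilon \sim N(0, \sigma^2 I)$. At a new point $x_0$, the fitted value is $\hat{y}_0 = x_0^T \hat{\beta}$. In the formulas below, $\hat{\sigma}^2 = \text{RSS}/(n-p)$ is the usual unbiased estimate of $\sigma^2$, and $t_{\alpha/2,\, n-p}$ is the upper $\alpha/2$ quantile of the $t$ distribution with $n-p$ degrees of freedom.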

Confidence interval for the mean response $E[Y \mid x_0] = x_0^T \beta$:

$\hat{y}_0 \pm t_{\alpha/2,\, n-p} \cdot \hat{\sigma} \sqrt{x_0^T (X^T X)^{-1} x_0}$

The term $x_0^T (X^T X)^{-1} x_0$ captures how much uncertainty you have in $\hat{\beta}$ in the direction of $x_0$. Points far from the center of the training data have larger leverage, so the confidence interval fans out.
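To make the leverage term concrete, here is a minimal sketch for the special case of simple linear regression (an intercept plus one slope), where $x_0^T (X^T X)^{-1} x_0$ reduces to the standard closed form $1/n + (x_0 - \bar{x})^2 / S_{xx}$. The function name `leverage` and the sample inputs are made up for illustration.

```python
def leverage(xs, x0):
    """Return x0^T (X^T X)^{-1} x0 for the design X = [1, x] (intercept plus one slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # spread of the training inputs
    return 1.0 / n + (x0 - x_bar) ** 2 / sxx


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical training inputs, mean 2.0
for x0 in (2.0, 4.0, 8.0):      # center, edge, and an extrapolated point
    print(f"x0={x0}: leverage={leverage(xs, x0):.3f}")
```

On these inputs the leverage is smallest at the mean of the training data and grows quadratically as $x_0$ moves away from it, which is the fanning-out described above.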

Prediction interval for a new observation $Y_0 = x_0^T \beta + \varepsilon_0$:

$\hat{y}_0 \pm t_{\alpha/2,\, n-p} \cdot \hat{\sigma} \sqrt{1 + x_0^T (X^T X)^{-1} x_0}$

The only difference is the $+1$ under the square root. That $+1$ comes from $\text{Var}(\varepsilon_0) = \sigma^2$ -- the irreducible noise in the individual observation. Even if you had infinite data and knew $\beta$ perfectly, the prediction interval would still have width proportional to $\sigma$.
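For completeness, the variance calculation behind the two formulas can be written out as follows. It relies on $\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$ and on the model assumption that the new noise term $\varepsilon_0$ is independent of the training data, and hence of $\hat{\beta}$.

$\text{Var}(\hat{y}_0) = x_0^T \, \text{Var}(\hat{\beta}) \, x_0 = \sigma^2 \, x_0^T (X^T X)^{-1} x_0$

$\text{Var}(Y_0 - \hat{y}_0) = \text{Var}\big(\varepsilon_0 - x_0^T (\hat{\beta} - \beta)\big) = \text{Var}(\varepsilon_0) + \text{Var}(\hat{y}_0) = \sigma^2 \left[ 1 + x_0^T (X^T X)^{-1} x_0 \right]$

Replacing $\sigma$ with $\hat{\sigma}$ and taking square roots gives the two standard errors used in the intervals.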

Why this matters:

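  • As $n \to \infty$, the leverage term $x_0^T (X^T X)^{-1} x_0$ goes to zero (provided new data keep arriving across the range of $x$), so the confidence interval shrinks to zero width: you pin down the mean exactly. The prediction interval instead converges to $x_0^T \beta \pm z_{\alpha/2} \cdot \sigma$, since the $t$ quantile tends to the normal quantile $z_{\alpha/2}$ -- it never vanishes.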
  • The colleague's claim in part (3) is wrong. A 95% confidence interval is constructed to cover the *mean* response $E[Y \mid x_0]$ in 95% of repeated samples; it makes no such guarantee for a single future observation. Used as if it were a prediction interval, it will badly undercover: far more than 5% of individual observations will fall outside it. This is one of the most common mistakes in applied regression.
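The undercoverage can be checked with a small simulation. The sketch below is illustrative only: the true coefficients, noise level, design points, and function names are all assumptions chosen for the demo, and the critical value 2.306 is the tabulated two-sided 95% $t$ quantile for $n - p = 8$ degrees of freedom. It fits a simple linear regression many times and counts how often a fresh observation at $x_0$ lands inside each interval.

```python
import random

T_CRIT = 2.306  # tabulated t_{0.025, 8}, matching n - p = 10 - 2 below


def fit_and_intervals(xs, ys, x0, t_crit):
    """Least-squares fit of y = b0 + b1*x; return (y_hat0, ci_half, pi_half) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = (rss / (n - 2)) ** 0.5              # p = 2 fitted coefficients
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx          # leverage of x0
    y_hat0 = b0 + b1 * x0
    ci_half = t_crit * sigma_hat * h0 ** 0.5        # confidence interval half-width
    pi_half = t_crit * sigma_hat * (1.0 + h0) ** 0.5  # prediction interval half-width
    return y_hat0, ci_half, pi_half


def coverage(reps=4000, seed=0):
    """Fraction of fresh observations at x0 that fall inside the CI and the PI."""
    rng = random.Random(seed)
    xs = [i / 9 for i in range(10)]                 # hypothetical design: 10 points on [0, 1]
    beta0, beta1, sigma, x0 = 1.0, 2.0, 1.0, 0.5    # hypothetical true model
    ci_hits = pi_hits = 0
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        y_hat0, ci_half, pi_half = fit_and_intervals(xs, ys, x0, T_CRIT)
        y_new = beta0 + beta1 * x0 + rng.gauss(0.0, sigma)  # the "next observation"
        ci_hits += int(abs(y_new - y_hat0) <= ci_half)
        pi_hits += int(abs(y_new - y_hat0) <= pi_half)
    return ci_hits / reps, pi_hits / reps


ci_cov, pi_cov = coverage()
print(f"new observation inside 95% CI: {ci_cov:.3f}")
print(f"new observation inside 95% PI: {pi_cov:.3f}")
```

Under these assumptions the prediction interval should capture the new observation close to 95% of the time, while the confidence interval should capture it only about half the time.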

Answer: A confidence interval bounds the mean response $E[Y \mid x_0]$ and reflects only estimation uncertainty in $\hat{\beta}$. A prediction interval bounds a future individual observation and adds the irreducible noise variance $\sigma^2$. The prediction interval is always wider, and it does not shrink to zero with more data. The colleague's interpretation conflates the two: a confidence interval for the mean cannot be used to predict where a single new observation will fall.

Intuition

The prediction vs. confidence interval distinction boils down to a fundamental question: are you uncertain about a parameter or about a random variable? The mean response $E[Y \mid x_0]$ is a parameter -- it is fixed, and you can pin it down with enough data. A future observation $Y_0$ is a random variable -- even if you knew every parameter perfectly, it would still bounce around due to noise. That irreducible randomness is why the prediction interval never collapses, no matter how big your dataset.
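The limiting behavior can also be tabulated directly. This is a simplified sketch, not the exact finite-sample formula: it treats $\sigma$ as known, so the normal quantile replaces the $t$ quantile, and it assumes a hypothetical design of $n$ equally spaced points on $[0, 1]$ with $x_0$ at the center. The function name `half_widths` is made up for the example.

```python
from statistics import NormalDist


def half_widths(n, x0=0.5, sigma=1.0, level=0.95):
    """Half-widths (CI, PI) at x0 for n equally spaced x in [0, 1], sigma treated as known."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)     # normal quantile, about 1.96 for 95%
    xs = [i / (n - 1) for i in range(n)]
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx          # leverage of x0
    return z * sigma * h0 ** 0.5, z * sigma * (1.0 + h0) ** 0.5


for n in (10, 100, 10_000):
    ci, pi = half_widths(n)
    print(f"n={n:>6}  CI half-width={ci:.4f}  PI half-width={pi:.4f}")
```

As $n$ grows, the confidence half-width keeps shrinking toward zero while the prediction half-width levels off near $z_{\alpha/2} \cdot \sigma$.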

This comes up constantly in practice. If a risk model gives you a 95% confidence band around expected P&L and you treat it as a prediction band for actual P&L, you will massively underestimate how often you breach the band. The same trap appears in forecasting, machine learning deployment, and any setting where someone confuses "where is the average?" with "where will the next observation land?" Getting this wrong leads to overconfident predictions -- which in finance means poorly sized positions and blown risk limits.
