OLS vs. Total Least Squares: When to Minimize Perpendicular Distance

Regression · Medium · Free problem

In linear regression, we typically minimize the sum of squared vertical distances between the data points and the fitted line. This is ordinary least squares (OLS).

But there is an alternative: minimize the sum of squared perpendicular distances from each point to the line. This is called Total Least Squares (TLS), also known as orthogonal regression.

  1. Under what circumstances would you prefer to minimize perpendicular distance rather than vertical distance?
  2. What happens to the OLS estimate $\hat{\beta}$ when the predictor $X$ is measured with noise? Specifically, if the true predictor is $X^{*}$ but you observe $X = X^{*} + \eta$ where $\eta \sim N(0, \sigma_{\eta}^2)$, what does OLS converge to?
  3. How does TLS fix this problem, and what assumptions does it require?

Hints

  1. Think about what OLS implicitly assumes about the measurement of $X$. What happens to that assumption when $X$ is observed with noise?
  2. When you regress $Y$ on a noisy version of $X$, the noise in $X$ acts like adding variance to the denominator of the slope formula. What does that do to $\hat{\beta}$?
  3. The TLS solution can be found by computing the SVD of the centered data matrix $[X - \bar{X}, \, Y - \bar{Y}]$. The line of best fit aligns with the first right singular vector.

Worked Solution

How to Think About It: The core question is about what your regression is really doing when your inputs are noisy. Most people learn OLS and internalize "minimize vertical distance" without ever asking why vertical. The answer is simple: OLS assumes $X$ is known exactly and all the randomness lives in $Y$. That is a fine assumption when $X$ is something you control (like an experimental treatment dose), but it falls apart when $X$ is itself a noisy measurement -- think of estimating the relationship between two financial variables that are both estimated with error, like realized volatility vs. implied volatility.

When both variables have measurement error, vertical distance is the wrong geometric objective. You want something symmetric -- and perpendicular distance is exactly that.

Key Insight: OLS treats $X$ as truth and absorbs all noise into the $Y$-residual. When $X$ is noisy, this causes attenuation bias -- the slope estimate is systematically pulled toward zero. TLS treats both variables symmetrically and corrects for this.
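A quick simulation makes the attenuation visible. This is a minimal sketch under illustrative assumptions: the coefficients, sample size, and noise levels are arbitrary choices, `ols_slope` is a hypothetical helper, and the "predicted limit" column uses the standard attenuation factor $\sigma_{X^{*}}^2 / (\sigma_{X^{*}}^2 + \sigma_{\eta}^2)$ that appears in the Method section.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0      # illustrative true coefficients (assumed)
sigma_xstar = 1.0            # std dev of the true predictor X*

x_star = rng.normal(0.0, sigma_xstar, n)
y = beta0 + beta1 * x_star + rng.normal(0.0, 0.5, n)

results = {}
for sigma_eta in (0.0, 0.5, 1.0, 2.0):
    x_obs = x_star + rng.normal(0.0, sigma_eta, n)   # X = X* + eta
    shrink = sigma_xstar**2 / (sigma_xstar**2 + sigma_eta**2)
    results[sigma_eta] = (ols_slope(x_obs, y), beta1 * shrink)
    est, pred = results[sigma_eta]
    print(f"sigma_eta={sigma_eta:.1f}  OLS slope={est:.3f}  predicted limit={pred:.3f}")
```

As the noise in the observed predictor grows, the estimated slope should fall from about 2 toward 0, tracking the predicted limit.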

The Method:

  1. The errors-in-variables setup: Suppose the true relationship is $Y^{*} = \beta_0 + \beta_1 X^{*} + \varepsilon$, but you observe $X = X^{*} + \eta$ and $Y = Y^{*} + \delta$ (or just $Y = Y^{*}$ in the simpler case). Here $\eta$ is measurement error in $X$.
  2. What OLS gives you: When you regress $Y$ on the noisy $X$, the OLS estimator converges to:

$\hat{\beta}_{OLS} \xrightarrow{p} \beta_1 \cdot \frac{\sigma_{X^{*}}^2}{\sigma_{X^{*}}^2 + \sigma_{\eta}^2}$

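The fraction $\sigma_{X^{*}}^2 / (\sigma_{X^{*}}^2 + \sigma_{\eta}^2)$ is strictly less than 1 whenever $\sigma_{\eta}^2 > 0$, so the slope is biased toward zero. This is called attenuation bias. The result follows from $\hat{\beta}_{OLS} \xrightarrow{p} \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$: if $\eta$ is independent of $X^{*}$ and of the errors in $Y$, then $\mathrm{Cov}(X, Y) = \beta_1 \sigma_{X^{*}}^2$ while $\mathrm{Var}(X) = \sigma_{X^{*}}^2 + \sigma_{\eta}^2$. The noisier $X$ is relative to its true signal, the worse the bias. In the extreme where $\sigma_{\eta}^2 \gg \sigma_{X^{*}}^2$, the slope goes to zero: you have effectively regressed on pure noise.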

  3. How TLS works: TLS minimizes the sum of squared perpendicular distances from each point $(X_i, Y_i)$ to the fitted line. Geometrically, this means allowing errors in both the $X$- and $Y$-directions. The solution can be found via SVD: center the data, form the $n \times 2$ matrix $[X - \bar{X}, \, Y - \bar{Y}]$, and take its SVD. The first right singular vector $(v_x, v_y)$ gives the direction of the line of best fit, so the TLS slope is $v_y / v_x$, and the line passes through $(\bar{X}, \bar{Y})$. Equivalently, the line direction is the eigenvector corresponding to the larger eigenvalue of the $2 \times 2$ covariance matrix of $(X, Y)$; the eigenvector for the smaller eigenvalue is the normal to the line.

  4. When to use TLS over OLS:
     - Both $X$ and $Y$ are measured with error of comparable magnitude
     - Neither variable is clearly the "independent" variable -- you are estimating a functional relationship, not building a predictive model
     - The goal is to recover the true slope $\beta_1$, not to minimize prediction error in $Y$
     - Examples: calibrating two instruments against each other, fitting a physical law where both quantities are measured, comparing two noisy financial estimates

  5. When OLS is still correct:
     - $X$ is controlled or known exactly (experimental design)
     - You only care about predicting $Y$ given $X$ (even if $X$ is noisy, OLS is the right conditional expectation estimator for prediction)
     - The noise in $X$ is negligible relative to $\sigma_{X^{*}}^2$
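The SVD recipe from the Method can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: `tls_line` and `ols_line` are hypothetical helper names, the parameter values are arbitrary, and the simulation assumes equal error variances in $X$ and $Y$ and no equation error ($\varepsilon = 0$), which is the setting where plain orthogonal regression is appropriate.

```python
import numpy as np

def tls_line(x, y):
    """Orthogonal-regression line via SVD of the centered n x 2 data matrix."""
    x_bar, y_bar = x.mean(), y.mean()
    M = np.column_stack([x - x_bar, y - y_bar])
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    dx, dy = vt[0]               # first right singular vector = line direction
    slope = dy / dx              # breaks down for a vertical line (dx == 0)
    return slope, y_bar - slope * x_bar

def ols_line(x, y):
    """Ordinary least squares of y on x, for comparison."""
    xc = x - x.mean()
    slope = xc @ (y - y.mean()) / (xc @ xc)
    return slope, y.mean() - slope * x.mean()

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1 = 0.5, 1.5                  # illustrative true line (assumed)
x_star = rng.normal(0.0, 2.0, n)         # true predictor, variance 4
y_star = beta0 + beta1 * x_star          # assumption: no equation error
x = x_star + rng.normal(0.0, 1.0, n)     # eta: error in X, variance 1
y = y_star + rng.normal(0.0, 1.0, n)     # delta: error in Y, same variance

b_ols, a_ols = ols_line(x, y)
b_tls, a_tls = tls_line(x, y)

# Cross-check: eigenvector of the larger eigenvalue of the 2 x 2 covariance matrix
evals, evecs = np.linalg.eigh(np.cov(x, y))   # eigenvalues in ascending order
b_eig = evecs[1, -1] / evecs[0, -1]

print(f"true slope={beta1}  OLS={b_ols:.3f}  TLS={b_tls:.3f}  eig={b_eig:.3f}")
```

Under these assumptions the OLS slope should land near $1.5 \cdot 4/5 = 1.2$, while the TLS slope should sit close to the true value of 1.5. The sign of a singular vector is arbitrary, which is why the slope is taken as a ratio of its components.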

Practical Considerations: TLS requires you to know (or estimate) the ratio of error variances $\sigma_{\delta}^2 / \sigma_{\eta}^2$, where $\sigma_{\delta}^2$ should be read as the total error variance in the $Y$-direction (including $\varepsilon$ if the two are not modeled separately). When you assume equal error variances, you get standard orthogonal regression. When the ratio is known but not 1, you get Deming regression. If you do not know the ratio at all, the problem is not identified without additional assumptions. In finance, this matters when you are fitting relationships between noisy estimates: naive OLS will systematically understate sensitivities.
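To illustrate why the variance ratio matters, here is a hedged sketch of Deming regression using its standard closed-form slope. The helper name `deming_slope` and all simulation settings are assumptions made for illustration; `ratio` is the variance of the error in $Y$ divided by the variance of the error in $X$, and the formula requires a nonzero sample covariance.

```python
import numpy as np

def deming_slope(x, y, ratio):
    """Deming slope; ratio = var(error in y) / var(error in x)."""
    s_xx = np.var(x, ddof=1)
    s_yy = np.var(y, ddof=1)
    s_xy = np.cov(x, y)[0, 1]
    d = s_yy - ratio * s_xx
    return (d + np.sqrt(d**2 + 4.0 * ratio * s_xy**2)) / (2.0 * s_xy)

rng = np.random.default_rng(2)
n = 200_000
beta1 = 1.5                                       # illustrative true slope (assumed)
x_star = rng.normal(0.0, 2.0, n)                  # true predictor, variance 4
x = x_star + rng.normal(0.0, 1.0, n)              # error in X: variance 1
y = 0.5 + beta1 * x_star + rng.normal(0.0, 2.0, n)  # error in Y: variance 4

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # attenuated OLS slope
b_deming = deming_slope(x, y, ratio=4.0)          # correct ratio 4 / 1
b_orthogonal = deming_slope(x, y, ratio=1.0)      # wrongly assumes equal variances

print(f"OLS={b_ols:.3f}  Deming(ratio=4)={b_deming:.3f}  orthogonal={b_orthogonal:.3f}")
```

In this setup OLS should come out near 1.2, Deming with the correct ratio near the true 1.5, and orthogonal regression (ratio 1) noticeably too steep at roughly 1.87. A misspecified ratio trades one bias for another, so TLS is not a free fix.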

Answer: Use TLS (perpendicular distance) instead of OLS (vertical distance) when both variables are measured with error and you want to recover the true functional relationship. OLS in the presence of errors-in-variables produces attenuation bias: $\hat{\beta}_{OLS} \to \beta_1 \cdot \sigma_{X^{*}}^2 / (\sigma_{X^{*}}^2 + \sigma_{\eta}^2)$, systematically shrinking the slope toward zero. TLS corrects this by treating both variables symmetrically, at the cost of requiring knowledge of the error variance ratio.

Intuition

The deep lesson here is that your choice of loss function encodes assumptions about where the noise lives. OLS says "all the noise is in $Y$," so it minimizes vertical residuals. TLS says "noise is in both variables," so it minimizes perpendicular residuals. Neither is universally correct -- the right choice depends on your data-generating process. In quant finance, this matters more than people realize. When you regress one noisy estimate against another (say, realized vol on implied vol, or estimated betas across time), naive OLS will understate the true sensitivity because of attenuation bias. This is one of the most common silent errors in empirical finance.

The broader principle: whenever a standard method gives you a biased or inconsistent answer, ask yourself what implicit assumption is being violated. In this case, OLS assumes perfect measurement of $X$. Once you identify the violated assumption, the fix (TLS, instrumental variables, or something else) usually becomes clear.
