OLS vs. Total Least Squares: When to Minimize Perpendicular Distance
In linear regression, we typically minimize the sum of squared vertical distances between the data points and the fitted line. This is ordinary least squares (OLS).
But there is an alternative: minimize the sum of squared perpendicular distances from each point to the line. This is called Total Least Squares (TLS), also known as orthogonal regression.
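To see how the two objectives relate, write the vertical residual for a candidate line $y = b_0 + b_1 x$ as $r_i = Y_i - b_0 - b_1 X_i$. The perpendicular distance from $(X_i, Y_i)$ to the same line is $|r_i| / \sqrt{1 + b_1^2}$, so the two criteria differ only by a slope-dependent factor:

```latex
\text{OLS:}\quad \min_{b_0,\,b_1} \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2
\qquad
\text{TLS:}\quad \min_{b_0,\,b_1} \sum_{i=1}^{n} \frac{\left(Y_i - b_0 - b_1 X_i\right)^2}{1 + b_1^2}
```

The $1 + b_1^2$ factor is what lets TLS choose a steeper line than OLS: on the same data, the TLS slope is never smaller in magnitude than the OLS slope of $Y$ on $X$.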
- Under what circumstances would you prefer to minimize perpendicular distance rather than vertical distance?
- What happens to the OLS estimate $\hat{\beta}$ when the predictor $X$ is measured with noise? Specifically, if the true predictor is $X^{*}$ but you observe $X = X^{*} + \eta$ where $\eta \sim N(0, \sigma_{\eta}^2)$, what does OLS converge to?
- How does TLS fix this problem, and what assumptions does it require?
Hints
- Think about what OLS implicitly assumes about the measurement of $X$. What happens to that assumption when $X$ is observed with noise?
- When you regress $Y$ on a noisy version of $X$, the noise in $X$ acts like adding variance to the denominator of the slope formula. What does that do to $\hat{\beta}$?
- The TLS solution can be found by computing the SVD of the centered data matrix $[X - \bar{X}, \, Y - \bar{Y}]$. The line of best fit aligns with the first right singular vector.
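The SVD recipe in the last hint can be sketched in a few lines of Python with NumPy. The function name `tls_fit` is a hypothetical helper for illustration, not a library API. `np.linalg.svd` returns the right singular vectors as the rows of `Vt`, sorted by decreasing singular value, so `Vt[0]` is the direction of the fitted line.

```python
import numpy as np

def tls_fit(x, y):
    """Sketch: fit y = b0 + b1*x by total least squares (orthogonal regression) via SVD."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Centered n x 2 data matrix [X - X_bar, Y - Y_bar]
    Z = np.column_stack([x - x_bar, y - y_bar])
    # Rows of Vt are right singular vectors, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    vx, vy = Vt[0]  # first right singular vector = direction of the fitted line
    if np.isclose(vx, 0.0):
        raise ValueError("fitted line is vertical; slope is undefined")
    b1 = vy / vx             # the arbitrary sign of the singular vector cancels here
    b0 = y_bar - b1 * x_bar  # the TLS line passes through the centroid
    return b0, b1

# Usage: points lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
b0, b1 = tls_fit(x, 1.0 + 2.0 * x)  # approximately (1.0, 2.0)
```

The sign of a singular vector is arbitrary, which is why the slope is taken as the ratio `vy / vx` rather than read off a single component.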
Worked Solution
How to Think About It: The core question is about what your regression is really doing when your inputs are noisy. Most people learn OLS and internalize "minimize vertical distance" without ever asking why vertical. The answer is simple: OLS assumes $X$ is known exactly and all the randomness lives in $Y$. That is a fine assumption when $X$ is something you control (like an experimental treatment dose), but it falls apart when $X$ is itself a noisy measurement -- think of estimating the relationship between two financial variables that are both estimated with error, like realized volatility vs. implied volatility.
When both variables have measurement error, vertical distance is the wrong geometric objective. You want something symmetric -- and perpendicular distance is exactly that.
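One way to check the symmetry claim concretely is to fit in both directions ($Y$ on $X$, then $X$ on $Y$) and compare. In the sketch below, the helper names `ols_slope` and `tls_slope` and the simulated data are illustrative assumptions. The TLS slopes come out as exact reciprocals, while the two OLS slopes multiply to the squared sample correlation rather than to 1.

```python
import numpy as np

def ols_slope(x, y):
    # OLS slope of y on x: sample covariance divided by sample variance of x
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def tls_slope(x, y):
    # TLS slope: direction of the first right singular vector of the centered data
    Z = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0, 1] / Vt[0, 0]

# Simulated data (illustrative only)
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

# Swap the roles of x and y: TLS gives reciprocal slopes, OLS does not
print(tls_slope(x, y) * tls_slope(y, x))  # 1 up to floating-point error
print(ols_slope(x, y) * ols_slope(y, x))  # the squared sample correlation, below 1
```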
Key Insight: OLS treats $X$ as truth and absorbs all noise into the $Y$-residual. When $X$ is noisy, this causes attenuation bias -- the slope estimate is systematically pulled toward zero. TLS treats both variables symmetrically and corrects for this when the errors in $X$ and $Y$ have equal variance (or after rescaling the variables so that they do).
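A quick Monte Carlo makes the attenuation concrete. This is a sketch with assumed parameter values ($\beta_0 = 1$, $\beta_1 = 2$, unit variances for $X^{*}$, $\eta$, and the noise in $Y$), not data from any real application. It compares the OLS slope with the predicted limit $\beta_1 \sigma_{X^{*}}^2 / (\sigma_{X^{*}}^2 + \sigma_{\eta}^2)$ and with a TLS slope from the SVD of the centered data. TLS lands near the true slope here only because the $X$-noise and $Y$-noise variances were deliberately set equal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0
# Assumed values; TLS is consistent here only because sigma_eta == sigma_eps
sigma_xstar, sigma_eta, sigma_eps = 1.0, 1.0, 1.0

x_star = rng.normal(0.0, sigma_xstar, n)                       # true predictor
y = beta0 + beta1 * x_star + rng.normal(0.0, sigma_eps, n)     # response with noise
x = x_star + rng.normal(0.0, sigma_eta, n)                     # noisy observation of x_star

# OLS slope of y on the noisy x
xc, yc = x - x.mean(), y - y.mean()
beta_ols = (xc @ yc) / (xc @ xc)

# Predicted OLS limit: beta1 times the attenuation factor
attenuation = sigma_xstar**2 / (sigma_xstar**2 + sigma_eta**2)
beta_ols_limit = beta1 * attenuation

# TLS slope from the first right singular vector of the centered data
_, _, Vt = np.linalg.svd(np.column_stack([xc, yc]), full_matrices=False)
beta_tls = Vt[0, 1] / Vt[0, 0]

print(f"OLS: {beta_ols:.3f}  predicted OLS limit: {beta_ols_limit:.3f}  "
      f"TLS: {beta_tls:.3f}  true: {beta1}")
```

Changing `sigma_eta` alone moves the OLS estimate in line with the predicted limit, but it also breaks the equal-variance condition, so the TLS estimate will then drift away from the true slope as well.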
The Method:
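- The errors-in-variables setup: Suppose the true relationship is $Y^{*} = \beta_0 + \beta_1 X^{*} + \varepsilon$, but you observe $X = X^{*} + \eta$ and $Y = Y^{*} + \delta$ (or just $Y = Y^{*}$ in the simpler case). Here $\eta$ is measurement error in $X$. The standard (classical) assumption is that $\eta$, $\varepsilon$, and $\delta$ have mean zero and are independent of each other and of $X^{*}$.
- What OLS gives you: The OLS slope is the sample covariance of $X$ and $Y$ divided by the sample variance of $X$. Under the classical assumptions, the noise in $X$ leaves the covariance unchanged, $\operatorname{Cov}(X, Y) = \beta_1 \sigma_{X^{*}}^2$, but inflates the variance, $\operatorname{Var}(X) = \sigma_{X^{*}}^2 + \sigma_{\eta}^2$. So when you regress $Y$ on the noisy $X$, the OLS estimator converges to: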
$\hat{\beta}_{OLS} \xrightarrow{p} \beta_1 \cdot \frac{\sigma_{X^{*}}^2}{\sigma_{X^{*}}^2 + \sigma_{\eta}^2}$
The fraction $\sigma_{X^{*}}^2 / (\sigma_{X^{*}}^2 + \sigma_{\eta}^2)$ is less than 1 whenever $\sigma_{\eta}^2 > 0$, so the slope is biased toward zero. This is called attenuation bias. The noisier $X$ is relative to its true signal, the worse the bias. In the extreme where $\sigma_{\eta}^2 \gg \sigma_{X^{*}}^2$, the slope goes to zero -- you have regressed on pure noise.
- How TLS works: TLS minimizes the sum of squared perpendicular distances from each point $(X_i, Y_i)$ to the fitted line, which means allowing errors in both the $X$- and $Y$-directions. The solution can be found via SVD: center the data, form the $n \times 2$ matrix $[X - \bar{X}, \, Y - \bar{Y}]$, and take its SVD. The first right singular vector gives the direction of the line of best fit, which passes through $(\bar{X}, \bar{Y})$. Equivalently, the TLS slope is determined by the eigenvector (the normal to the line) corresponding to the smaller eigenvalue of the