Fitting a Hyperplane to Noisy Points

Linear Algebra · Medium · Free problem

You observe $n$ points in $\mathbb{R}^d$. They were generated by sampling points that lie on a single unknown hyperplane and then adding independent random noise.

  1. Write the general equation of a hyperplane in $\mathbb{R}^d$.
  2. Construct a loss function whose minimizer estimates that hyperplane.
  3. What is the solution?

Hints

  1. There is no special response variable here -- noise is in every coordinate, so this is total least squares, not ordinary regression.
  2. Use the orthogonal (perpendicular) distance to the plane, $|w^{\top}x_i - b|$, and constrain $\lVert w\rVert = 1$.
  3. After centering, you are minimizing $w^{\top}Sw$ over unit vectors -- that is the smallest-eigenvalue eigenvector of the scatter matrix.

Worked Solution

How to Think About It: A hyperplane is defined by a normal direction and an offset. 'Fitting' it means finding the direction along which the points vary least. The noise-free points have zero spread along the plane's normal, so after noise is added the normal is the one direction in which only noise contributes to the variance. So this is a total-least-squares / PCA problem, not an ordinary regression (there is no privileged 'response' coordinate; noise is in all directions).

Part 1 -- Hyperplane equation: A hyperplane in $\mathbb{R}^d$ is $\{x : w^{\top}x = b\}$ with unit normal $w$ ($\lVert w \rVert = 1$) and offset $b$. Equivalently $w^{\top}(x - x_0) = 0$ for any point $x_0$ on the plane.

Part 2 -- Loss function: With $\lVert w \rVert = 1$, the perpendicular distance from point $x_i$ to the plane is $|w^{\top}x_i - b|$. Minimize the sum of squared orthogonal distances: $L(w, b) = \sum_{i=1}^n (w^{\top}x_i - b)^2 \quad \text{subject to } \lVert w \rVert = 1.$ The unit-norm constraint is essential -- without it the trivial choice $w = 0$, $b = 0$ drives the loss to zero.
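The distance formula used in Part 2 follows from projecting $x_i$ onto the plane along the normal. A short derivation, assuming $\lVert w \rVert = 1$ (the symbol $p_i$ for the projected point is introduced here only for illustration):

$$
p_i = x_i - (w^{\top}x_i - b)\,w, \qquad w^{\top}p_i = w^{\top}x_i - (w^{\top}x_i - b)\,w^{\top}w = b,
$$

$$
\lVert x_i - p_i \rVert = |w^{\top}x_i - b|\,\lVert w \rVert = |w^{\top}x_i - b|.
$$

So $p_i$ lies on the plane, and because $x_i - p_i$ is parallel to the normal $w$, $p_i$ is the closest point on the plane to $x_i$.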

Part 3 -- Solution (PCA): Minimize over $b$ first. Setting $\partial L/\partial b = -2\sum_i (w^{\top}x_i - b) = 0$ gives $b = w^{\top}\bar{x}$, where $\bar{x}$ is the sample mean. Substituting, $L(w) = \sum_i \big(w^{\top}(x_i - \bar x)\big)^2 = w^{\top} S\, w$, where $S = \sum_i (x_i-\bar x)(x_i-\bar x)^{\top}$ is the scatter matrix (the sample covariance matrix up to a constant factor). Minimizing the quadratic form $w^{\top}Sw$ over unit vectors $w$ is solved by the eigenvector of $S$ with the smallest eigenvalue, and the minimum value is that eigenvalue. That eigenvector is the estimated normal $\hat w$ (determined only up to sign); the offset is $\hat b = \hat w^{\top}\bar x$.
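The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `fit_hyperplane`, the synthetic plane, and the noise level are all assumptions chosen for the demo.

```python
import numpy as np


def fit_hyperplane(X):
    """Estimate (w, b) for the plane w^T x = b from the rows of X, shape (n, d).

    Centers the data, forms the scatter matrix S, and returns the
    eigenvector of S with the smallest eigenvalue as the unit normal.
    """
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)          # sample mean
    Xc = X - x_bar                  # centered points
    S = Xc.T @ Xc                   # scatter matrix, shape (d, d)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(S)
    w = eigvecs[:, 0]               # smallest-eigenvalue eigenvector, unit norm
    b = w @ x_bar                   # offset so the plane passes through the mean
    return w, b


# Demo on synthetic data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0, 2.0]) / 3.0   # unit normal
b_true = 1.5
Z = rng.normal(size=(500, 3))
# Project random points onto the plane w_true^T x = b_true ...
P = Z - np.outer(Z @ w_true, w_true) + b_true * w_true
# ... then add independent noise to every coordinate.
X = P + 0.05 * rng.normal(size=P.shape)

w_hat, b_hat = fit_hyperplane(X)
# The sign of w_hat is arbitrary, so compare directions via |cosine|.
print("alignment with true normal:", abs(w_hat @ w_true))
print("estimated offset (up to sign):", abs(b_hat))
```

Because $(w, b)$ and $(-w, -b)$ describe the same plane, the demo compares the estimate to the truth only up to sign.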

Answer: The plane is $w^{\top}x = b$, $\lVert w\rVert=1$. Minimize the sum of squared orthogonal distances $\sum_i(w^{\top}x_i - b)^2$. The estimate $\hat w$ is the smallest-eigenvalue eigenvector of the centered scatter matrix, and $\hat b = \hat w^{\top}\bar x$ -- this is PCA / total least squares.

Intuition

Fitting a hyperplane to points corrupted by isotropic noise is the canonical PCA story: the plane's normal is the direction of least variance, and you recover it as the bottom eigenvector of the covariance matrix. This differs sharply from OLS, which minimizes vertical residuals and assumes noise only in $y$; here, because noise hits every coordinate, orthogonal distance is the right loss and total least squares is the right tool. Confusing the two -- using OLS when noise is isotropic -- gives a biased fit that depends on which coordinate you treat as the response (regressing $y$ on $x$ and $x$ on $y$ yield different lines), a mistake quants make whenever both variables in a relationship are measured with error.
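The contrast with OLS can be seen in a small 2-D simulation. The sketch below assumes a true line $y = 2x$, a unit-variance latent position along the line, and equal noise of standard deviation 0.5 in both coordinates; the function name `compare_slopes` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def compare_slopes(n=5000, true_slope=2.0, sigma=0.5, seed=0):
    """Return (ols_slope, tls_slope) for a line observed with noise in x and y."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=n)                            # latent noise-free x
    x = t + sigma * rng.normal(size=n)                # noise in x
    y = true_slope * t + sigma * rng.normal(size=n)   # noise in y

    xc, yc = x - x.mean(), y - y.mean()

    # OLS of y on x: minimizes vertical residuals only.
    ols_slope = (xc @ yc) / (xc @ xc)

    # TLS: the normal (w_x, w_y) is the smallest-eigenvalue eigenvector
    # of the 2x2 scatter matrix; the line w_x*x + w_y*y = b has slope -w_x/w_y.
    Zc = np.column_stack([xc, yc])
    _, eigvecs = np.linalg.eigh(Zc.T @ Zc)
    w = eigvecs[:, 0]
    tls_slope = -w[0] / w[1]
    return ols_slope, tls_slope


ols_slope, tls_slope = compare_slopes()
print("OLS slope:", ols_slope)   # pulled toward zero by the noise in x
print("TLS slope:", tls_slope)   # should land near the true slope
```

Under these assumptions the population OLS slope is the true slope times $\operatorname{Var}(t) / (\operatorname{Var}(t) + \sigma^2)$, so it is shrunk toward zero, while the orthogonal fit does not single out either coordinate.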
