Linear Interpolation and Extrapolation of House Prices

Statistics · Easy · Free problem

You are given a list of houses, each with a known area and price $(x_i, y_i)$, sorted by area. You must estimate the price $y$ of a query house of area $x$.

  • If $x$ falls between two adjacent known areas $x_1 < x < x_2$, interpolate linearly between their prices.
  • If $x$ falls outside the range of known areas, extrapolate linearly using the two nearest known points on that side.

Give the formula and explain why this is a reasonable estimator (and where it can go wrong).

Hints

  1. Interpolation and extrapolation are the same straight-line formula -- the only question is which two reference points you draw the line through.
  2. Compute the slope between two points and evaluate the line at the query $x$: $y = y_1 + (x - x_1)\cdot\text{slope}$.
  3. For an out-of-range query, pick the two nearest known points on that side and extend the line past them; binary-search the sorted areas to locate the bracket.

Worked Solution

How to Think About It: Both interpolation and extrapolation use the same straight-line formula between two reference points; the only difference is which two points you pick. For interpolation, bracket the query. For extrapolation, use the two closest points on the relevant edge and extend the line beyond them.

Key Insight: The line through two points $(x_1, y_1)$ and $(x_2, y_2)$ has slope $(y_2 - y_1)/(x_2 - x_1)$, and evaluating it at $x$ gives the estimate -- the same expression whether $x$ is inside or outside $[x_1, x_2]$.
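As a quick check of the formula, here is the arithmetic for two made-up reference points, $(1000, 200000)$ and $(1500, 290000)$. These numbers are illustrative only and are not part of the problem. The slope is the same for both queries; only the position of $x$ relative to the pair changes.

```latex
\text{slope} = \frac{290000 - 200000}{1500 - 1000} = 180

% Interior query, x = 1200 (interpolation):
y = 200000 + (1200 - 1000)\cdot 180 = 236000

% Out-of-range query, x = 1800 (extrapolation):
y = 200000 + (1800 - 1000)\cdot 180 = 344000
```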

The Method:

  1. Binary-search the sorted areas for the position of $x$.
  2. If $x$ lies between two known areas, let those be $(x_1, y_1)$ and $(x_2, y_2)$.
  3. If $x$ is below the smallest area, use the two smallest points; if above the largest, use the two largest points.
  4. Evaluate the line:

$y = y_1 + (x - x_1)\,\frac{y_2 - y_1}{x_2 - x_1}.$
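The four steps can be sketched in Python roughly as follows. This is a minimal illustration, not a reference solution: the function name `estimate_price` and the parallel-list input layout are assumptions, and the input is assumed to be sorted by area with at least two houses. Clamping the binary-search index lets one code path cover the interior case and both out-of-range cases.

```python
from bisect import bisect_left


def estimate_price(areas, prices, x):
    """Estimate the price at area x by linear interpolation or extrapolation.

    Assumes `areas` is sorted ascending and prices[i] belongs to areas[i].
    (Hypothetical helper; the name and signature are illustrative.)
    """
    n = len(areas)
    if n < 2:
        raise ValueError("need at least two known houses")

    # Step 1: index of the first known area that is >= x.
    i = bisect_left(areas, x)

    # Steps 2-3: clamp so (i - 1, i) is always a valid pair.
    # Below the range this selects the two smallest points,
    # above the range the two largest, otherwise the bracketing pair.
    i = min(max(i, 1), n - 1)
    x1, y1 = areas[i - 1], prices[i - 1]
    x2, y2 = areas[i], prices[i]

    if x2 == x1:
        raise ValueError("reference areas coincide; slope is undefined")

    # Step 4: evaluate the line through the two reference points.
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)
```

With the made-up data `areas = [1000, 1500, 2000]` and `prices = [200000, 290000, 350000]`, a query at 1200 interpolates between the first two houses, while a query at 2500 extends the line through the last two.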

Practical Considerations: Interpolation between observed points is usually safe. Extrapolation is dangerous: it assumes the local linear trend continues, which often fails at the extremes (a mansion is not priced by extending the slope of two small houses). Guard against division by zero when two reference areas coincide, and beware that a steep slope at the boundary can produce absurd or negative prices when extended far out.
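Both failure modes are easy to reproduce. The sketch below uses a hypothetical helper, `line_through`, and invented data in which the two smallest houses differ sharply in price. It shows the guard for coincident areas and how a steep boundary slope drives a far-out extrapolation below zero.

```python
def line_through(p1, p2, x):
    """Evaluate the line through points p1 and p2 at x.

    Raises instead of dividing by zero when the two areas coincide.
    (Hypothetical helper for illustration.)
    """
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("reference areas coincide; slope is undefined")
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


# Invented data: the two smallest houses, with a steep slope of 600 per unit area.
smallest = (500, 100_000)
second_smallest = (600, 160_000)

near = line_through(smallest, second_smallest, 450)  # just below the range
far = line_through(smallest, second_smallest, 300)   # far below the range

print(near)  # 70000.0, plausible
print(far)   # -20000.0, a negative price
```

In practice one might clamp the result to a floor, cap how far outside the range a query may go, or flag the estimate as extrapolated; which of these is appropriate depends on the application.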

Answer: Use $y = y_1 + (x - x_1)(y_2 - y_1)/(x_2 - x_1)$ with the bracketing pair for interior queries and the two nearest edge points for out-of-range queries; treat extrapolated estimates with suspicion.

Intuition

This problem looks trivial but is really about the difference between interpolating (safe) and extrapolating (risky) a model. The single linear formula handles both, which is the clean takeaway, but the judgment being tested is knowing that extending a local slope beyond your data is an assumption, not a measurement. In quant work this is exactly the danger of using a curve or surface outside its calibration range -- yield curves, vol surfaces, and pricing models all misbehave when you extrapolate past where you have data.
