Why Vertical Distance is Standard in Least Squares – Linear Algebra

geometryintuitionleast squareslinear algebrastatistics

When fitting a curve in $\mathbb R^2$ to data points in $\mathbb R^2$ (example), why is each point's vertical distance from the curve squared instead of its shortest (possibly diagonal) distance from the curve?

Ignoring my poorly drawn curves, it seems obvious that https://i.imgur.com/vy6ioMg.png is a worse curve-to-point fit than https://i.imgur.com/9m22M8p.png even though the red line is shorter in the first image, because you can draw the much shorter blue line instead (which I labeled b in the second image). Minimizing $b^2$ seems much more important than minimizing $a^2$.

Best Answer

In fact, diagonal distance is used in some cases.

From the operative point of view, the standard "vertical" distance is taken when it is assumed that in the 2D data available, the $x$ has been measured (almost) exactly, while the $y$ is subject to error.

When both $x$ and $y$ are subject to error, under the usual assumptions of independence, near Gaussian distribution, etc. the distance shall be taken along a line with slope $\sigma_y / \sigma_x$.

That's called Total least squares.

Related Question