When we find a best-fit line for a data set, why do we minimize the error rather than the distance from the line?

data analysis, statistics

When we draw the best-fit line through a set of data points, we are essentially minimizing the sum of the (squared) errors between the actual points and the line. Why do we not minimize the distance from the points to the line instead of the error? To put it another way, to make sure I'm being clear: why do we minimize the length of the vertical segment between each point and the line, rather than the length of the perpendicular segment from each point to the best-fit line?

Best Answer

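A first answer is that the two variables usually do not have the same units. For example, suppose you have pairs $(\text{size}_k, \text{weight}_k)$, $k = 1, \dots, n$, measured on a sample of a human population and plotted as points. A perpendicular distance mixes the two axes, so it depends on the units chosen for each one: if you change the units on one of the axes, the orthogonal projections of the points onto the best-fit line no longer fall in the same places, and the line that minimizes the perpendicular distances is in general a different line. A vertical distance is measured in the units of a single variable, so the vertical-error fit does not have this problem.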
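To make the units argument concrete, here is a small numerical sketch. It is only an illustration under assumptions of my own: the data are made up (sizes in metres, weights in kilograms, with an arbitrary linear relation plus noise), and the helper names `ols_slope` and `orthogonal_slope` are hypothetical, not from any library. It fits both kinds of line, then refits after converting sizes to centimetres and converts the slopes back to kg per metre for comparison.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the line minimizing the sum of squared vertical distances."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.dot(xc, yc) / np.dot(xc, xc))

def orthogonal_slope(x, y):
    """Slope of the line minimizing the sum of squared perpendicular distances."""
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    # The first right singular vector gives the direction of that line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    return float(dy / dx)

# Made-up sample: sizes in metres, weights in kilograms (assumed values).
rng = np.random.default_rng(0)
size_m = rng.uniform(1.5, 2.0, 200)
weight_kg = 70 * size_m - 50 + rng.normal(0, 8, 200)

# Same data, but with sizes expressed in centimetres.
size_cm = 100 * size_m

# Fit in each unit system, then express every slope in kg per metre.
ols_m = ols_slope(size_m, weight_kg)
ols_cm = ols_slope(size_cm, weight_kg) * 100
tls_m = orthogonal_slope(size_m, weight_kg)
tls_cm = orthogonal_slope(size_cm, weight_kg) * 100

print("vertical fit:      ", ols_m, ols_cm)
print("perpendicular fit: ", tls_m, tls_cm)
```

With the vertical fit, the two slopes agree up to rounding, so changing units only relabels the same line. With the perpendicular fit, the two slopes differ noticeably, which means the "best" line itself depends on an arbitrary choice of units.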