[Math] Why take the sum of squares instead of the sum of absolute values

regression, statistics

I'm self-studying machine learning and getting into the basics of linear regression models. From what I understand so far, a good regression model minimizes the sum of the squared differences between predicted values $h(x)$ and actual values $y$.

Something like the following:

$$\sum_{i=1}^m (h(x_i)-y_i)^2$$

Why do we square the differences? I can see that squaring ensures each term is positive even when the predicted value is less than the actual value.
But why can't this just be accounted for by taking the sum of the absolute values?

Like so:

$$\sum_{i=1}^m |h(x_i)-y_i|$$
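For concreteness, here is a minimal sketch that evaluates both objectives (NumPy assumed, with made-up toy data and a fixed hypothesis $h(x) = 2x + 1$ chosen only for illustration):

```python
import numpy as np

# Made-up toy data, purely for illustration; the last point is a deliberate outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.2, 4.8, 7.1, 9.0, 15.0])

def h(x):
    # A fixed linear hypothesis h(x) = 2x + 1, chosen only for illustration.
    return 2.0 * x + 1.0

residuals = h(x) - y

sum_of_squares = np.sum(residuals ** 2)    # squared-error objective
sum_of_abs = np.sum(np.abs(residuals))     # absolute-error objective

print(sum_of_squares, sum_of_abs)
# The squared objective weighs the one large residual (the outlier)
# much more heavily than the absolute objective does.
```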

Best Answer

Actually there are some good reasons that have nothing to do with whether this is easy to calculate. The first form is called least squares, and in a probabilistic setting there are several strong theoretical justifications for using it. For example, if you assume the errors in your regression are normally distributed (a reasonable assumption in many cases), then minimizing the sum of squares gives the maximum likelihood estimator. Least squares also has several other important properties.
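To make the maximum likelihood connection concrete, here is the standard one-line derivation, assuming the noise model $y_i = h(x_i) + \varepsilon_i$ with i.i.d. $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$:

$$\log L = \sum_{i=1}^m \log\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - h(x_i))^2}{2\sigma^2}}\right) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^m (h(x_i)-y_i)^2$$

Maximizing $\log L$ over the parameters of $h$ is therefore the same as minimizing $\sum_{i=1}^m (h(x_i)-y_i)^2$, which is exactly the least squares objective.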

You can read some more here.
