Machine Learning – Why is the Sum of Error Measures Halved Instead of Using Absolute Values?

machine-learning, neural-networks

In a neural network class I'm taking, the error measure is defined as:

$$E = \frac{1}{2} \sum_{i} \left( y_i - t_i \right)^2$$

where $y_i$ is the predicted value and $t_i$ is the target value.

If the purpose of squaring the difference between the predicted and target values is to ensure the error is always positive, why not just use the absolute value of the difference instead?

Secondly, why is the summation halved?

Best Answer

  1. The two are not equivalent: minimizing one or the other will not, in general, give you the same optimal model or model parameters, but both are used. They are sometimes referred to as the L1 and L2 loss functions, for the absolute-value and squared versions respectively. The squared loss penalizes large errors far more heavily, so the two can prefer very different fits when outliers are present (see the sketch after this list).

  2. The coefficient in front of the summation is immaterial: scaling the loss by a positive constant does not move its minimum. The reason it's one half is a convention that makes the math come out nicer: if $f(x) = \frac{1}{2}x^2$, then $f'(x) = x$, with no stray factor of 2 (a second sketch below demonstrates this).
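
Here is a minimal numerical sketch of point 1, using NumPy (the data and variable names are illustrative, not from the course). For a single constant prediction fit to a handful of targets, the L2 loss is minimized by the mean while the L1 loss is minimized by the median, so an outlier pulls the two minimizers apart:

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
c = np.linspace(0.0, 110.0, 110001)              # candidate constant predictions

# Evaluate both losses at every candidate prediction.
l2 = ((targets[:, None] - c) ** 2).sum(axis=0)   # squared (L2) loss
l1 = np.abs(targets[:, None] - c).sum(axis=0)    # absolute (L1) loss

print("L2 minimizer:", c[l2.argmin()])  # ~22.0, the mean: dragged toward the outlier
print("L1 minimizer:", c[l1.argmin()])  # ~3.0, the median: robust to the outlier
```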
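
And a sketch of point 2: the 1/2 exactly cancels the 2 produced by the power rule when differentiating the square, leaving a clean gradient (the function and variable names below are mine, chosen to match the equation above):

```python
# Gradient of E = (1/2)(y - t)^2 with respect to the prediction y.
def grad_halved(y, t):
    return y - t            # the 1/2 cancels the 2 from the power rule

# Gradient of E = (y - t)^2: same direction, but with a stray factor of 2.
def grad_unhalved(y, t):
    return 2.0 * (y - t)

# Central-difference check of the halved version at one point.
y, t, eps = 1.7, 0.4, 1e-6
numeric = (0.5 * (y + eps - t) ** 2 - 0.5 * (y - eps - t) ** 2) / (2 * eps)
print(numeric, grad_halved(y, t))  # both ~1.3
```

Either way the minimizer is the same; in gradient descent, dropping the 1/2 merely rescales the effective learning rate by 2.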