In a neural network class I'm taking, the error measure is defined as:

$$E = \frac{1}{2}\sum_{i}(y_i - t_i)^2$$

where $y_i$ is the predicted value and $t_i$ is the target value.
If the purpose of squaring the difference between the predicted and target values is to always produce a positive value, why not just use the absolute value of the difference instead?
Secondly, why is the summation halved?
Best Answer
The two are not equivalent (minimizing one or the other may not give you the same optimal model or model parameters), but both are used. They are sometimes referred to as the L1 and L2 loss functions for the absolute value and squared versions, respectively.
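To see that the two can disagree, here is a minimal sketch (the data and variable names are hypothetical): fitting a single constant $c$ to a set of targets, the L2 loss is minimized at the mean while the L1 loss is minimized at the median, so an outlier pulls the two optima apart.

```python
import numpy as np

# Hypothetical targets with one outlier.
targets = np.array([1.0, 2.0, 3.0, 100.0])

# Grid-search the best constant c under each loss.
cs = np.linspace(0.0, 110.0, 11001)  # step 0.01
l2 = np.array([np.sum((targets - c) ** 2) for c in cs])   # squared error
l1 = np.array([np.sum(np.abs(targets - c)) for c in cs])  # absolute error

best_l2 = cs[np.argmin(l2)]  # the mean, 26.5 -- dragged up by the outlier
best_l1 = cs[np.argmin(l1)]  # a median, anywhere in [2, 3]
print(best_l2, best_l1)
```

The L2 optimum chases the outlier; the L1 optimum ignores it. This is the usual intuition that the L1 loss is more robust to outliers, while the L2 loss penalizes large errors more heavily.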
The coefficient in front of the summation is immaterial to the optimization: scaling a loss by a positive constant does not change where its minimum lies. The reason it's one half is a convention to make the math come out nicer: if $f(x) = \frac{1}{2}x^2$, then $f'(x) = x$, so the $\frac{1}{2}$ cancels the factor of 2 from the power rule and the gradient of the error is just the prediction error itself.