Solved – Why is there a square in MSE (mean squared error)?

loss-functions, machine-learning

Please forgive me for such a beginner question, since I'm learning stats & machine learning.

I'm trying to understand Mean Squared Error.

I understand the "Mean Error", the Mean of Errors between real and predicted values, what worries me is why we take square of errors?

If it's only to keep the values positive, why don't we just take absolute values?

I just want to understand what value the square brings to the loss function.

Thanks

Best Answer

MSE has some desirable properties, such as easier differentiability (as @user2974951 comments), which matters for further analysis: a differentiable objective function is in general very important for analytical calculations. Taking absolute values instead gives the Mean Absolute Error (MAE), which also has its applications; it's not that we always prefer MSE over MAE.

Another reason is that squaring penalises large errors more, because if an error is large, its square is much larger. For example, suppose one error term is $e_i = 999$ and another is $e_j = 50$, and we have to choose which of the two to decrease by $1$. MAE is indifferent between them, since either choice reduces it by the same amount, but MSE is reduced far more by shrinking the larger error, so it pushes the fit toward correcting the larger one.
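To see this numerically, here is a minimal sketch (assuming Python with NumPy; the helper functions `mae` and `mse` are just illustrative, not from the original answer) that takes the $e_i = 999$, $e_j = 50$ example, decreases each error by $1$, and compares how much each loss improves:

```python
import numpy as np

# Hypothetical residuals mirroring the e_i = 999, e_j = 50 example above.
errors = np.array([999.0, 50.0])

def mae(e):
    # Mean Absolute Error: mean of |errors|
    return np.mean(np.abs(e))

def mse(e):
    # Mean Squared Error: mean of errors squared
    return np.mean(e ** 2)

# Decrease each error by 1 in turn and see how much each loss drops.
for idx in range(len(errors)):
    reduced = errors.copy()
    reduced[idx] -= 1
    print(f"Reduce error #{idx} by 1 -> "
          f"MAE drops by {mae(errors) - mae(reduced):.2f}, "
          f"MSE drops by {mse(errors) - mse(reduced):.2f}")

# Expected output:
# Reduce error #0 by 1 -> MAE drops by 0.50, MSE drops by 998.50
# Reduce error #1 by 1 -> MAE drops by 0.50, MSE drops by 49.50
```

MAE improves by the same $0.5$ either way, while MSE improves by $998.5$ versus $49.5$, which is exactly why minimising MSE concentrates effort on the largest errors.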