Machine Learning – Understanding ‘The Mean Minimizes the Mean Squared Error’

machine learning

I am trying to understand the sentence

the mean minimizes the mean squared error.

from the Wikipedia article https://en.wikipedia.org/wiki/Average_absolute_deviation.
From a previous post, Formal proof that mean minimize squared error function, I can see a formal proof which, according to the authors, proves that the mean minimizes the squared error. But my objection to that proof is that if you substitute $m$ with the median, all the reasoning still seems to stand.

Therefore, I restricted myself to one dimension and built the following example:

$\text{MSE} = \frac1n\sum_{i=1}^n (\hat{Y}_i - Y_i)^2$

Given the distribution $Y = [5, 3, 2, 7, 4]$, I interpreted minimizing the MSE as follows: I should find a value $\hat{Y}$ such that, when it is applied in the MSE formula (more specifically, compared against each element of the $Y$ distribution), we are sure to obtain the minimum value. For simplicity, I took $\hat{Y}$ to be any measure of central tendency: mean, median, mode, etc.

Let's suppose $\hat{Y} = \text{mean}$ ($= 4.2$ for our distribution). I then did the calculation:

$\frac15\left[(4.2-5)^2 + (4.2-3)^2 + (4.2-2)^2 + (4.2-7)^2 + (4.2-4)^2\right] = 2.96$

Let's suppose $\hat{Y} = \text{median}$ ($= 4$ for our distribution). I then did the calculation:

$\frac15\left[(4-5)^2 + (4-3)^2 + (4-2)^2 + (4-7)^2 + (4-4)^2\right] = 2.8$

If I did all the calculations correctly, this result shows that, in this case, the median minimizes the MSE better than the mean does.
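Arithmetic like this is easy to re-check with a few lines of Python. The sketch below is illustrative (the helper `mse` is my own, not from any library):

```python
import statistics

y = [5, 3, 2, 7, 4]

def mse(y_hat, ys):
    """Mean squared error of a single central value y_hat against the data ys."""
    return sum((y_hat - yi) ** 2 for yi in ys) / len(ys)

print(mse(statistics.mean(y), y))    # ~2.96  (y_hat = mean = 4.2)
print(mse(statistics.median(y), y))  # 3.0    (y_hat = median = 4)
```

Note that the second value comes out as $3.0$ rather than $2.8$: the squared deviations from $4$ sum to $1 + 1 + 4 + 9 + 0 = 15$, and $15/5 = 3$.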

I am sure I am doing something wrong and missing important aspects of the overall reasoning.

Could you please provide some clarification on the above? I would really appreciate it.

Many thanks in advance. Best regards, Carlo

Best Answer

If you have $(y_i)_{i=1}^n$, consider the sum of the squared differences from the $y_i$ to a value $a$.

This is $s(a) =\sum_{i=1}^n (y_i-a)^2 $.

Manipulating this,

$\begin{aligned} s(a) &=\sum_{i=1}^n (y_i-a)^2\\ &=\sum_{i=1}^n (y_i^2-2ay_i+a^2)\\ &=\sum_{i=1}^n y_i^2-\sum_{i=1}^n 2ay_i+\sum_{i=1}^n a^2\\ &=\sum_{i=1}^n y_i^2-2a\sum_{i=1}^n y_i+na^2 \end{aligned}$

There are a number of ways to minimize this expression. Perhaps the easiest is to differentiate with respect to $a$. This gives $s'(a) =-2\sum_{i=1}^ny_i+2na $ and this is zero when $a =\dfrac{\sum_{i=1}^ny_i}{n} $, the mean of the values.

Note that, since $s''(a) =2n > 0 $, this value of $a$ gives a minimum.
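The same conclusion can be verified symbolically. Here is a minimal sketch, assuming SymPy is available, applied to the five data points from the question:

```python
import sympy as sp

y = [5, 3, 2, 7, 4]                  # data from the question
a = sp.Symbol('a')
s = sum((yi - a) ** 2 for yi in y)   # s(a) = sum_i (y_i - a)^2

print(sp.solve(sp.diff(s, a), a))    # [21/5] -> the mean, 4.2
print(sp.diff(s, a, 2))              # 10 = 2n > 0, so this is a minimum
```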


(added later)

An even easier way is to write $\bar{y} =\dfrac{\sum_{i=1}^ny_i}{n} $ and $\bar{y^2} =\dfrac{\sum_{i=1}^ny_i^2}{n} $.

Then

$\begin{aligned} \frac1{n}s(a) &=\frac1{n}\sum_{i=1}^n y_i^2-2a\,\frac1{n}\sum_{i=1}^n y_i+a^2\\ &=a^2-2a\bar{y}+\bar{y^2}\\ &=a^2-2a\bar{y}+\bar{y}^2-\bar{y}^2+\bar{y^2}\\ &=(a-\bar{y})^2+\bar{y^2}-\bar{y}^2 \end{aligned}$

Since $\bar{y^2}-\bar{y}^2$ is independent of $a$, this is clearly a minimum when $a = \bar{y}$ and the value at the minimum is $\bar{y^2}-\bar{y}^2$.
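Applying this identity to the data in the question makes the resolution concrete. There, $\bar{y} = \frac{21}{5} = 4.2$ and $\bar{y^2} = \frac{25+9+4+49+16}{5} = \frac{103}{5} = 20.6$, so

$\frac1{n}s(a) = (a-4.2)^2 + 20.6 - 4.2^2 = (a-4.2)^2 + 2.96.$

At the mean, $a = 4.2$, this equals $2.96$; at the median, $a = 4$, it equals $(4-4.2)^2 + 2.96 = 3.0$. The mean does give the smaller mean squared error, as the identity guarantees.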