Why do we use the sample mean in the total sum of squares (SST) calculation?

linear-algebra, regression

I am trying to understand why the sample mean is used in the Total Sum of Squares or Total Deviation calculation.

In other words, what is so important or significant about the sample mean?

A lot of books and notes gloss over this information and it is frustrating me to no end.


Best Answer

Linear regression models the expected value (that is, the mean) of $y$ conditional on some specified features.

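If we had no knowledge of the relationship between the features and $y$, a sensible guess is the overall mean of $y$. Among all constant predictions, the mean is the one that minimizes the sum of squared errors. So TSS is the squared error of this no-feature baseline, and $R^2 = 1 - \text{SSE}/\text{TSS}$ measures how much the features improve on it.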
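A short derivation of why the mean is the best constant guess under squared error. Here $c$ is any constant prediction and $n$ is the sample size (notation introduced for this sketch). Setting the derivative of the squared-error sum to zero gives:

$$
\frac{d}{dc}\sum_{i=1}^{n}(y_i - c)^2 = -2\sum_{i=1}^{n}(y_i - c) = 0
\quad\Longrightarrow\quad
c = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.
$$

The second derivative is $2n > 0$, so this is a minimum, and the minimized value $\sum_{i=1}^{n}(y_i - \bar{y})^2$ is exactly TSS.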

Measured from the mean, the total sum of squares is also directly tied to the variance of $y$: dividing TSS by $n - 1$ gives the sample variance.
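A minimal numerical check of both points, assuming a small made-up sample `y` (the data and the helper `sse` are hypothetical, chosen only for illustration). It uses only Python's standard `statistics` module:

```python
import statistics

# Hypothetical sample of responses; any numbers would do.
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(y)
y_bar = statistics.mean(y)  # sample mean, 5.0 here

def sse(c):
    """Sum of squared errors when every y_i is predicted by the constant c."""
    return sum((yi - c) ** 2 for yi in y)

# Total sum of squares: squared deviations measured from the sample mean.
sst = sse(y_bar)

# Any other constant predictor has a larger squared error than the mean.
for c in (y_bar - 1.0, y_bar - 0.1, y_bar + 0.1, y_bar + 1.0):
    assert sse(c) > sst

# TSS divided by n - 1 matches the (unbiased) sample variance of y.
print(sst, sst / (n - 1), statistics.variance(y))  # 24.0 4.8 4.8
```

Shifting the constant away from $\bar{y}$ by any amount $d$ adds $n d^2$ to the squared error, which is why every alternative in the loop does worse.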