Questions on Definitions and Notation (MSE, SSE, Sxx)


It is said that $S_{xx} = \sum_{i=1}^n (x_i - \overline x)^2 = \sum_{i=1}^n x_i^2 - n\overline x^2$.

I suspect this is simple algebra but I am missing something still. How does this work?

Further, Wikipedia mentions that the MSE is $\sum_i \frac{(X_i - \overline X)^2}{n-2}$. However, my text notes that SSE is $\sum_i (Y_i - \hat{Y}_i)^2$.

Yet it should be the case that $MSE = \frac{SSE}{n-2}$.

Can Xs and Ys be used interchangeably like this? It seems wrong to me.

Best Answer

$\begin{align} S_{xx} &= \sum_i (x_i - \overline x)^2 \\ &= \sum_i (x_i^2 - 2\overline x x_i + \overline x^2) \\ &= \sum_i x_i ^2 - 2\overline x \sum_i x_i + \sum_i \overline x^2 \\ &= \sum_i x_i ^2 - 2\overline x \sum_i x_i + n \overline x^2 \end{align} $

since $\overline x$ is constant with respect to $i$. Now we note that $\overline x = \frac{\sum_i x_i}{n} \Rightarrow \sum_i x_i = n\overline x$. So

$\begin{align} S_{xx} &= \sum_i x_i ^2 - 2n\overline x^2 + n \overline x^2 \\ &= \sum_i x_i^2 - n\overline x^2 \end{align} $

Thus endeth the required algebra.
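If you'd like to convince yourself numerically, here is a minimal sketch that computes $S_{xx}$ both ways. The data values are made up purely for illustration, and the variable names are my own.

```python
# Check that sum((x_i - x_bar)^2) equals sum(x_i^2) - n * x_bar^2.
# The data below are arbitrary illustrative values, not from the question.
x = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(x)
x_bar = sum(x) / n

# Definition: sum of squared deviations from the mean.
sxx_definition = sum((xi - x_bar) ** 2 for xi in x)

# Shortcut form obtained from the algebra above.
sxx_shortcut = sum(xi ** 2 for xi in x) - n * x_bar ** 2

# Both are 13.2 up to floating-point rounding.
print(sxx_definition, sxx_shortcut)
```

The two numbers may differ in the last few digits because of floating-point rounding, so compare them with a tolerance rather than with `==`.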

As for the next part of the question, MSEs can be calculated for any estimator. An estimator is a special kind of random variable.

This is difficult to explain in words, but basically: In your regression problem, you have the random variables $\{Y_i\}_{i=1}^n$, which you're trying to estimate by the estimators $\{\hat Y_i\}_i$. The observed values of $\{Y_i\}_i$ are $\{y_i\}_i$, and the observed values of $\{\hat Y_i\}_i$ are $\{\hat y_i\}_i$. The observed values of an estimator are also called estimates.

Now, since the $\{\hat Y_i\}_i$ are estimators, you can calculate their MSEs. This is what your text does. A section of the Wikipedia article does the same.
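To make the regression case concrete, here is a rough sketch of computing SSE and $MSE = \frac{SSE}{n-2}$ by hand. It assumes simple linear regression (one predictor plus an intercept, hence the $n-2$), fitted by ordinary least squares. The data and variable names are invented for illustration.

```python
# Simple linear regression by least squares, then SSE and MSE = SSE / (n - 2).
# x and y are made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# S_xx and S_xy, as in the first part of the question.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fitted values (the estimates y_hat_i) and residuals y_i - y_hat_i.
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

# SSE uses the y's and their fitted values, not the x's.
sse = sum(r ** 2 for r in residuals)
mse = sse / (n - 2)  # two parameters (slope, intercept) were estimated

print(sse, mse)
```

Note that the $x_i$ only enter through the fitted values; the squared errors themselves are differences between each $y_i$ and its $\hat y_i$.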

Now, forget about regression. Suppose you just have a vector of observed values $\{x_i\}_i$. If you get these values by sampling from an infinite population, your $\{X_i\}_i$ are also random variables. But we're generally not interested in the behaviour of these variables on their own. We're more concerned with things like $\overline X$ (the sample mean) or $S^2_X$ (the sample variance) and so on.

Now, $\overline X$ is also an estimator: It estimates the population mean $\mu$. So you can also calculate an MSE for $\overline X$. This is done in a different section of the same Wikipedia article, and I'm guessing this is what you found odd.
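As a rough illustration of what "the MSE of $\overline X$" means, here is a small simulation sketch. It assumes i.i.d. normal samples with a mean and standard deviation that I picked arbitrarily. Under that assumption $\overline X$ is unbiased, so its MSE equals its variance, $\sigma^2/n$, and the simulated value should land close to that.

```python
import random

# Simulate the MSE of the sample mean as an estimator of mu.
# mu, sigma, n and reps are arbitrary choices for illustration.
random.seed(0)
mu, sigma = 5.0, 2.0
n = 10        # sample size
reps = 20000  # number of simulated samples

squared_errors = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    squared_errors.append((x_bar - mu) ** 2)

# Average squared error of x_bar around the true mean.
mse_simulated = sum(squared_errors) / reps

# For i.i.d. samples, MSE(x_bar) = Var(x_bar) = sigma^2 / n.
mse_theory = sigma ** 2 / n

print(mse_simulated, mse_theory)
```

Here the "error" is the distance between $\overline X$ and $\mu$, which is a different quantity from the regression residuals $y_i - \hat y_i$, even though both get called MSE.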

(If sampling from an infinite population sounds weird, consider it as sampling from a normal distribution or some other distribution. The "population" is basically all the points under the curve, and thus is infinitely large.)