$\begin{align}
S_{xx} &= \sum_i (x_i - \overline x)^2 \\
&= \sum_i (x_i^2 - 2\overline x x_i + \overline x^2) \\
&= \sum_i x_i ^2 - 2\overline x \sum_i x_i + \sum_i \overline x^2 \\
&= \sum_i x_i ^2 - 2\overline x \sum_i x_i + n \overline x^2
\end{align}
$
since $\overline x$ is a constant wrt $i$. Now we note that $\overline x = \frac{\sum_i x_i}{n} \Rightarrow \sum_i x_i = n\overline x$. So
$\begin{align}
S_{xx} &= \sum_i x_i ^2 - 2n\overline x^2 + n \overline x^2 \\
&= \sum_i x_i^2 - n\overline x^2
\end{align}
$
Thus endeth the required algebra.
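As a quick sanity check, here's a short numerical verification of the identity (a minimal Python sketch; the sample values are arbitrary):

```python
# Check the identity S_xx = sum(x_i^2) - n * xbar^2 on an arbitrary sample.
import numpy as np

x = np.array([1.0, 4.0, 10.0, 11.0, 14.0])  # arbitrary illustrative data
n = x.size
xbar = x.mean()

lhs = np.sum((x - xbar) ** 2)           # definitional form of S_xx
rhs = np.sum(x ** 2) - n * xbar ** 2    # simplified form derived above

print(lhs, rhs)                         # 114.0 114.0
assert np.isclose(lhs, rhs)
```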
As for the next part of the question, MSEs can be calculated for any estimator. An estimator is a special kind of random variable.
This is difficult to explain in words, but basically: In your regression problem, you have the random variables $\{Y_i\}_{i=1}^n$, which you're trying to estimate by the estimators $\{\hat Y_i\}_i$. The observed values of $\{Y_i\}_i$ are $\{y_i\}_i$, and the observed values of $\{\hat Y_i\}_i$ are $\{\hat y_i\}_i$. The observed values of an estimator are also called estimates.
Now, since the $\{\hat Y_i\}_i$ are estimators, you can calculate their MSEs. This is what your text does. A section of the Wikipedia article does the same.
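To make this concrete, here's a minimal sketch (with invented data and coefficients) showing observed values $y_i$, fitted estimates $\hat y_i$, and the MSE of the fit:

```python
# Sketch: observed y_i vs fitted yhat_i in simple linear regression.
# The data, true coefficients, and noise level are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 4.0, 10.0, 11.0, 14.0])
y = 2.0 + 3.0 * x + rng.normal(0, 0.6, x.size)    # observed values of Y_i

# Least-squares estimates of intercept and slope.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x                                # observed values of Yhat_i

mse = np.sum((y - yhat) ** 2) / (x.size - 2)      # residual mean square, df = n - 2
print(yhat, mse)
```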
Now, forget about regression. Suppose you just have a vector of observed values $\{x_i\}_i$. If you obtained these values by sampling from an infinite population, then the corresponding $\{X_i\}_i$ are also random variables. But we're generally not interested in the behaviour of these variables on their own; we're more concerned with things like $\overline X$ (the sample mean) or $S^2_X$ (the sample variance) and so on.
Now, $\overline X$ is also an estimator: It estimates the population mean $\mu$. So you can also calculate an MSE for $\overline X$. This is done in a different section of the same Wikipedia article, and I'm guessing this is what you found odd.
(If sampling from an infinite population sounds weird, consider it as sampling from a normal distribution or some other distribution. The "population" is basically all the points under the curve, and thus is infinitely large.)
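To see $\overline X$ behave as an estimator, here's a small simulation (a sketch; the population parameters $\mu$, $\sigma$ and the sample size $n$ are assumed purely for illustration). For an unbiased $\overline X$, the MSE should come out near $\sigma^2/n$:

```python
# Monte Carlo estimate of MSE(Xbar) = E[(Xbar - mu)^2].
# mu, sigma, and n are assumed values for illustration only.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n = 5.0, 2.0, 25
reps = 100_000

xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
mse_xbar = np.mean((xbars - mu) ** 2)

print(mse_xbar)   # close to sigma**2 / n = 0.16
```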
$$Y_i=B_0+B_1X_i+\epsilon_i$$
$$\hat{Y_i}=\hat{B_0}+\hat{B_1}X_i$$
a) $$E[MSE]=E\left[\frac{\sum(Y_i-\hat{Y_i})^2}{n-2}\right]=\sigma^2=0.6^2$$
$$E[MSR]=E\left[\sum(\hat{Y_i}-\overline{Y})^2\right]=\sigma^2+B_1^2\sum(X_i-\overline{X})^2=1026.36$$
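(For what it's worth, the stated 1026.36 is consistent with the first $X$ set from part b and a slope of $B_1 = 3$; the slope value is inferred from the numbers, not given here. A quick arithmetic check:)

```python
# Check: E[MSR] = sigma^2 + B1^2 * sum((X - Xbar)^2).
# B1 = 3 is inferred from the stated result, not given in the excerpt.
import numpy as np

sigma, B1 = 0.6, 3.0
X = np.array([1.0, 4.0, 10.0, 11.0, 14.0])

Sxx = np.sum((X - X.mean()) ** 2)       # = 114.0
e_msr = sigma ** 2 + B1 ** 2 * Sxx

print(e_msr)   # 1026.36
```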
b)
$$\sigma(\hat{B_1})=\sqrt{\frac{\sigma^2}{\sum(X_i-\overline{X})^2}}=\frac{0.6}{\sqrt{\sum(X_i-\overline{X})^2}}$$
For the case where $X=(1,4,10,11,14)$ we have $\sigma(\hat{B_1})=0.05619515$,
and for the case where $X=(6,7,8,9,10)$ we have $\sigma(\hat{B_1})=0.1897367$, so the first set is better, I think.
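A quick numerical check of these two standard errors, plugging each design into the formula above:

```python
# sigma(B1_hat) = sigma / sqrt(sum((X - Xbar)^2)) for each design vector.
import numpy as np

def se_slope(X, sigma=0.6):
    X = np.asarray(X, dtype=float)
    return sigma / np.sqrt(np.sum((X - X.mean()) ** 2))

print(se_slope([1, 4, 10, 11, 14]))   # 0.05619515...
print(se_slope([6, 7, 8, 9, 10]))     # 0.18973666...
```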
But why do I need to look at $\sigma(\hat{B_1})$?
Would it make any difference if we were estimating the mean response at $X = 8$?
Best Answer
Assuming that the slide is talking about linear regression with one input variable, i.e. $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$, the correct formula for MSE is: $$ \operatorname{MSE} = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \ . $$ To reiterate, for the specific case of a linear model with only one input variable the denominator must be $n-2$.
In the more general case when you have a linear model with $k$ input variables that is: $$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i \ , $$ then the MSE would be: $$ \operatorname{MSE} = \frac{1}{n-(k+1)} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \ . $$
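As a sketch of the general case (synthetic data, assumed only to exercise the formula), the key point being the $n-(k+1)$ denominator:

```python
# MSE with k predictors: SSE / (n - (k + 1)); the +1 accounts for the intercept.
# Data are synthetic, generated only to demonstrate the formula.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.6, n)

Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta_hat

mse = np.sum(resid ** 2) / (n - (k + 1))         # unbiased estimate of sigma^2
print(mse)   # roughly 0.36 = 0.6**2
```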
I am not aware of any model in which the denominator would be $n$. A denominator of $n$ would only be appropriate if we knew the population parameters $\beta_j$, in which case we would be computing the true residual variance rather than estimating it.