I won't work directly through your derivation, but will give a more general formulation below.
Let the regression model be $Y = X\beta + \epsilon$, with $P_X = X(X^\prime X)^{-1} X^\prime$ and $M_X = I_N - P_X$, where $I_N$ is the $N\times N$ identity matrix. $X$ is $N\times K$ and of full column rank. We assume homoskedasticity and no serial correlation.
We show that $\hat{\sigma}^2 = \hat{\epsilon}^\prime\hat{\epsilon}/(N-K)$ is unbiased. Since $\hat{\epsilon} = M_X y = M_X\epsilon$ (because $M_X X = 0$) and $M_X$ is symmetric and idempotent, we have, writing $m_{ij}$ for the $(i,j)$ element of $M_X$,
$$\begin{align*}
\mathbb{E}\left[\frac{\hat{\epsilon}^\prime \hat{\epsilon}}{N - K}\mid X\right] &= \mathbb{E}\left[\frac{\epsilon^\prime M_X^\prime M_X \epsilon}{N - K}\mid X\right] \\
&= \mathbb{E}\left[\frac{\epsilon^\prime M_X \epsilon}{N - K}\mid X\right] \\
&= \frac{\sum_{i=1}^N{\sum_{j=1}^N{m_{ij}\mathbb{E}[\epsilon_i\epsilon_j\mid X]}}}{N - K} \\
&= \frac{\sum_{i=1}^N{m_{ii}\sigma^2}}{N - K} \\
&= \frac{\sigma^2\mathop{\text{tr}}(M_X)}{N - K}. \\
\end{align*}$$
$$\begin{align*}
\text{tr}(M_X) &= \text{tr}(I_N - P_X) \\
&= \text{tr}(I_N) - \text{tr}(P_X) \\
&= N - \text{tr}\left(X\left(X^\prime X\right)^{-1}X^\prime\right) \\
&= N - \text{tr}\left(\left(X^\prime X\right)^{-1}X^\prime X\right) \\
&= N - \text{tr}(I_{K}) = N - K \\
\Longrightarrow \mathbb{E}\left[\frac{\hat{\epsilon}^\prime \hat{\epsilon}}{N - K}\mid X\right] &= \frac{\sigma^2 (N-K)}{(N-K)} = \sigma^2.
\end{align*}$$
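As a quick sanity check (not part of the proof), here is a small Monte Carlo sketch in Python/NumPy; the design matrix, coefficient vector, sample size, and number of replications below are arbitrary choices for illustration:

```
# Monte Carlo sketch: the average of sigma2_hat should be close to sigma2.
# (All numerical settings here are arbitrary illustration choices.)
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed design
beta = np.array([1.0, -2.0, 0.5])

sigma2_hats = []
for _ in range(20_000):
    eps = rng.normal(scale=np.sqrt(sigma2), size=N)
    y = X @ beta + eps
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hats.append(resid @ resid / (N - K))

print(np.mean(sigma2_hats))  # should be close to sigma2 = 2.0
```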
I'll show the result for any multiple linear regression, whether or not the regressors are polynomials of $X_t$. In fact, it shows a little more than what you asked: each LOOCV residual is identical to the corresponding leverage-weighted residual from the full regression, not just that you can obtain the LOOCV error as in (5.2) (the averages could agree in other ways, even if the individual terms did not).
Let me take the liberty of using slightly adapted notation.
We first show that
\begin{align*}
\hat\beta-\hat\beta_{(t)}&=\left(\frac{\hat u_t}{1-h_t}\right)(X'X)^{-1}X_t', \quad\quad \textrm{(A)}
\end{align*}
where $\hat\beta$ is the estimate using all data and $\hat\beta_{(t)}$ the estimate obtained when leaving out observation $t$. Here, $X_t$ is a row vector such that $\hat y_t=X_t\hat\beta$, $\hat u_t=y_t-X_t\hat\beta$ is the corresponding full-sample residual, and $h_t=X_t(X'X)^{-1}X_t'$ is the leverage of observation $t$.
The proof uses the following matrix algebraic result.
Let $A$ be a nonsingular matrix, $b$ a vector, and $\lambda$ a scalar. If
\begin{align*}
\lambda&\neq -\frac{1}{b'A^{-1}b},
\end{align*}
then
\begin{align*}
(A+\lambda bb')^{-1}&=A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}.\quad\quad \textrm{(B)}
\end{align*}
The proof of (B) follows immediately from verifying
\begin{align*}
\left\{A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}\right\}(A+\lambda bb')=I.
\end{align*}
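If you like to see the identity in action, here is a numerical spot check of (B) on a randomly generated example; the matrix $A$, vector $b$, and scalar $\lambda$ below are arbitrary choices satisfying the stated condition:

```
# Spot check of identity (B) on an arbitrary nonsingular A, vector b, scalar lam.
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.normal(size=(k, k)); A = A @ A.T + k * np.eye(k)  # nonsingular
b = rng.normal(size=(k, 1))
lam = 0.7  # must satisfy lam != -1 / (b' A^{-1} b)

lhs = np.linalg.inv(A + lam * b @ b.T)
Ainv = np.linalg.inv(A)
rhs = Ainv - (lam / (1 + lam * (b.T @ Ainv @ b).item())) * Ainv @ b @ b.T @ Ainv
print(np.allclose(lhs, rhs))  # True
```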
The following result is helpful for proving (A):
\begin{align*}
(X_{(t)}'X_{(t)})^{-1}X_t'=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.\quad\quad \textrm{(C)}
\end{align*}
Proof of (C): By (B), taking $A=X'X$, $b=X_t'$ and $\lambda=-1$, and using $\sum_{t=1}^TX_t'X_t=X'X$, we have
\begin{align*}
(X_{(t)}'X_{(t)})^{-1}&=(X'X-X_t'X_t)^{-1}\\
&=(X'X)^{-1}+\frac{(X'X)^{-1}X_t'X_t(X'X)^{-1}}{1-X_t(X'X)^{-1}X_t'}.
\end{align*}
So we find
\begin{align*}
(X_{(t)}'X_{(t)})^{-1}X_t'&=(X'X)^{-1}X_t'+(X'X)^{-1}X_t'\left(\frac{X_t(X'X)^{-1}X_t'}{1-X_t(X'X)^{-1}X_t'}\right)\\
&=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.
\end{align*}
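A numerical spot check of (C), again purely illustrative; the design matrix and the choice of observation $t$ below are arbitrary:

```
# Spot check of (C) for one observation t (arbitrary simulated design).
import numpy as np

rng = np.random.default_rng(2)
T, k, t = 30, 3, 5
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
X_t = X[t:t + 1, :]                      # row vector X_t (shape 1 x k)
X_minus_t = np.delete(X, t, axis=0)      # X_{(t)}

h_t = (X_t @ np.linalg.inv(X.T @ X) @ X_t.T).item()
lhs = np.linalg.inv(X_minus_t.T @ X_minus_t) @ X_t.T
rhs = np.linalg.inv(X.T @ X) @ X_t.T / (1 - h_t)
print(np.allclose(lhs, rhs))  # True
```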
The proof of (A) now follows from (C): As
\begin{align*}
X'X\hat\beta&=X'y,
\end{align*}
we have
\begin{align*}
(X_{(t)}'X_{(t)}+X_t'X_t)\hat\beta &=X_{(t)}'y_{(t)}+X_t' y_t,
\end{align*}
or
\begin{align*}
\left\{I_k+(X_{(t)}'X_{(t)})^{-1}X_t'X_t\right\}\hat\beta&=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'(X_t\hat\beta+\hat u_t).
\end{align*}
So,
\begin{align*}
\hat\beta&=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'\hat u_t\\
&=\hat\beta_{(t)}+(X'X)^{-1}X_t'\frac{\hat u_t}{1-h_t},
\end{align*}
where the last equality follows from (C).
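If you want to verify (A) numerically, here is a short sketch; the simulated data and the choice of $t$ are arbitrary:

```
# Spot check of (A): the coefficient shift from dropping observation t.
# (Data-generating choices below are arbitrary illustration values.)
import numpy as np

rng = np.random.default_rng(3)
T, k, t = 40, 3, 7
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
h_t = X[t] @ XtX_inv @ X[t]

X_mt, y_mt = np.delete(X, t, axis=0), np.delete(y, t)
beta_hat_t = np.linalg.solve(X_mt.T @ X_mt, X_mt.T @ y_mt)

lhs = beta_hat - beta_hat_t
rhs = (u_hat[t] / (1 - h_t)) * (XtX_inv @ X[t])
print(np.allclose(lhs, rhs))  # True
```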
Now multiply through in (A) by $X_t$ and write $X_t\hat\beta-X_t\hat\beta_{(t)}=(y_t-X_t\hat\beta_{(t)})-(y_t-X_t\hat\beta)$ to get, with $\hat u_{(t)}=y_t-X_t\hat\beta_{(t)}$ the residual resulting from using $\hat\beta_{(t)}$,
$$
\hat u_{(t)}=\hat u_t+\left(\frac{\hat u_t}{1-h_t}\right)h_t
$$
or
$$
\hat u_{(t)}=\frac{\hat u_t(1-h_t)+\hat u_th_t}{1-h_t}=\frac{\hat u_t}{1-h_t}.
$$
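A short numerical check of this final claim for every observation (the simulated data below are arbitrary; this is just a sketch, not part of the proof):

```
# Check that the leave-one-out residual y_t - X_t beta_hat_{(t)}
# equals u_hat_t / (1 - h_t) for every t (arbitrary simulated data).
import numpy as np

rng = np.random.default_rng(4)
T, k = 25, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
u_hat = y - X @ (XtX_inv @ X.T @ y)
h = np.einsum('ti,ij,tj->t', X, XtX_inv, X)   # leverages h_t

loo = np.empty(T)
for t in range(T):
    X_mt, y_mt = np.delete(X, t, axis=0), np.delete(y, t)
    beta_t = np.linalg.solve(X_mt.T @ X_mt, X_mt.T @ y_mt)
    loo[t] = y[t] - X[t] @ beta_t

print(np.allclose(loo, u_hat / (1 - h)))  # True
```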
This is straightforward from the Ordinary Least Squares definition. If there is no intercept, one is minimizing $R(\beta) = \sum_{i=1}^{n} (y_i- \beta x_i)^2$. This is smooth as a function of $\beta$, so all minima (or maxima) occur where the derivative is zero. Differentiating with respect to $\beta$ gives $-\sum_{i=1}^{n} 2(y_i- \beta x_i)x_i$. Setting this to zero and solving for $\beta$ yields the formula $\hat\beta = \sum_{i=1}^{n} x_iy_i \big/ \sum_{i=1}^{n} x_i^2$.
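A quick numerical check (the data below are just an example) that this closed-form slope matches a no-intercept least-squares fit:

```
# Compare the closed-form slope sum(x*y)/sum(x^2) with a no-intercept
# least-squares fit on arbitrary simulated data.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(size=100)

beta_formula = np.sum(x * y) / np.sum(x ** 2)
beta_lstsq, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print(np.isclose(beta_formula, beta_lstsq[0]))  # True
```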