Since
$$\begin{align*}
\hat\beta &= (X^TX)^{-1}X^TY \\
&= (X^TX)^{-1}X^T(X\beta + \varepsilon) \\
&= \beta + (X^TX)^{-1}X^T\varepsilon
\end{align*}$$
and since $\varepsilon \sim \mathcal{N}(0,\sigma^2 I_n)$ by assumption, we know that
$$\hat\beta-\beta \sim \mathcal{N}(0,\sigma^2 (X^TX)^{-1})$$
and thus we know that for each component $k$ of $\hat\beta$,
$$\hat\beta_k -\beta_k \sim \mathcal{N}(0, \sigma^2 S_{kk})$$
where $S_{kk}$ is the $k^\text{th}$ diagonal element of $(X^TX)^{-1}$.
Thus, we know that
$$z_k = \frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}} \sim \mathcal{N}(0,1).$$
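To convince yourself of this numerically, here is a minimal NumPy sketch (the design matrix, $\beta$, $\sigma$, and all sizes are arbitrary illustrative choices) that simulates $z_k$ for one coefficient and checks that its mean and variance are close to $0$ and $1$:

```python
import numpy as np

# Illustrative setup: n observations, p coefficients, known error scale sigma
rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.25])
S = np.linalg.inv(X.T @ X)  # (X^T X)^{-1}; S[k, k] is S_kk in the text

z = []
for _ in range(20_000):
    eps = rng.normal(scale=sigma, size=n)
    beta_hat = S @ X.T @ (X @ beta + eps)  # OLS estimate for this draw
    z.append((beta_hat[1] - beta[1]) / np.sqrt(sigma**2 * S[1, 1]))

z = np.array(z)
print(z.mean(), z.var())  # should be near 0 and 1 respectively
```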
Take note of the statement of the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (Theorem B.8 in Greene):
If $x\sim\mathcal{N}(0,I)$ and $A$ is symmetric and idempotent, then $x^TAx$ is distributed $\chi^2_{\nu}$ where $\nu$ is the rank of $A$.
Let $\hat\varepsilon$ denote the regression residual vector and let
$$M=I_n - X(X^TX)^{-1}X^T \text{,}$$
which is the residual maker matrix (i.e. $My=\hat\varepsilon$). It's easy to verify that $M$ is symmetric and idempotent.
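Here is a short NumPy sketch (arbitrary illustrative $X$ and $y$) confirming that $M$ is symmetric and idempotent and that $My$ reproduces the least-squares residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Residual maker matrix M = I - X (X^T X)^{-1} X^T
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

print(np.allclose(M, M.T))        # symmetric
print(np.allclose(M @ M, M))      # idempotent
print(np.allclose(M @ y, resid))  # M y equals the residual vector
```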
Let
$$s^2 = \frac{\hat\varepsilon^T \hat\varepsilon}{n-p}$$
be an estimator for $\sigma^2$.
We then need a bit of linear algebra. Note these three properties:
- The rank of an idempotent matrix is its trace.
- $\operatorname{Tr}(A_1+A_2) = \operatorname{Tr}(A_1) + \operatorname{Tr}(A_2)$
- $\operatorname{Tr}(A_1A_2) = \operatorname{Tr}(A_2A_1)$ if $A_1$ is $n_1 \times n_2$ and $A_2$ is $n_2 \times n_1$ (this property is critical for the step below to work)
So
$$\begin{align*}
\operatorname{rank}(M) = \operatorname{Tr}(M) &= \operatorname{Tr}(I_n - X(X^TX)^{-1}X^T) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( X(X^TX)^{-1}X^T \right) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( (X^TX)^{-1}X^TX \right) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}(I_p) \\
&=n-p
\end{align*}$$
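A quick numerical check of this trace/rank calculation (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.normal(size=(n, p))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(M))               # ~ n - p = 46 (up to floating point)
print(np.linalg.matrix_rank(M))  # 46
```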
Then, since $\hat\varepsilon = My = M(X\beta + \varepsilon) = M\varepsilon$ (because $MX = 0$) and $M$ is symmetric and idempotent,
$$\begin{align*}
V = \frac{(n-p)s^2}{\sigma^2} = \frac{\hat\varepsilon^T\hat\varepsilon}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)^T M \left(\frac{\varepsilon}{\sigma}\right).
\end{align*}$$
Applying the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (stated above), we know that $V \sim \chi^2_{n-p}$.
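To see this empirically, the following sketch (illustrative sizes) simulates $V$ many times and compares its sample mean and variance with the $\chi^2_{n-p}$ values $n-p$ and $2(n-p)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 30, 3, 1.5
X = rng.normal(size=(n, p))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

V = []
for _ in range(20_000):
    eps = rng.normal(scale=sigma, size=n)
    resid = M @ eps  # hat(eps) = M eps, since M X = 0
    V.append(resid @ resid / sigma**2)

V = np.array(V)
print(V.mean(), V.var())  # should be near n - p = 27 and 2(n - p) = 54
```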
Since you assumed that $\varepsilon$ is normally distributed, $\hat\beta - \beta = (X^TX)^{-1}X^T\varepsilon$ and $\hat\varepsilon = M\varepsilon$ are jointly normal, and their covariance $\sigma^2 (X^TX)^{-1}X^T M = 0$ (because $X^T M = 0$), so $\hat\beta$ is independent of $\hat\varepsilon$. Since $s^2$ is a function of $\hat\varepsilon$, it is also independent of $\hat\beta$. Thus, $z_k$ and $V$ are independent of each other.
Then,
$$\begin{align*}
t_k = \frac{z_k}{\sqrt{V/(n-p)}}
\end{align*}$$
is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (here $n-p$), which is the classical characterization of the $t$ distribution. Therefore, the statistic $t_k$ has a $t$ distribution with $n-p$ degrees of freedom.
It can then be algebraically manipulated into a more familiar form.
$$\begin{align*}
t_k &= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}}}{\sqrt{\frac{(n-p)s^2}{\sigma^2}/(n-p)}} \\
&= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{S_{kk}}}}{\sqrt{s^2}} = \frac{\hat\beta_k -\beta_k}{\sqrt{s^2 S_{kk}}} \\
&= \frac{\hat\beta_k -\beta_k}{\operatorname{se}\left(\hat\beta_k \right)}
\end{align*}$$
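As a sanity check on the whole derivation, this sketch (illustrative design and parameters) simulates $t_k$ repeatedly and compares its empirical quantiles with those of the $t_{n-p}$ distribution from SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, sigma = 25, 2, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([0.5, 2.0])
S = np.linalg.inv(X.T @ X)

t_stats = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = S @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)  # the estimator s^2 from the text
    t_stats.append((beta_hat[1] - beta[1]) / np.sqrt(s2 * S[1, 1]))

qs = [0.05, 0.5, 0.95]
print(np.quantile(t_stats, qs))   # empirical quantiles of t_k
print(stats.t.ppf(qs, df=n - p))  # reference t_{n-p} quantiles
```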
Here are some tips for the simple-regression case, where $\hat\beta$ is the slope estimator. First, just as with variances, covariances are unaffected by additive constants. You can use this to show
$$\operatorname{cov}(\hat\beta, \overline{Y}) = \operatorname{cov}(\hat\beta, \overline{\varepsilon}).$$
Then, you can show that
$$\hat\beta = \sum_i w_i Y_i,$$
where $w_i = \frac{x_i - \overline{x}}{S_{XX}}$ and $S_{XX} = \sum_i (x_i - \overline{x})^2$. Ignoring constants again gives
$$\operatorname{cov}(\hat\beta, \overline{Y}) = \operatorname{cov}\left(\sum_i w_i \varepsilon_i, \overline{\varepsilon}\right).$$
Then you can apply your result from the start of your question.
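If it helps to see where these tips end up, the covariance works out to $\frac{\sigma^2}{n}\sum_i w_i = 0$ because the weights sum to zero; here is a small simulation sketch of that conclusion (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 40, 1.0
x = rng.normal(size=n)
w = (x - x.mean()) / ((x - x.mean()) ** 2).sum()  # w_i = (x_i - xbar) / S_XX

slopes, ybars = [], []
for _ in range(50_000):
    y = 1.0 + 2.0 * x + rng.normal(scale=sigma, size=n)
    slopes.append(w @ y)  # beta_hat = sum_i w_i Y_i
    ybars.append(y.mean())

print(np.cov(slopes, ybars)[0, 1])  # ~ 0, since sum(w) = 0
```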
Following your notations, we have $$V(\hat{\beta}) = \|\hat{\varepsilon}\|^2 = \text{RSS},$$ i.e., the Residual Sum of Squares.
It is a fact that (cf. here) $$\frac{\text{RSS}}{\sigma^2} \sim \chi^2_{N-p}$$ with $N$ the total sample size and $p$ the number of parameters in $\beta$ (here, $p = n + m$).
The result follows from the fact that the expectation of a chi-squared random variable equals its number of degrees of freedom, i.e., $$\text{E}\left(\frac{\text{RSS}}{\sigma^2}\right) = N - p,$$ which can be rewritten as $$\text{E}\left(\frac{\text{RSS}}{N-p}\right) = \sigma^2$$ since $N-p$ and $\sigma^2$ are both non-random.
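A quick simulation sketch of this unbiasedness claim (sizes and $\sigma$ illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N, p, sigma = 60, 5, 3.0
X = rng.normal(size=(N, p))
beta = rng.normal(size=p)

s2_vals = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=N)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2_vals.append(resid @ resid / (N - p))

print(np.mean(s2_vals), sigma**2)  # the two values should be close (~9.0)
```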