We can prove this for the more general case of $p$ predictor variables by using the "hat matrix" and some of its useful properties. These results are usually much harder to state in non-matrix terms because they rely on the spectral decomposition.
In the matrix version of least squares, the hat matrix is $H=X(X^TX)^{-1}X^T$, where $X$ has $n$ rows and $p+1$ columns (a column of ones for $\beta_0$). Assume full column rank for convenience; otherwise you can replace $p+1$ by the column rank of $X$ in what follows. We can write the fitted values as $\hat{Y}_i=\sum_{j=1}^nH_{ij}Y_j$, or in matrix notation $\hat{Y}=HY$. Using this, we can write the sum of squares as:
$$\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{(Y-\hat{Y})^T(Y-\hat{Y})}{\sigma^2}=\frac{(Y-HY)^T(Y-HY)}{\sigma^2}$$
$$=\frac{Y^T(I_n-H)Y}{\sigma^2}$$
where $I_n$ is the identity matrix of order $n$. The last step follows from the fact that $H$ is symmetric and idempotent, so that $(I_n-H)^T(I_n-H)=I_n-H$; indeed $$H^2=[X(X^TX)^{-1}X^T][X(X^TX)^{-1}X^T]=X(X^TX)^{-1}X^T=H=HH^T=H^TH$$
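As a quick numerical sanity check (not part of the proof), here is a short NumPy sketch with a made-up design of $n=50$ observations and $p=3$ predictors; the names `X`, `Y`, `H` are purely illustrative. It verifies the idempotence and symmetry of $H$ and that $HY$ reproduces the least-squares fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# design matrix with a leading column of ones (n rows, p+1 columns), plus an arbitrary response
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

# hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ H, H))   # idempotent: H^2 = H
print(np.allclose(H, H.T))     # symmetric: H^T = H
print(np.allclose(H @ Y, X @ np.linalg.lstsq(X, Y, rcond=None)[0]))  # HY equals the fitted values
```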
Now, a neat property of idempotent matrices is that all of their eigenvalues must be equal to zero or one. Letting $e$ denote a normalised eigenvector of $H$ with eigenvalue $l$, we can prove this as follows:
$$He=le\implies H(He)=H(le)$$
$$LHS=H^2e=He=le\;\;\; RHS=lHe=l^2e$$
$$\implies le=l^2e\implies l=0\text{ or }1$$
(Note that $e$ cannot be zero, as it must satisfy $e^Te=1$.) Now, because $H$ is idempotent, $I_n-H$ is also idempotent, because
$$(I_n-H)(I_n-H)=I_n-I_nH-HI_n+H^2=I_n-H$$
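Continuing the sketch above (reusing `H` and `n`), the eigenvalues of $H$ and of $I_n-H$ are indeed all zero or one up to floating-point error:

```python
# all eigenvalues of the symmetric idempotent matrices H and I_n - H are 0 or 1
eig_H = np.linalg.eigvalsh(H)
eig_IH = np.linalg.eigvalsh(np.eye(n) - H)
print(np.all(np.isclose(eig_H, 0) | np.isclose(eig_H, 1)))
print(np.all(np.isclose(eig_IH, 0) | np.isclose(eig_IH, 1)))
```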
We also have the property that the sum of the eigenvalues equals the trace of the matrix, and
$$tr(I_n-H)=tr(I_n)-tr(H)=n-tr(X(X^TX)^{-1}X^T)=n-tr((X^TX)^{-1}X^TX)$$
$$=n-tr(I_{p+1})=n-p-1$$
Hence $I-H$ must have $n-p-1$ eigenvalues equal to $1$ and $p+1$ eigenvalues equal to $0$.
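In the same toy example ($n=50$, $p=3$) the trace of $I_n-H$ is $46=n-p-1$, which matches the count of unit eigenvalues:

```python
# trace of I_n - H equals n - p - 1 (= 46 here), and so does the number of eigenvalues equal to 1
M = np.eye(n) - H
print(np.isclose(np.trace(M), n - p - 1))
print(int(np.sum(np.isclose(np.linalg.eigvalsh(M), 1))))  # 46
```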
Now we can use the spectral decomposition $I-H=ADA^T$, where $D=\begin{pmatrix}I_{n-p-1} & 0_{[n-p-1]\times[p+1]}\\0_{[p+1]\times [n-p-1]} & 0_{[p+1]\times [p+1]}\end{pmatrix}$ and $A$ is orthogonal (because $I-H$ is symmetric). A further useful property is that $HX=X$. This helps narrow down the $A$ matrix:
$$HX=X\implies(I-H)X=0\implies ADA^TX=0\implies DA^TX=0$$
$$\implies (A^TX)_{ij}=0\;\;\;i=1,\dots,n-p-1\;\;\; j=1,\dots,p+1$$
and we get:
$$\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{Y^TADA^TY}{\sigma^2}=\frac{\sum_{i=1}^{n-p-1}(A^TY)_i^2}{\sigma^2}$$
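Continuing the numerical sketch, one can build such an $A$ explicitly from the eigenvectors of $I_n-H$, ordering the unit eigenvalues first so that $D$ has the block form above, and check this identity; that ordering is the only choice being made here.

```python
# spectral decomposition I_n - H = A D A^T with the unit-eigenvalue eigenvectors placed first
vals, vecs = np.linalg.eigh(np.eye(n) - H)
A = vecs[:, np.argsort(-vals)]            # descending order: the n - p - 1 ones come first

lhs = Y @ (np.eye(n) - H) @ Y             # Y^T (I_n - H) Y, the residual sum of squares
rhs = np.sum((A.T @ Y)[: n - p - 1] ** 2)
print(np.isclose(lhs, rhs))
```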
Now, under the model we have $Y\sim N(X\beta,\sigma^2I)$, and by standard normal theory $A^TY\sim N(A^TX\beta,\sigma^2A^TA)= N(A^TX\beta,\sigma^2I)$, showing that the components of $A^TY$ are independent. Using the result $(A^TX)_{ij}=0$ above, we have $(A^TY)_i\sim N(0,\sigma^2)$ for $i=1,\dots,n-p-1$. The chi-square distribution with $n-p-1$ degrees of freedom for the sum of squared errors follows immediately.
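A small Monte Carlo check (again just a sketch, with arbitrary values of $\beta$ and $\sigma$, and using SciPy's Kolmogorov-Smirnov test as a rough comparison) agrees with this: simulated values of the scaled sum of squared errors behave like $\chi^2_{n-p-1}$ draws.

```python
from scipy import stats

sigma = 2.0
beta = rng.normal(size=p + 1)
reps = 20000

# simulate Y ~ N(X beta, sigma^2 I) repeatedly and record SSE / sigma^2
sse = np.empty(reps)
for r in range(reps):
    Ysim = X @ beta + sigma * rng.normal(size=n)
    resid = Ysim - H @ Ysim
    sse[r] = resid @ resid / sigma**2

print(sse.mean(), n - p - 1)                         # sample mean should be close to n - p - 1
print(stats.kstest(sse, "chi2", args=(n - p - 1,)))  # no evidence against chi^2_{n-p-1}
```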
Isn't it divided by $\sqrt{(X^TX)^{-1}}$?
$\sqrt{(X^TX)^{-1}}$ is not a single number.
This is because $X$ is an $n$ by $p$ matrix and $X^T X$ will be a $p$ by $p$ matrix, of which you need the $j$-th diagonal element.
But I am confused about why $$ \frac{N(0, (X^TX)^{-1})}{\sqrt{(X^TX)^{-1}_{jj}}}\sim N(0,I)?$$
This is not the case.
Yes, you do have that each individual $\hat\beta_j - \beta_j$ is normally distributed.
But no, the multivariate case does not hold. The scaled $\hat\beta_j - \beta_j$ will be correlated.
In addition, the expression $$\frac{N(0, (X^TX)^{-1})}{\sqrt{(X^TX)^{-1}_{jj}}}$$ does not seem quite right. The arithmetic, i.e. how you divide a multivariate normal distribution by a single scalar, is not clearly defined.
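A small self-contained simulation (with made-up numbers) illustrates both points: each coefficient, standardized by $\sigma\sqrt{(X^TX)^{-1}_{jj}}$, is marginally standard normal, but the standardized components are correlated, so the vector is not $N(0,I)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 40, 2, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, -0.5, 2.0])
scale = sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))   # sigma * sqrt((X^T X)^{-1}_{jj})

reps = 20000
Z = np.empty((reps, p + 1))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    Z[r] = (beta_hat - beta) / scale                       # componentwise standardization

print(Z.std(axis=0))     # each column has standard deviation close to 1 (marginally N(0,1))
print(np.corrcoef(Z.T))  # off-diagonal entries are generally nonzero: the components are correlated
```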
Best Answer
$$ Z = \frac{\widehat\beta - \beta}{\left( \dfrac{\sigma}{ \sqrt{ n \left( \,\overline{(x^2)} - \left(\,\overline{x}\,\right)^2\right)}} \right)} \sim \mathrm{N}(0,1). $$ And $$ (n-2) \frac{\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}. $$ Notice that $\sigma$ appears in both the numerator and the denominator of $Z/\sqrt{\chi^2_k/k}$ and cancels out.
Independence of these two things is seen by observing that the vector of residuals is independent of the vector of fitted values. To see that, find the covariance between the vector of residuals and the vector of fitted values, and recall that if two random vectors are jointly normally distributed then they are independent if they are uncorrelated.
(The whole story of why these things have the distributions asserted here would take somewhat longer.)
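For what it's worth, a self-contained simulation of simple regression (arbitrary true values; SciPy's Kolmogorov-Smirnov test used only as a rough check) is consistent with all three claims: $Z$ is standard normal, $(n-2)\widehat\sigma^2/\sigma^2$ is $\chi^2_{n-2}$, and their ratio is $t_{n-2}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta0, beta1, sigma = 30, 1.0, 2.0, 1.5
x = rng.uniform(size=n)
sxx = n * (np.mean(x**2) - np.mean(x) ** 2)          # n * (mean of x^2 minus squared mean of x)

reps = 20000
z = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + sigma * rng.normal(size=n)
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # least-squares slope
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    z[r] = (b1 - beta1) / (sigma / np.sqrt(sxx))
    s2[r] = resid @ resid / (n - 2)                  # estimate of sigma^2

chi2 = (n - 2) * s2 / sigma**2
t = z / np.sqrt(chi2 / (n - 2))                      # sigma cancels: the usual t statistic

print(stats.kstest(z, "norm"))                       # Z ~ N(0, 1)
print(stats.kstest(chi2, "chi2", args=(n - 2,)))     # (n-2) hat{sigma}^2 / sigma^2 ~ chi^2_{n-2}
print(stats.kstest(t, "t", args=(n - 2,)))           # ratio ~ t_{n-2}
```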