We can prove this for the more general case of $p$ variables by using the "hat matrix" and some of its useful properties. These results are usually much harder to state in non-matrix terms because of the use of the spectral decomposition.
In the matrix version of least squares, the hat matrix is $H=X(X^TX)^{-1}X^T$, where $X$ has $n$ rows and $p+1$ columns (a column of ones for $\beta_0$). Assume full column rank for convenience; otherwise you can replace $p+1$ by the column rank of $X$ in what follows. We can write the fitted values as $\hat{Y}_i=\sum_{j=1}^nH_{ij}Y_j$, or in matrix notation $\hat{Y}=HY$. Using this, we can write the scaled sum of squares as:
$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{(Y-\hat{Y})^T(Y-\hat{Y})}{\sigma^2}=\frac{(Y-HY)^T(Y-HY)}{\sigma^2}$$
$$=\frac{Y^T(I_n-H)Y}{\sigma^2}$$
where $I_n$ is the identity matrix of order $n$. The last step follows from the fact that $H$ is symmetric and idempotent, so that $(I_n-H)^T(I_n-H)=I_n-H$: $$H^2=[X(X^TX)^{-1}X^T][X(X^TX)^{-1}X^T]=X(X^TX)^{-1}X^T=H=HH^T=H^TH$$
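These hat-matrix properties are easy to check numerically. Here is a minimal sketch with numpy (the sizes $n=50$, $p=3$ and the random design matrix are purely illustrative):

```python
# Numerical check that H is symmetric, idempotent, and satisfies HX = X.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3  # hypothetical sample size and number of predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix

print(np.allclose(H.T, H))    # symmetric
print(np.allclose(H @ H, H))  # idempotent: H^2 = H
print(np.allclose(H @ X, X))  # H fixes the column space of X
```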
Now a neat property of idempotent matrices is that all of their eigenvalues must be equal to zero or one. Letting $e$ denote a normalised eigenvector of $H$ with eigenvalue $l$, we can prove this as follows:
$$He=le\implies H(He)=H(le)$$
$$LHS=H^2e=He=le\;\;\; RHS=lHe=l^2e$$
$$\implies le=l^2e\implies l=0\text{ or }1$$
(Note that $e$ cannot be zero, as it must satisfy $e^Te=1$.) Because $H$ is idempotent, $I_n-H$ is idempotent too, since
$$(I_n-H)(I_n-H)=I_n-2H+H^2=I_n-2H+H=I_n-H$$
We also have the property that the sum of the eigenvalues of a matrix equals its trace, and, using the cyclic property of the trace,
$$tr(I_n-H)=tr(I_n)-tr(H)=n-tr(X(X^TX)^{-1}X^T)=n-tr((X^TX)^{-1}X^TX)$$
$$=n-tr(I_{p+1})=n-p-1$$
Hence $I-H$ must have $n-p-1$ eigenvalues equal to $1$ and $p+1$ eigenvalues equal to $0$.
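This eigenvalue count can also be verified directly; a small sketch (again with an arbitrary full-rank design):

```python
# Check that I - H has n-p-1 eigenvalues equal to 1 and p+1 equal to 0.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2  # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

eigvals = np.linalg.eigvalsh(np.eye(n) - H)  # eigvalsh: I - H is symmetric
ones = np.isclose(eigvals, 1).sum()
zeros = np.isclose(eigvals, 0).sum()

print(ones, n - p - 1)   # ones should equal n - p - 1
print(zeros, p + 1)      # zeros should equal p + 1
print(np.isclose(np.trace(np.eye(n) - H), n - p - 1))  # trace = sum of eigenvalues
```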
Now we can use the spectral decomposition $I-H=ADA^T$, where $D=\begin{pmatrix}I_{n-p-1} & 0_{[n-p-1]\times[p+1]}\\0_{[p+1]\times [n-p-1]} & 0_{[p+1]\times [p+1]}\end{pmatrix}$ and $A$ is orthogonal (because $I-H$ is symmetric). A further useful property is that $HX=X$, which helps narrow down the $A$ matrix:
$$HX=X\implies(I-H)X=0\implies ADA^TX=0\implies DA^TX=0,$$
where the last implication multiplies on the left by $A^T$ and uses $A^TA=I_n$. Hence
$$(A^TX)_{ij}=0\;\;\;\text{for }i=1,\dots,n-p-1,\;\; j=1,\dots,p+1,$$
and we get:
$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{Y^TADA^TY}{\sigma^2}=\frac{\sum_{i=1}^{n-p-1}(A^TY)_i^2}{\sigma^2}$$
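The vanishing block of $A^TX$ can be checked numerically as well. A sketch with numpy; note that `eigh` sorts eigenvalues in ascending order, so the eigenvalue-one eigenvectors of $I-H$ appear last rather than first:

```python
# Check that the eigenvalue-1 eigenvectors of I - H are orthogonal to the columns of X.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2  # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

vals, A = np.linalg.eigh(np.eye(n) - H)   # spectral decomposition I - H = A diag(vals) A^T
A1 = A[:, np.isclose(vals, 1)]            # eigenvectors with eigenvalue 1

print(A1.shape[1], n - p - 1)             # n - p - 1 such eigenvectors
print(np.allclose(A1.T @ X, 0))           # the corresponding block of A^T X vanishes
```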
Now, under the model we have $Y\sim N(X\beta,\sigma^2I)$, and by standard normal theory $A^TY\sim N(A^TX\beta,\sigma^2A^TA)= N(A^TX\beta,\sigma^2I)$, showing that the components of $A^TY$ are independent. Using the result above that the first $n-p-1$ rows of $A^TX$ vanish, we have $(A^TY)_i\sim N(0,\sigma^2)$ for $i=1,\dots,n-p-1$. The chi-squared distribution with $n-p-1$ degrees of freedom for the scaled sum of squared errors follows immediately.
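The conclusion can be sanity-checked by simulation: the residual sum of squares divided by $\sigma^2$ should have mean $n-p-1$ and variance $2(n-p-1)$, matching a $\chi^2_{n-p-1}$ distribution. A Monte Carlo sketch (all sizes and the true $\beta$ are made up for illustration):

```python
# Monte Carlo check: RSS / sigma^2 ~ chi-squared with n - p - 1 degrees of freedom.
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 25, 2, 1.5  # hypothetical sample size, predictors, and error sd
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
beta = rng.normal(size=p + 1)  # arbitrary true coefficients

reps = 20000
# One simulated response vector per column
Y = (X @ beta)[:, None] + sigma * rng.normal(size=(n, reps))
resid = Y - H @ Y
scaled = (resid**2).sum(axis=0) / sigma**2  # RSS / sigma^2, one value per replication

df = n - p - 1
print(scaled.mean(), df)      # sample mean should be close to df
print(scaled.var(), 2 * df)   # sample variance should be close to 2*df
```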
As Glen_b noted in the comments, if the variances are all the same you end up with a scaled noncentral chi-squared.
If not, there is a concept of a generalized chi-squared distribution, i.e. $x^T A x$ for $x \sim N(\mu, \Sigma)$ and $A$ fixed. In this case, you have the special case of diagonal $\Sigma$ ($\Sigma_{ii} = \sigma_i^2$), and $A = I$.
There has been some work on computing with this distribution. You can also write it as a linear combination of independent noncentral chi-squared variables, $Y = \sum_{i=1}^n \sigma_i^2 \left( \frac{X_i^2}{\sigma_i^2} \right)$, in which case Bausch (2013) gives a more computationally efficient algorithm for the linear combination of central chi-squareds; his work might be extensible to noncentral chi-squareds, and you might find some interesting pointers in the related-work section.
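The moments of this linear combination are easy to verify by simulation: with zero means, $E[Y]=\sum_i\sigma_i^2$ and $\mathrm{Var}[Y]=2\sum_i\sigma_i^4$. A sketch (the particular $\sigma_i$ values are made up):

```python
# Monte Carlo check of the moments of a weighted sum of chi^2_1 variables.
import numpy as np

rng = np.random.default_rng(4)
sigmas = np.array([0.5, 1.0, 2.0])  # hypothetical unequal standard deviations

reps = 200000
X = sigmas * rng.normal(size=(reps, sigmas.size))  # X_i ~ N(0, sigma_i^2), per column
Y = (X**2).sum(axis=1)  # Y = sum_i sigma_i^2 * (X_i/sigma_i)^2

print(Y.mean(), (sigmas**2).sum())      # E[Y] = sum sigma_i^2
print(Y.var(), 2 * (sigmas**4).sum())   # Var[Y] = 2 * sum sigma_i^4
```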
Best Answer
$X_i \sim \mathcal{N}(0,\sigma^2_i) \Rightarrow \frac{X_i}{\sigma_i}\sim \mathcal{N}(0,1) $
$\therefore$ $\frac{X_i^2}{\sigma_i^2} \sim \chi^2(1)=\Gamma(1/2,2)$
$X_i^2 \sim \sigma_i^2\Gamma(1/2,2)=\Gamma(1/2,2\sigma_i^2)$
If the $\sigma_i$s are all equal, say $\sigma_i=\sigma$ for every $i$, then, since independent gamma variables with a common scale add their shape parameters,
$$\sum_{i=1}^n X_i^2 \sim \sum_{i=1}^n\Gamma(1/2, 2\sigma^2)=\Gamma(n/2,2\sigma^2),$$
i.e. $\sum_{i=1}^n X_i^2$ has a gamma distribution with shape $k=n/2$ and scale $\theta=2\sigma^2$.
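A quick numerical check of this gamma claim: the mean and variance of $\Gamma(k,\theta)$ are $k\theta$ and $k\theta^2$, which for $k=n/2$, $\theta=2\sigma^2$ give $n\sigma^2$ and $2n\sigma^4$. A sketch with arbitrary values of $n$ and $\sigma$:

```python
# Check moments of sum of X_i^2 against the Gamma(n/2, 2*sigma^2) distribution.
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 6, 1.3               # hypothetical values
k, theta = n / 2, 2 * sigma**2  # gamma shape and scale from the derivation above

reps = 200000
S = ((sigma * rng.normal(size=(reps, n)))**2).sum(axis=1)  # sum of X_i^2 per replication

print(S.mean(), k * theta)       # gamma mean k*theta = n*sigma^2
print(S.var(), k * theta**2)     # gamma variance k*theta^2 = 2*n*sigma^4
```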
If the $\sigma_i$s are not all equal, the sum is no longer gamma distributed; see the discussion of the generalized chi-squared distribution above.