We can prove this for the more general case of $p$ variables by using the "hat matrix" and some of its useful properties. These results are usually much harder to state in non-matrix terms because they rely on the spectral decomposition.
Now, in the matrix version of least squares, the hat matrix is $H=X(X^TX)^{-1}X^T$, where $X$ has $n$ rows and $p+1$ columns (a column of ones for $\beta_0$). Assume full column rank for convenience; otherwise you can replace $p+1$ by the column rank of $X$ in what follows. We can write the fitted values as $\hat{Y}_i=\sum_{j=1}^nH_{ij}Y_j$, or in matrix notation $\hat{Y}=HY$. Using this, we can write the sum of squares as:
$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{(Y-\hat{Y})^T(Y-\hat{Y})}{\sigma^2}=\frac{(Y-HY)^T(Y-HY)}{\sigma^2}$$
$$=\frac{Y^T(I_n-H)Y}{\sigma^2}$$
where $I_n$ is the identity matrix of order $n$. The last step follows from the fact that $H$ is symmetric and idempotent (and hence so is $I_n-H$, as shown below):
$$H^2=[X(X^TX)^{-1}X^T][X(X^TX)^{-1}X^T]=X(X^TX)^{-1}(X^TX)(X^TX)^{-1}X^T=X(X^TX)^{-1}X^T=H,$$
and since $H^T=H$ we also have $H=HH^T=H^TH$.
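As a quick numerical sanity check, here is a minimal numpy sketch (the design matrix and seed are arbitrary illustrations, not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix: a column of ones for beta_0 plus p predictor columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ H, H))  # idempotent: H^2 = H  -> True
print(np.allclose(H, H.T))    # symmetric:  H^T = H  -> True

# Fitted values H @ Y agree with ordinary least squares
Y = rng.normal(size=n)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(H @ Y, X @ beta_hat))  # -> True
```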
Now a neat property of idempotent matrices is that all of their eigenvalues must equal zero or one. Letting $e$ denote a normalised eigenvector of $H$ with eigenvalue $l$, we can prove this as follows:
$$He=le\implies H(He)=H(le)$$
$$LHS=H^2e=He=le\;\;\; RHS=lHe=l^2e$$
$$\implies le=l^2e\implies l=0\text{ or }1$$
(Note that $e$ cannot be zero, as it must satisfy $e^Te=1$.) Now, because $H$ is idempotent, $I_n-H$ is idempotent as well, since
$$(I_n-H)(I_n-H)=I_n-I_nH-HI_n+H^2=I_n-2H+H=I_n-H$$
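The eigenvalue claim can also be checked numerically in the same illustrative setup (arbitrary $X$, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Every eigenvalue l of an idempotent matrix satisfies l^2 = l, i.e. l(1 - l) = 0
for M in (H, np.eye(n) - H):
    eigs = np.linalg.eigvalsh(M)  # both matrices are symmetric, so eigvalsh applies
    print(np.allclose(eigs * (1 - eigs), 0, atol=1e-10))  # -> True
```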
We also have the property that the sum of the eigenvalues of a matrix equals its trace; using the cyclic property of the trace, $tr(AB)=tr(BA)$, in the final step,
$$tr(I_n-H)=tr(I_n)-tr(H)=n-tr(X(X^TX)^{-1}X^T)=n-tr((X^TX)^{-1}X^TX)$$
$$=n-tr(I_{p+1})=n-p-1$$
Hence $I_n-H$ must have $n-p-1$ eigenvalues equal to $1$ and $p+1$ eigenvalues equal to $0$.
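Continuing the illustrative sketch, the trace identity and the eigenvalue counts can be confirmed directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# tr(I_n - H) = n - (p + 1); here n - p - 1 = 46
print(np.isclose(np.trace(np.eye(n) - H), n - p - 1))  # -> True

# Count the eigenvalues of I_n - H equal to 1 (they are all either 0 or 1)
eigs = np.linalg.eigvalsh(np.eye(n) - H)
print(int((eigs > 0.5).sum()))  # -> 46, i.e. n - p - 1
```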
Now we can use the spectral decomposition $I_n-H=ADA^T$, where $D=\begin{pmatrix}I_{n-p-1} & 0_{[n-p-1]\times[p+1]}\\0_{[p+1]\times [n-p-1]} & 0_{[p+1]\times [p+1]}\end{pmatrix}$ and $A$ is orthogonal (because $I_n-H$ is symmetric). A further useful property is that $HX=X$. This helps narrow down the $A$ matrix:
$$HX=X\implies(I-H)X=0\implies ADA^TX=0\implies DA^TX=0$$
$$\implies (A^TX)_{ij}=0\;\;\;i=1,\dots,n-p-1\;\;\; j=1,\dots,p+1$$
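In the same sketch, we can build $A$ explicitly with an eigendecomposition and confirm that the first $n-p-1$ rows of $A^TX$ vanish (numpy's `eigh` returns eigenvalues in ascending order, so the columns are reversed to put the unit eigenvalues first, matching the block form of $D$ above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Spectral decomposition I_n - H = A D A^T; reorder so eigenvalue-1 vectors come first
w, A = np.linalg.eigh(np.eye(n) - H)
A, w = A[:, ::-1], w[::-1]

# The first n-p-1 rows of A^T X are numerically zero, as derived above
print(np.allclose((A.T @ X)[: n - p - 1], 0, atol=1e-8))  # -> True
```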
Substituting the spectral decomposition into the scaled sum of squares, we get:
$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{Y^TADA^TY}{\sigma^2}=\frac{\sum_{i=1}^{n-p-1}(A^TY)_i^2}{\sigma^2}$$
Now, under the model we have $Y\sim N(X\beta,\sigma^2I_n)$, and by standard normal theory $A^TY\sim N(A^TX\beta,\sigma^2A^TA)=N(A^TX\beta,\sigma^2I_n)$, showing that the components of $A^TY$ are independent. Since the first $n-p-1$ rows of $A^TX$ are zero, we have $(A^TY)_i\sim N(0,\sigma^2)$ for $i=1,\dots,n-p-1$. The chi-square distribution with $n-p-1$ degrees of freedom for the sum of squared errors follows immediately.
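A Monte Carlo sketch of the conclusion (model parameters invented for illustration): simulate $Y\sim N(X\beta,\sigma^2 I_n)$ repeatedly and compare the moments of the scaled residual sum of squares with $\chi^2_{n-p-1}$, whose mean is $n-p-1$ and variance is $2(n-p-1)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 30, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
beta = rng.normal(size=p + 1)

# Simulate many response vectors Y ~ N(X beta, sigma^2 I_n), one per row
reps = 20000
Y = X @ beta + sigma * rng.normal(size=(reps, n))

# RSS = ||(I_n - H) Y||^2 per replication (H is symmetric, so Y @ H acts row-wise)
resid = Y - Y @ H
stat = (resid**2).sum(axis=1) / sigma**2

df = n - p - 1  # = 27
print(stat.mean(), df)     # empirical mean     ~ 27
print(stat.var(), 2 * df)  # empirical variance ~ 54
```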
Best Answer
If you consider the following questions, they may help you towards an answer:

1. How can the distribution formula be rearranged to give the distribution of $S^2$ in terms of $\sigma^2$?
2. What are the mean and variance of the distribution of $S^2$?
3. Assuming $\sigma^2$ is unknown, as a preliminary to 4 below, pick a value for it and find the implied upper and lower 5% (say) confidence limits (see the sketch after this list). Is zero within these limits?
4. Are there values of $\sigma^2$ that would result in zero being within the confidence limits? If yes, what range of values? If no, what follows?
5. Finally (and the relevance of this will depend on the level of your course), how is all the above affected if the data are discrete? Very likely, however, the data are intended to be continuous and, for convenience, the given values are "discrete-looking".
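For item 3, a minimal sketch (assuming scipy is available; the sample size and trial value of $\sigma^2$ are made up, since the original data are not shown). Assuming the usual result that $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$, the implied central 90% limits for $S^2$ are:

```python
from scipy import stats

# Hypothetical values, purely for illustration
n = 25        # sample size
sigma2 = 4.0  # trial value of sigma^2

# If (n-1) S^2 / sigma^2 ~ chi^2_{n-1}, the implied 5% / 95% limits for S^2 are
df = n - 1
lower = sigma2 * stats.chi2.ppf(0.05, df) / df
upper = sigma2 * stats.chi2.ppf(0.95, df) / df
print(lower, upper)  # the lower limit is strictly positive for any sigma^2 > 0
```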