From Rice - Mathematical Statistics and Data Analysis, using matrix notation.
Consider the sum of squared residuals for the general linear regression problem, $||\mathbf{Y-HY}||^2$, where $\mathbf{H=X(X^TX)^{-1}X^T}$ is the hat matrix. Then:
$$
\mathbb{E}||\mathbf{Y-HY}||^2 = \mathbb{E}(\mathbf{Y^T(I-H)Y}) = [\mathbb{E}(\mathbf{Y})]^T(\mathbf{I-H})[\mathbb{E}(\mathbf{Y})] + \sigma^2 tr(\mathbf{I-H}).
$$
As $\mathbb{E}(\mathbf{Y})=\mathbf{X\beta}$ and $\mathbf{HX}=\mathbf{X}$, we get $(\mathbf{I-H})\mathbb{E}(\mathbf{Y}) = (\mathbf{I-H})\mathbf{X\beta} = \mathbf{0}$. Furthermore, $tr(\mathbf{I-H})= tr(\mathbf{I}) - tr(\mathbf{H})$, and by the cyclic property of the trace, $tr(\mathbf{H}) = tr(\mathbf{X(X^TX)^{-1}X^T}) = tr(\mathbf{(X^TX)^{-1}X^TX}) = tr(\mathbf{I}_p) = p$, so $tr(\mathbf{I-H}) = n-p$.
Thus,
$$\mathbb{E}||\mathbf{Y-HY}||^2 = (n-p) \sigma^2$$
For your two parameter model, we then have:
$$
\mathbb{E}(RSS) = (n-2)\sigma^2. \quad \square
$$
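The identity $\mathbb{E}\,RSS=(n-p)\sigma^2$ is easy to check numerically. Here is a minimal Monte Carlo sketch (the design matrix, coefficients, and variable names are my own, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, draws = 50, 2, 1.5, 20_000
X = np.column_stack([rng.normal(size=n), np.ones(n)])  # slope + intercept design
beta = np.array([2.0, -1.0])
H = X @ np.linalg.inv(X.T @ X) @ X.T                   # hat matrix (symmetric)

# Simulate many response vectors and average the squared residual norm.
Y = X @ beta + sigma * rng.normal(size=(draws, n))     # each row is one dataset
resid = Y - Y @ H                                      # H is symmetric, so Y @ H = (H @ Y.T).T
mean_rss = (resid**2).sum(axis=1).mean()
print(mean_rss, (n - p) * sigma**2)                    # the two values should be close
```

With 20,000 draws the sample mean of $RSS$ lands well within simulation error of $(n-p)\sigma^2$.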
There are many different ways to look at degrees of freedom. I want to provide a rigorous answer that starts from a concrete definition of degrees of freedom for a statistical estimator, as this may be useful or satisfying to some readers.
Definition: Consider an observational model of the form $$y_i=r(x_i)+\xi_i,\ \ \ i=1,\dots,n,$$ where the $\xi_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d. noise terms and the $x_i$ are fixed. The degrees of freedom (DOF) of the estimator $\hat{y}$ is defined as $$\text{df}(\hat{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\hat{y}_i,y_i)=\frac{1}{\sigma^2}\text{Tr}(\text{Cov}(\hat{y},y)),$$ or equivalently, by Stein's lemma, $$\text{df}(\hat{y})=\mathbb{E}(\text{div}\, \hat{y}).$$
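The covariance form of this definition can be estimated directly by simulation. As a quick sketch (the mean function, sample sizes, and names here are my own illustrative choices), take the simplest estimator $\hat{y}_i=\overline{y}$ for all $i$, whose exact DOF is $1$ since $\text{Cov}(\overline{y},y_i)=\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, draws = 30, 2.0, 100_000
mu = np.linspace(0.0, 1.0, n)              # arbitrary fixed mean r(x_i)

# Simulate many datasets and estimate df(y_bar) from the covariance definition:
# df = (1/sigma^2) * sum_i Cov(y_bar, y_i); the exact value is 1.
Y = mu + sigma * rng.normal(size=(draws, n))
ybar = Y.mean(axis=1)
cov_sum = sum(np.cov(ybar, Y[:, i])[0, 1] for i in range(n))
df_est = cov_sum / sigma**2
print(df_est)                              # close to 1
```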
Using this definition, let's analyze linear regression.
Linear Regression: Consider the model $$y_i=x_i\beta +\xi_i,$$ where the $x_i\in\mathbb{R}^p$ are fixed row vectors. In your case, $p=2$, each $x_i=[z_i,\ 1]$ consists of a data point and the constant $1$, and $\beta=\left[\begin{array}{c}
m\\
b
\end{array}\right]$ is a slope and an intercept, so that $x_i \beta=m z_i+b$. The model can be rewritten as $$y=X\beta+\xi,$$ where $X$ is the $n\times p$ matrix whose $i^{th}$ row is $x_i$. The least squares estimator is $\hat{\beta}^{LS}=(X^T X)^{-1}X^Ty$. Based on the above definition, let's now calculate the degrees of freedom of $SST$, $SSR$, and $SSE$.
$SST:$ For this, we need to calculate $$\text{df}(y_i-\overline{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(y_i-\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n \frac{\sigma^2}{n}=n-1.$$
$SSR:$ For this, we need to calculate $$\text{df}(X\hat{\beta}^{LS}-\overline{y})=\frac{1}{\sigma^2}\text{Tr}\left(\text{Cov}(X(X^TX)^{-1}X^Ty,\,y)\right)-\text{df}(\overline{y})$$ $$=-1+\frac{1}{\sigma^2}\text{Tr}(X(X^TX)^{-1}X^T\,\text{Cov}(y,y))$$ $$=-1+\text{Tr}(X(X^TX)^{-1}X^T)$$ $$=p-1,$$ using $\text{Cov}(y,y)=\sigma^2 I$ in the last step. In your case $p=2$, since you will want $X$ to include the all-ones vector so that there is an intercept term, and so the degrees of freedom will be $1$. Note, however, that the same formula $p-1$ holds when doing regression with any number of parameters.
$SSE:$ $(n-1)-(p-1)=n-p$, which follows from the linearity of $\text{df}$.
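The key trace identity driving all three counts, $\text{Tr}(X(X^TX)^{-1}X^T)=p$, holds exactly and can be checked in a couple of lines (the random design and names below are illustrative assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
z = rng.normal(size=n)
X = np.column_stack([z, np.ones(n)])   # p = 2: slope column and all-ones intercept column
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix

# Tr(H) = p exactly, so df(SSR) = Tr(H) - 1 and df(SSE) = n - Tr(H).
print(np.trace(H), np.trace(H) - 1, n - np.trace(H))  # 2, 1, 38 up to floating point
```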
Best Answer
Note that for the multiple regression model, where $k\ge 3$, the general formula for the coefficients is $$ \hat{\beta} = (X'X)^{-1}X'y, $$ so only in the $y=\beta_0 +\beta_1x+\epsilon$ case do you get $\hat{\beta}_1=\frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}$; otherwise $\hat{\beta}_1$, and each of the other coefficients, will differ according to $k$. As such, intuitively, the model loses $1$ df for every estimated coefficient. So, in particular, for the estimator $\hat{\sigma}^2 = \frac{1}{n-k}\sum(y_i - \hat{y}_i)^2$, you are using $\hat{\mathrm{\beta}} = (\hat{\beta}_1, ..., \hat{\beta}_k )$, where $\mathrm{dim}(\hat{\beta})=k\times 1$; hence the total number of df for the $t$-test is $n-k$.
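As a sanity check on the claim that the matrix formula reduces to the familiar slope formula only in the simple case, here is a short sketch (simulated data and variable names are my own) comparing the two in the $k=2$ model $y=\beta_0+\beta_1 x+\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# General matrix formula: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form slope for simple regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(beta_hat[1], b1)   # the two slope estimates agree
```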