This is a brute-force solution requiring only multivariable calculus.
It suffices to prove that the sample mean
$$
\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i
$$
and the sample variance
$$
S^2 = \frac{1}{n - 1} \sum_{i=1}^n \left(X_i - \bar{X}\right)^2
$$
are independent. Since $S^2$ is a function of the deviations $X_i - \bar{X}$ alone, it suffices to prove that the sample mean $\bar{X}$ is independent of the vector
$$
(X_1 - \bar{X}, \ldots, X_n - \bar{X}).
$$
Moreover, since
$$
\begin{aligned}
\sum_{i=1}^n (X_i - \bar{X})
&= \sum_{i=1}^n X_i - \sum_{i=1}^n \bar{X} \\
&= n \bar{X} - n \bar{X} \\
&= 0,
\end{aligned}
$$
and hence
$$
X_1 - \bar{X}
= -\sum_{i=2}^n (X_i - \bar{X}),
$$
it follows that $X_1 - \bar{X}$ can be recovered from just knowing $(X_2 - \bar{X}, \ldots, X_n - \bar{X})$.
Thus, it suffices to prove that the sample mean $\bar{X}$ is independent of
$$
(X_2 - \bar{X}, \ldots, X_n - \bar{X}).
$$
Now consider the joint density
$$
\begin{aligned}
f_{(X_1, \ldots, X_n)}(x_1, \ldots, x_n)
&= \left(2 \pi \sigma^2\right)^{-n/2} \exp\left(-\sum_{i=1}^n \frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right) \\
&= \left(2 \pi \sigma^2\right)^{-n/2} \exp\left(-\sum_{i=1}^n \frac{1}{2}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2 - \frac{n}{2}\left(\frac{\bar{x} - \mu}{\sigma}\right)^2\right) \\
&= \underbrace{\left(2 \pi \sigma^2\right)^{-n/2}}_{\text{constant}}
\underbrace{\exp\left(-\sum_{i=1}^n \frac{1}{2}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2\right)}_{\text{depends only on $(x_2-\bar{x},\ldots,x_n-\bar{x})$}}
\underbrace{\exp\left(-\frac{n}{2}\left(\frac{\bar{x} - \mu}{\sigma}\right)^2\right)}_{\text{depends only on $\bar{x}$}}.
\end{aligned}
$$
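The second equality above uses the standard decomposition of the sum of squares, which follows by writing $x_i - \mu = (x_i - \bar{x}) + (\bar{x} - \mu)$ and noting that the cross term vanishes:
$$
\begin{aligned}
\sum_{i=1}^n (x_i - \mu)^2
&= \sum_{i=1}^n (x_i - \bar{x})^2
+ 2 (\bar{x} - \mu) \underbrace{\sum_{i=1}^n (x_i - \bar{x})}_{=\,0}
+ n (\bar{x} - \mu)^2 \\
&= \sum_{i=1}^n (x_i - \bar{x})^2 + n (\bar{x} - \mu)^2.
\end{aligned}
$$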
To get from $(X_1,\ldots,X_n)$ to $(\bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$, consider the diffeomorphism $T : \mathbb{R}^n \to \mathbb{R}^n$ given by
$$
T(x_1, \ldots, x_n)
= (\bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}).
$$
($T$ is a diffeomorphism since it's clearly differentiable and its inverse is given by
$$
T^{-1}(y_1, \ldots, y_n)
= \left(y_1 - \sum_{i=2}^n y_i,\; y_2 + y_1, \ldots, y_n + y_1\right),
$$
which is also clearly differentiable).
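As a quick numerical sanity check (not part of the proof), here is a minimal numpy sketch, with illustrative function names `T` and `T_inv`, verifying that this formula really inverts $T$:

```python
import numpy as np

def T(x):
    # (x_1, ..., x_n) -> (xbar, x_2 - xbar, ..., x_n - xbar)
    xbar = x.mean()
    return np.concatenate(([xbar], x[1:] - xbar))

def T_inv(y):
    # (y_1, ..., y_n) -> (y_1 - sum_{i>=2} y_i, y_1 + y_2, ..., y_1 + y_n)
    return np.concatenate(([y[0] - y[1:].sum()], y[0] + y[1:]))

x = np.random.default_rng(0).normal(size=6)
assert np.allclose(T_inv(T(x)), x)  # T_inv undoes T
assert np.allclose(T(T_inv(x)), x)  # and vice versa
```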
The Jacobian matrix of $T$ (with row $i$ holding the partial derivatives of the $i$-th component of $T$) is
$$
DT(x_1, \ldots, x_n)
= \begin{bmatrix}
1/n & 1/n & 1/n & \cdots & 1/n \\
-1/n & (n - 1) / n & -1/n & \cdots & -1/n \\
-1/n & -1/n & (n - 1) / n & \cdots & -1/n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1/n & -1/n & -1/n & \cdots & (n - 1)/n
\end{bmatrix},
$$
which doesn't depend on $x_1, \ldots, x_n$.
Thus, the determinant of $DT$ is some nonzero constant $C$.
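In fact $C$ can be computed explicitly, although the argument only needs that it is a nonzero constant: adding rows $2, \ldots, n$ of $DT^{-1}$ to its first row produces the row $(n, 0, \ldots, 0)$, and expanding along that row leaves the $(n-1) \times (n-1)$ identity as the remaining minor, so $\det DT^{-1} = n$ and hence $C = \det DT = 1/n$.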
Now the joint density of $(\bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$ satisfies
$$
f_{(\bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})}(y_1, \ldots, y_n)
= |C|^{-1} f_{(X_1, \ldots, X_n)}(T^{-1}(y_1, \ldots, y_n)),
$$
which factors as a function of $y_1$ times a function of $(y_2, \ldots, y_n)$ by what was shown above.
Therefore, $\bar{X}$ and $(X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.
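For readers who want to see the result empirically, here is a short Monte Carlo sketch (illustrative only, and with the caveat that zero correlation is a consequence of independence, not a proof of it):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 10, 100_000
samples = rng.normal(loc=2.0, scale=3.0, size=(reps, n))

xbar = samples.mean(axis=1)       # sample mean of each replication
s2 = samples.var(axis=1, ddof=1)  # sample variance with the 1/(n-1) convention

# Independence implies zero correlation; the empirical value
# should be ~0 up to Monte Carlo noise.
print(np.corrcoef(xbar, s2)[0, 1])
```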
The beginning of your question is a bit confusing, but the issue of why there are $n - (p+1)$ degrees of freedom can be addressed directly. The proof is already in the question you point to, so I'll try a quick intuition.
You see, $\frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{\sigma^2}$ is a function of the fitted values $\hat{Y}_i$, which come from a model with $p+1$ parameters (a constant plus $p$ variables $x_i$ in $X$).
From a statistical point of view, it's natural to expect that you'd subtract this number of parameters from the total number of observations $n$, giving you $n - (p+1)$ degrees of freedom.
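A small simulation sketch of this intuition, under a hypothetical setup with plain least squares via numpy (names and parameter values are just for illustration): averaged over many replications, $\sum_i (Y_i - \hat{Y}_i)^2 / \sigma^2$ should come out close to $n - (p+1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, reps = 50, 3, 2.0, 20_000

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # constant + p covariates
beta = rng.normal(size=p + 1)                               # true coefficients

rss = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS fit
    rss[r] = np.sum((y - X @ beta_hat) ** 2)

print(rss.mean() / sigma**2)  # close to n - (p + 1) = 46
```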
It is well known that the $n - 1$ degrees of freedom arise because the sample mean is used in place of $\mu$: $\sum_{i=1}^n (X_i - \bar{X})^2 / \sigma^2$ is a sum of $n$ squared terms, but the terms are dependent precisely because each uses $\bar{X}$ rather than $\mu$, and that single linear constraint ($\sum_{i=1}^n (X_i - \bar{X}) = 0$) costs one degree of freedom.
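A minimal numerical illustration of the lost degree of freedom (again just a sketch): replacing $\mu$ with $\bar{X}$ drops the mean of the scaled sum of squares from $n$ to $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 8, 5.0, 1.5, 200_000
x = rng.normal(mu, sigma, size=(reps, n))

# Scaled sums of squares around the true mean and the sample mean.
around_mu = np.sum((x - mu) ** 2, axis=1) / sigma**2
around_xbar = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1) / sigma**2

print(around_mu.mean())    # close to n = 8     (chi-square with n df)
print(around_xbar.mean())  # close to n - 1 = 7 (chi-square with n-1 df)
```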