1) No it isn't.
2) Because the calculation of the distribution of the test statistic relies on using the square root of the ordinary Bessel-corrected variance to get the estimate of standard deviation.
If the $c_4$ correction were included, it would only scale each t-statistic - and hence its distribution - by a constant factor (a different one at each d.f.); that would in turn scale the critical values by the same factor.
So, you could, if you like, construct a new set of "t"-tables: use $s^* = s/c_4$ in the formula for a new statistic, $t^* = \frac{\overline{X}-\mu_0}{s^*/\sqrt{n}} = c_4(n)\,t_{n-1}$, then multiply all the tabulated values for $t_\nu$ by the corresponding $c_4(\nu+1)$ to get tables for the new statistic. But we could as readily base our tests on ML estimates of $\sigma$, which would be simpler in several ways and would likewise change nothing substantive about testing.
Making the estimate of population standard deviation unbiased would only make the calculation more complicated, and wouldn't save anything anywhere else (the same $\bar{x}$, $\overline{x^2}$ and $n$ would still ultimately lead to the same rejection or non-rejection). [To what end? Why not instead choose MLE or minimum MSE or any number of other ways of getting estimators of $\sigma$?]
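To see concretely that the $c_4$ rescaling cannot change the test decision, here is a minimal sketch (the sample values and $\mu_0$ are made up for illustration; the critical value $t_{0.975,7} \approx 2.365$ is from standard tables):

```python
import math

def c4(n):
    # c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

# hypothetical sample for a one-sample t-test of mu0 = 5
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9]
n = len(x)
mu0 = 5.0
xbar = sum(x) / n
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # Bessel-corrected sd

t = (xbar - mu0) / (s / math.sqrt(n))      # the usual t-statistic

# "unbiased-sd" version: s* = s / c4(n), so t* = c4(n) * t
s_star = s / c4(n)
t_star = (xbar - mu0) / (s_star / math.sqrt(n))

crit = 2.365  # two-sided 5% critical value of t with 7 d.f.
print(t_star / t, c4(n))  # the two statistics differ only by the factor c4(n)
# same decision either way: compare t* to the rescaled critical value
print((abs(t) > crit) == (abs(t_star) > c4(n) * crit))
```

Since $c_4(n) > 0$, "$|t^*|$ exceeds the rescaled critical value" is exactly the same event as "$|t|$ exceeds the usual one", so the same data always lead to the same rejection or non-rejection.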
There's nothing especially valuable about having an unbiased estimate of $\sigma$ for this purpose (unbiasedness is a nice thing to have, other things being equal, but other things are rarely equal).
Given that people are used to using Bessel-corrected variances and hence the corresponding standard deviation, and the resulting null distributions are reasonably straightforward, there's little - if anything at all - to gain by using some other definition.
In the simple linear regression case, with $\widehat\beta$ the least-squares slope estimate,
$$
Z = \frac{\widehat\beta - \beta}{\sigma \Big/ \sqrt{ n \left( \overline{x^2} - \left(\overline{x}\right)^2 \right)}} \sim \mathrm{N}(0,1).
$$
And
$$
(n-2) \frac{\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}.
$$
Notice that $\sigma$ appears in both the numerator and the denominator of $Z/\sqrt{\chi^2_k/k}$ (here $k = n-2$) and cancels out.
Independence of these two things is seen by observing that the vector of residuals is independent of the vector of fitted values. To see that, find the covariance between the vector of residuals and the vector of fitted values, and recall that if two random vectors are jointly normally distributed then they are independent if they are uncorrelated.
(The whole story of why these things have the distributions asserted here would take somewhat longer.)
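Both the cancellation of $\sigma$ and the orthogonality of residuals and fitted values can be checked numerically. A sketch with made-up data, where `beta0_true` is the hypothesised slope:

```python
import math

# hypothetical small dataset for a simple linear regression
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(x)
beta0_true = 0.0  # hypothesised slope under test

xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)            # = n * (mean(x^2) - mean(x)^2)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
beta_hat = Sxy / Sxx
alpha_hat = ybar - beta_hat * xbar
fitted = [alpha_hat + beta_hat * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
sigma_hat2 = sum(r ** 2 for r in resid) / (n - 2)  # RSS / (n-2)

def t_from_pieces(sigma):
    # build Z ~ N(0,1) and the chi-square piece using the SAME sigma,
    # then form Z / sqrt(chi2 / (n-2)); sigma should cancel
    Z = (beta_hat - beta0_true) * math.sqrt(Sxx) / sigma
    chi2 = (n - 2) * sigma_hat2 / sigma ** 2
    return Z / math.sqrt(chi2 / (n - 2))

t_direct = (beta_hat - beta0_true) / math.sqrt(sigma_hat2 / Sxx)
# any positive sigma gives the same value: sigma cancels out
print(t_from_pieces(1.0), t_from_pieces(7.3), t_direct)
# residuals are orthogonal to fitted values (zero up to rounding)
print(sum(r * f for r, f in zip(resid, fitted)))
```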
Best Answer
Generally, when you do a statistical test, you want the null distribution of the statistic to have a stable form, so as to allow easy computation of P-values.
This is partly a historical consideration. In principle, there's no reason you couldn't use the unscaled statistic $\hat{\mu}/\hat{\sigma}$, as opposed to $\hat{\mu}/(\frac{\hat{\sigma}}{\sqrt{n}})$, and compare to the percentage points of a distribution that became successively narrower with sample size. But this would be very cumbersome if you didn't have modern computers to calculate those percentage points for you. Even today, when we do have computers that could do this calculation, having a relatively stable null distribution makes it easier to compare results from different studies involving different sample sizes. A t-statistic of 2.5 is easier to comprehend because it means roughly the same thing whether you have a sample of 100 or 100,000. You can't say the same about the unscaled statistic.
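The stable-null-distribution point can be checked by simulation. A rough sketch (sample sizes and replication count are arbitrary choices for illustration):

```python
import math
import random
import statistics

random.seed(1)

def sim_stats(n, reps=5000):
    """Null (mu = 0) draws of the unscaled statistic xbar/s and the usual t-statistic."""
    unscaled, tstats = [], []
    for _ in range(reps):
        x = [random.gauss(0, 1) for _ in range(n)]
        xbar = statistics.fmean(x)
        s = statistics.stdev(x)
        unscaled.append(xbar / s)
        tstats.append(xbar / (s / math.sqrt(n)))
    return statistics.pstdev(unscaled), statistics.pstdev(tstats)

results = {n: sim_stats(n) for n in (10, 40, 160)}
for n, (u_sd, t_sd) in results.items():
    # the unscaled statistic's spread shrinks roughly like 1/sqrt(n);
    # the t-statistic's spread stays near 1 at every n
    print(n, round(u_sd, 3), round(t_sd, 3))
```

So a value of 2.5 sits in roughly the same place in the t-statistic's null distribution at every sample size, while the same value of the unscaled statistic means something very different at $n=10$ than at $n=160$.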