For a two-sample $t$ test on samples from populations with
the same variance $\sigma^2,$ you have two proposed
variance estimates
$$ S_p^2 = \frac{(n_1 - 1)S^2_1+(n_2-1)S_2^2}{n_1+n_2-2},$$
and
$$ S_a^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1+n_2}. $$
For $S_p^2,$ you have found $S_i^2,\ i = 1,2,$ each of which requires computing a sample mean $\bar X_i,\ i = 1,2.$ So
$$ \frac{\nu S_p^2}{\sigma^2} \sim \mathsf{Chisq}(\nu), $$
where $\nu = n_1+n_2 - 2.$
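This follows from standard distribution theory (a step we spell out here for completeness): each $(n_i - 1)S_i^2/\sigma^2 \sim \mathsf{Chisq}(n_i - 1)$ independently, and independent chi-squared variables add their degrees of freedom:
$$ \frac{\nu S_p^2}{\sigma^2} = \frac{(n_1-1)S_1^2}{\sigma^2} + \frac{(n_2-1)S_2^2}{\sigma^2} \sim \mathsf{Chisq}(n_1 - 1 + n_2 - 1) = \mathsf{Chisq}(\nu). $$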
For $S_a^2,$ the distribution theory is not so clear.
You say something about $S_a^2$ being unbiased, but that
hardly specifies a distribution. Let's use the same
degrees of freedom $\nu$ as above for an experiment.
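Unbiasedness itself is easy to verify, using only $E[S_i^2] = \sigma^2$ (this check is ours, not part of the original argument):
$$ E[S_a^2] = \frac{n_1 E[S_1^2] + n_2 E[S_2^2]}{n_1+n_2} = \frac{(n_1+n_2)\sigma^2}{n_1+n_2} = \sigma^2. $$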
Simulation: Begin by looking at $m = 10^5$ samples
of size $n_1 = 2$ from $\mathsf{Norm}(\mu_1 = 100, \sigma_1 = 15)$
and $m$ samples of size $n_2=3$ from $\mathsf{Norm}(\mu_2 = 110, \sigma_2 = 15).$
We find the sample variances, the pooled variance estimate,
and the average variance estimate. Then we look at the
corresponding chi-squared random variables.
set.seed(2022)
n1 = 2; m = 10^5
M1 = matrix(rnorm(n1*m, 100, 15), nrow=m)  # each row is a sample of size n1
v1 = apply(M1, 1, var)                     # m sample variances S_1^2
n2 = 3
M2 = matrix(rnorm(n2*m, 110, 15), nrow=m)  # each row is a sample of size n2
v2 = apply(M2, 1, var)                     # m sample variances S_2^2
pool = ((n1-1)*v1 + (n2-1)*v2)/(n1+n2-2)   # pooled estimate S_p^2
q.p = (n1+n2-2)*pool/15^2                  # nu * S_p^2 / sigma^2
avg.v = (n1*v1 + n2*v2)/(n1+n2)            # average estimate S_a^2
q.a = (n1+n2)*avg.v/15^2                   # (n1+n2) * S_a^2 / sigma^2
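Before plotting, a quick moment check (our addition, not in the original) already hints at the problem: a $\mathsf{Chisq}(\nu)$ random variable has mean $\nu$ and variance $2\nu,$ but q.a has mean $n_1+n_2 = 5.$
nu = n1 + n2 - 2
c(mean(q.p), var(q.p))  # close to nu = 3 and 2*nu = 6, consistent with Chisq(3)
c(mean(q.a), var(q.a))  # mean close to 5, so q.a cannot be Chisq(3)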
Then we compare the results with the density function
of the chi-squared distribution with $\nu = n_1+n_2-2$ degrees of freedom.
For the pooled estimate $S_p^2$ we get a good match,
but for $S_a^2$ the fit is not good.
R code for graphs:
par(mfrow=c(1,2))
hist(q.p, prob=T, ylim=c(0,.35), col="skyblue2", main="Pooled")
curve(dchisq(x, n1+n2-2), add=T, lwd=2, col="orange")
hist(q.a, prob=T, ylim=c(0,.35), col="skyblue2", main="Averaged")
curve(dchisq(x, n1+n2-2), add=T, lwd=2, col="orange")
par(mfrow=c(1,1))
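For a more formal comparison (our addition), a Kolmogorov-Smirnov test against $\mathsf{Chisq}(\nu)$ tells the same story:
ks.test(q.p, "pchisq", df = n1+n2-2)  # typically does not reject: q.p really is Chisq(3)
ks.test(q.a, "pchisq", df = n1+n2-2)  # p-value essentially 0: q.a is not Chisq(3)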
The simple answer is that the $t$ statistic does not depend on $\mu$ at all, and this is much easier to see from the original, non-transformed formula:
$$t = \frac{\overline x - \overline y}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{\nu}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$
Indeed, under the transformations $x_i \mapsto x_i - \mu$ and $y_i \mapsto y_i - \mu$, we have $\overline x \mapsto \overline x - \mu$ and $\overline y \mapsto \overline y - \mu$, and also $s_1^2 = \frac{1}{n_1} \sum (x_i - \overline x)^2 \mapsto s_1^2$ and similarly $s_2^2 \mapsto s_2^2$, so the variable $t$ is invariant under horizontal shifts of the parent distribution. This is why we can assume $\mu = 0$ without loss of generality.
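To see the invariance numerically, here is a small sketch (the function name t.stat and the sample sizes are ours); note it uses the divide-by-$n$ variances from the formula above, not R's var():
t.stat <- function(x, y) {
  n1 <- length(x); n2 <- length(y); nu <- n1 + n2 - 2
  s1sq <- mean((x - mean(x))^2)  # divide-by-n variance, as in the formula
  s2sq <- mean((y - mean(y))^2)
  (mean(x) - mean(y)) / (sqrt((n1*s1sq + n2*s2sq)/nu) * sqrt(1/n1 + 1/n2))
}
x <- rnorm(5, 100, 15); y <- rnorm(7, 100, 15)
t.stat(x, y)              # some value
t.stat(x - 100, y - 100)  # exactly the same value: the shift cancels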
TL;DR: The difference between the two situations is whether you use $\sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}$ or $\sqrt{\frac{s^2}{n_a} + \frac{s^2}{n_b}}$.
In the second case the estimate of the variance of the populations $a$ and $b$ is coupled based on the assumption that the populations have equal variance.
The nasty-looking formula in the second case stems from deriving the pooled sample deviation $s$ from the individual sample deviations $s_a$ and $s_b$.
The formulas might become more intuitive when you consider the sum of squared residuals from which the sample standard deviation is derived:
$$ \sum_{i=1}^n {r_i^2} = \sum_{i=1}^n (x_i - \bar{x})^2$$
This is a sum of $n$ terms, but it is effectively equivalent to a sum of $n-1$ squared independent normally distributed variables with variance $\sigma^2$. (See for instance Why are the residuals in $\mathbb{R}^{n-p}$?)
A sum of $k$ squared independent normally distributed variables with standard deviation $\sigma$ follows a gamma distribution* with shape parameter $k/2$ and scale parameter $2\sigma^2$; here $k = n-1$, so the sum of squared residuals has mean $(n-1)\sigma^2$. So if we divide by $n-1$ then we have an unbiased estimate of the variance.
$$s^2 = \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n {r_i^2}$$
And $s$ is the corrected sample estimate of the standard deviation
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n {r_i^2}}$$
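A quick simulation check of this mean, and of the fact that R's var() already applies the $n-1$ correction (our own illustration; the values of sigma and n are arbitrary):
# For a Norm(0, sigma) sample of size n, the sum of squared residuals has
# mean (n-1)*sigma^2, so dividing by n-1 gives an unbiased variance estimate.
sigma <- 2; n <- 6
ss <- replicate(10^4, {x <- rnorm(n, 0, sigma); sum((x - mean(x))^2)})
mean(ss)          # close to (n-1)*sigma^2 = 20
mean(ss/(n - 1))  # close to sigma^2 = 4; this is what var() computes per sample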
Now suppose we have more residual terms because we sampled two populations that we assume have the same variance (that is what the pooling does). Then we simply sum those residual terms, and the result is a gamma-distributed variable equivalent to a sum of $(n_a - 1)+(n_b - 1)$ squared normally distributed variables.
$$s = \sqrt{\frac{1}{(n_a-1)+(n_b-1)} \left(\sum_{i=1}^{n_a} {r_{a,i}^2} + \sum_{i=1}^{n_b} {r_{b,i}^2}\right)}$$
where $n_a$ and $n_b$ are the sizes of the two samples and $r_{a,i}$ and $r_{b,i}$ are the residual terms in the two samples.
If instead of the sums of squared residuals you use the corrected sample variances $$\sum_{i=1}^{n_a} {r_{a,i}^2} = s_a^2 (n_a -1), \qquad \sum_{i=1}^{n_b} {r_{b,i}^2} = s_b^2 (n_b -1),$$
then you get
$$s = \sqrt{\frac{s_a^2 (n_a -1) +s_b^2 (n_b -1)}{(n_a-1)+(n_b-1)}}$$
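As a concrete sketch (the function name pooled.sd is ours, not standard R):
# Pooled standard deviation from two samples, assuming equal population variance
pooled.sd <- function(x, y) {
  na <- length(x); nb <- length(y)
  ra2 <- sum((x - mean(x))^2)  # sum of squared residuals, sample a
  rb2 <- sum((y - mean(y))^2)  # sum of squared residuals, sample b
  sqrt((ra2 + rb2) / ((na - 1) + (nb - 1)))
}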
The additional factor $\sqrt{\frac{1}{n_a}+\frac{1}{n_b}}$ converts an estimate of the variance/deviation of the population into the variance/deviation of the sample mean, or of the difference between two sample means. The estimate of the variance of one mean will be $s^2/n_a$ and of the other $s^2/n_b$; the estimate for the variance of the difference is the sum of these two.
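Putting the pieces together (continuing the sketch above, with x and y two numeric samples), the resulting statistic matches R's built-in pooled test:
x <- rnorm(8, 100, 15); y <- rnorm(10, 110, 15)
se <- pooled.sd(x, y) * sqrt(1/length(x) + 1/length(y))  # SE of the difference in means
(mean(x) - mean(y))/se  # equals t.test(x, y, var.equal = TRUE)$statistic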
*Some readers might be more familiar with the $\chi^2$ distribution, which is the special case of this gamma distribution with $\sigma = 1$: a $\chi^2$ variable with $k$ degrees of freedom is a gamma variable with shape $k/2$ and scale $2$.