For multivariate normal distributions one can derive it in the same way and end up with:
$ (m+n)(\Sigma_{tot}+\underline{\mu}_{tot}\underline{\mu}_{tot}^T) = m(\Sigma_1+\underline{\mu}_{1}\underline{\mu}_{1}^T)+n(\Sigma_2+\underline{\mu}_{2}\underline{\mu}_{2}^T)$
Rearranging the symbols will give you the formula to calculate the total covariance.
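As a sanity check, the identity can be verified numerically: a quick sketch with NumPy (the sample sizes and data below are made up for illustration). Note that the identity holds for the divide-by-$N$ "population" covariance, hence `bias=True` in `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))   # first sample: m = 5 observations of 3 variables
b = rng.normal(size=(7, 3))   # second sample: n = 7 observations

m, n = len(a), len(b)
mu1, mu2 = a.mean(axis=0), b.mean(axis=0)
# bias=True gives the divide-by-N covariance that the identity assumes
S1 = np.cov(a, rowvar=False, bias=True)
S2 = np.cov(b, rowvar=False, bias=True)

merged = np.vstack([a, b])
mu_tot = merged.mean(axis=0)
S_tot = np.cov(merged, rowvar=False, bias=True)

# (m+n)(S_tot + mu mu^T) = m(S1 + mu1 mu1^T) + n(S2 + mu2 mu2^T)
lhs = (m + n) * (S_tot + np.outer(mu_tot, mu_tot))
rhs = m * (S1 + np.outer(mu1, mu1)) + n * (S2 + np.outer(mu2, mu2))
assert np.allclose(lhs, rhs)
```

Each side is just the sum of the outer products $\sum_i \underline{x}_i\underline{x}_i^T$ over the respective points, which is why the two-sample sums add up exactly.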
If you have good reason to believe that the variances of the two populations are equal, then it makes sense to use this information to improve the efficiency of your estimate.
In this case, your test statistic becomes:
$$t=\frac{(\bar{x}_x-\bar{x}_y)-(\mu_x-\mu_y)}{s\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}}$$
So instead of having to estimate two variances, $\sigma_x^2$ and $\sigma_y^2$, you now have to estimate only one, $\sigma^2$.
In principle you could use either of the two sample variance estimates, but that would ignore part of the available information. Surely we can do better by combining the information from the two samples.
One way to combine the variance estimates of different samples in an unbiased way is to use the pooled variance estimate:
$$s_{pooled}^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$
Where $s_x^2$ and $s_y^2$ are the unbiased sample variance estimates: $s_x^2 = \frac{1}{n_x-1}\sum_{i=1}^{n_x}(x_i-\bar{x}_x)^2$ (and similarly for $s_y^2$).
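Putting the pieces together, here is a short sketch that computes the pooled variance and the resulting t statistic (under $H_0:\mu_x=\mu_y$) for two small made-up samples:

```python
import numpy as np

# Two small fixed samples (illustrative numbers, not from the question)
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5])
y = np.array([3.1, 3.9, 4.4, 3.5, 4.0, 3.7])
nx, ny = len(x), len(y)

# Unbiased sample variances (ddof=1 divides by n-1)
s2x = x.var(ddof=1)
s2y = y.var(ddof=1)

# Pooled estimate of the common variance sigma^2
s2_pooled = ((nx - 1) * s2x + (ny - 1) * s2y) / (nx + ny - 2)

# t statistic under H0: mu_x - mu_y = 0
t_stat = (x.mean() - y.mean()) / np.sqrt(s2_pooled * (1 / nx + 1 / ny))
```

Since the pooled estimate is a weighted average of $s_x^2$ and $s_y^2$, it always lies between the two.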
Edited after I understood the second part of your question:
In addition, do not confuse:
- The pooled variance $s^2_{pooled}$, as above, which is an estimate of $\sigma^2$
- The variance of the difference of two sample means, with sample size $n_x$ and $n_y$ and corresponding variance $\sigma_x^2$ and $\sigma_y^2$, which is: $var(\bar{x}_x-\bar{x}_y)=\frac{\sigma_x^2}{n_x}+\frac{\sigma_y^2}{n_y}$.
Note that the latter, which appears under the square root in the denominator of your t statistic, is the variance of the quantity of interest: the difference of the two sample means. It is not about estimating a variance; rather, it is about standardizing your statistic.
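The identity $var(\bar{x}_x-\bar{x}_y)=\frac{\sigma_x^2}{n_x}+\frac{\sigma_y^2}{n_y}$ is easy to check by simulation; the sketch below uses arbitrary sample sizes and standard deviations of my choosing:

```python
import numpy as np

rng = np.random.default_rng(42)
nx, ny = 8, 12
sigma_x, sigma_y = 2.0, 3.0
reps = 200_000

# Draw many pairs of independent samples and record the difference in means
x = rng.normal(0.0, sigma_x, size=(reps, nx))
y = rng.normal(0.0, sigma_y, size=(reps, ny))
diffs = x.mean(axis=1) - y.mean(axis=1)

# Theoretical variance of the difference of the two sample means
theory = sigma_x**2 / nx + sigma_y**2 / ny   # = 4/8 + 9/12 = 1.25
assert np.isclose(diffs.var(), theory, rtol=0.02)
```

Dividing the observed difference in means by the square root of this quantity is exactly the standardization the t statistic performs.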
There are really 2 questions here, one about pooling and one about degrees of freedom.
Let's look at degrees of freedom first. To get the concept, suppose we know that $x+y+z=10$. Then $x$ can be anything we want, and $y$ can be anything we want, but once we set those 2 there is only one value that $z$ can take, so we have 2 degrees of freedom.

When we calculate $S^2$, if we subtracted the population mean from each $x_i$, then squared and summed, we would divide by $n$ to take the average squared difference. But we generally don't know the population mean, so we subtract the sample mean as an estimate of it. Subtracting a sample mean that is estimated from the same data we are using to find $S^2$ guarantees the lowest possible sum of squares, so the result will tend to be too small. If we divide by $n-1$ instead, the estimator is unbiased, because we have taken into account that we already used the same data to compute one piece of information (the mean is just the sum divided by a constant). In regression models the degrees of freedom equal $n$ minus the number of parameters we estimate: each time you estimate a parameter (mean, intercept, slope) you spend 1 degree of freedom.
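The "lowest possible sum of squares" claim can be seen directly with a few numbers of my own choosing: for any candidate center other than the sample mean, the sum of squared deviations is larger.

```python
import numpy as np

x = np.array([1.0, 4.0, 6.0, 9.0])
xbar = x.mean()   # 5.0

def ss(center):
    """Sum of squared deviations of x around a given center."""
    return ((x - center) ** 2).sum()

# The sample mean minimizes the sum of squares, so plugging it in
# where the (unknown) population mean belongs biases S^2 downward.
for mu in [3.0, 4.5, 5.5, 7.0]:
    assert ss(xbar) <= ss(mu)

# Dividing by n-1 instead of n is exactly Bessel's correction
assert np.isclose(x.var(ddof=1), ss(xbar) / (len(x) - 1))
```

This is why dividing by $n$ after subtracting the sample mean systematically underestimates $\sigma^2$, and dividing by $n-1$ repairs the bias.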
For the pooled variance formula, $S^2_c$ and $S^2_t$ have already been divided by $n_c-1$ and $n_t-1$, so multiplying by those factors recovers the sums of squares; we then add the 2 sums of squares and divide by the total degrees of freedom (we subtract 2 because we estimated 2 sample means to get the sums of squares). The pooled variance is just a weighted average of the 2 variances.
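The equivalence between the weighted-average form and the "add the sums of squares, divide by total df" view can be checked in a few lines (the control/treatment numbers below are made up):

```python
import numpy as np

c = np.array([10.1, 9.4, 11.2, 10.8])          # "control" sample (made-up)
t = np.array([12.0, 11.5, 13.1, 12.4, 12.9])   # "treatment" sample (made-up)
nc, nt = len(c), len(t)

# Multiplying S^2 by (n-1) undoes the division, recovering each group's
# sum of squared deviations about its own mean
ss_c = (nc - 1) * c.var(ddof=1)
ss_t = (nt - 1) * t.var(ddof=1)
assert np.isclose(ss_c, ((c - c.mean()) ** 2).sum())

# Add the sums of squares, divide by total df (2 means estimated -> minus 2)
s2_pooled = (ss_c + ss_t) / (nc + nt - 2)

# Same result from pooling the within-group deviations directly
dev = np.concatenate([c - c.mean(), t - t.mean()])
assert np.isclose(s2_pooled, (dev ** 2).sum() / (nc + nt - 2))
```

Both routes give the same number, which is why the formula is simultaneously a weighted average of the 2 variances and a single variance computed from the pooled within-group deviations.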