Solved – Why standard version of the Student t test assumes equal variances

Tags: heteroscedasticity, t-test, variance

Several types of t-test exist: the standard Student t-test and the Welch t-test.

The standard Student t-test assumes equal variances for the two populations, yet the t-test formula nevertheless involves both sample variances ($S_1$, $S_2$). If equal variances had to hold, we would not write separate variances $S_1$ and $S_2$ in the formula, just a single $S$. I must be missing something in my understanding of this part.

According to the comment by @DavidLane below, the denominator uses both sample variances to estimate the population variance. This assumes the two sample variances are estimating the same population variance; it is called a pooled estimate. So why should the standard Student t-test use a pooled rather than an unpooled estimate?

[Image: t-test formula]

Best Answer

As already commented, the formula you present is actually not for the standard $t$-test with equal variances, but for the case when variances are not equal (and sample sizes may differ).

If you are just curious about the formula itself, the crucial idea is this: the variance of the difference of two independent random variables is the sum of their variances, $s_1^2+s_2^2$. This is why the two variances are added. The division by $n$ comes in because we are dealing with sampling distributions of means.

As far as I understand, the logic is:

We have two populations with means $\mu_1, \mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$.

Now we draw many samples from both populations with specific sizes $n_1, n_2$ and calculate the mean of each sample in both groups. Doing this for many samples gives two sampling distributions with means $\mu_1, \mu_2$ (the same as in the populations) but variances $\sigma_{\bar{x}_1}^2 = \frac{\sigma_{1}^2}{n_1}$ and $\sigma_{\bar{x}_2}^2 = \frac{\sigma_{2}^2}{n_2}$.

What we are actually interested in is the difference between these two sampling distributions. The mean of this difference distribution is $\mu_\mathrm{diff} = \mu_1-\mu_2$, and the crucial part is that its variance is $\sigma_\mathrm{diff}^2 = \sigma_{\bar{x}_1}^2+\sigma_{\bar{x}_2}^2$.
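The additivity of the two sampling variances can be checked numerically. Below is a small simulation sketch (the population standard deviations, sample sizes, and seed are made-up values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma1, sigma2 = 2.0, 3.0   # assumed population standard deviations
n1, n2 = 20, 30             # assumed sample sizes
reps = 100_000              # number of simulated sample pairs

# Difference of sample means for many independent sample pairs.
diffs = (rng.normal(0.0, sigma1, (reps, n1)).mean(axis=1)
         - rng.normal(0.0, sigma2, (reps, n2)).mean(axis=1))

# Theoretical variance of the difference: sigma1^2/n1 + sigma2^2/n2.
theory = sigma1**2 / n1 + sigma2**2 / n2
print(diffs.var(), theory)  # the two values should be close
```

With the values above the theoretical variance is $2^2/20 + 3^2/30 = 0.5$, and the empirical variance should land very close to it.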

We can estimate it by:

$\frac{\hat{\sigma}_{1}^2}{n_1}+\frac{\hat{\sigma}_{2}^2}{n_2}$

The standard error (as used in your formula) is just the square-root of this.
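Putting this together, the unpooled standard error and $t$ statistic can be sketched as follows. The data here are arbitrary; the result is compared against SciPy's `ttest_ind` with `equal_var=False` (the Welch test), which uses the same statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 20)   # sample 1 (arbitrary illustrative data)
y = rng.normal(0.5, 3.0, 30)   # sample 2 (arbitrary illustrative data)

# Unpooled standard error: sqrt(s1^2/n1 + s2^2/n2), with unbiased variances.
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
t = (x.mean() - y.mean()) / se

# SciPy's Welch t-test computes the same statistic.
t_scipy, p = stats.ttest_ind(x, y, equal_var=False)
print(t, t_scipy)  # identical up to floating-point rounding
```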

If variances in the population are equal, the procedure changes a bit (I assume equal sample sizes):

$\sigma_\mathrm{diff}^2$ is $\sigma_{\bar{x}_1}^2+\sigma_{\bar{x}_2}^2 = 2\sigma_{\bar{x}}^2 = 2\frac{\sigma^2}{n}$

The estimate for $\sigma^2$ is just the average of the estimated population variance from both samples (for equal sample sizes):

$\frac{\hat\sigma_{1}^2+\hat\sigma_{2}^2}{2}$

If the sample size varies between the two groups, you would take a weighted average and $\sigma_\mathrm{diff}$ has to be adjusted (see "From where term $\left(\frac{1}{n}+\frac{1}{m}\right)$ came in estimated variance of $\bar x - \bar y$").
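For unequal sample sizes, the pooled estimate weights each sample variance by its degrees of freedom, and the $\left(\frac{1}{n}+\frac{1}{m}\right)$ factor appears in the standard error. A sketch with arbitrary data, checked against SciPy's pooled (equal-variance) test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, 15)   # arbitrary samples sharing one true variance
y = rng.normal(0.0, 2.0, 25)

n, m = len(x), len(y)
# Pooled variance: weighted average of the unbiased sample variances,
# with degrees-of-freedom weights n-1 and m-1.
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
# The standard error picks up the (1/n + 1/m) factor.
se = np.sqrt(sp2 * (1 / n + 1 / m))
t = (x.mean() - y.mean()) / se

# SciPy's standard (pooled) t-test uses the same statistic.
t_scipy, p = stats.ttest_ind(x, y, equal_var=True)
print(t, t_scipy)  # identical up to floating-point rounding
```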

Note that both variances appear in this formula too, but only to obtain a good estimate of the single population variance. If you draw many samples, the sample variances will vary from sample to sample; the best estimate of the population variance is a weighted average of the unbiased sample variances. Of course, this only works if we can assume there is a single population variance, i.e. that the variance is the same in both groups.

Now, one might wonder why one would make the assumption of equal variances at all, if a formula exists for the case when they are not equal. The answer is simple: if you assume that the two groups come from the same population, you need the proper sampling distribution for that case. So you construct a null hypothesis with "equal variances": there is only one population, hence only one variance, and both samples are used to estimate it. If you do not do this, you will get a wrong p-value.
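To illustrate: when the null hypothesis is true and both groups really do share one variance, the pooled test rejects at roughly its nominal rate. A small simulation sketch (all parameters are made-up illustrative values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps, n, alpha = 5000, 10, 0.05

# Both groups drawn from the SAME population: H0 true, one common variance.
x = rng.normal(0.0, 2.0, (reps, n))
y = rng.normal(0.0, 2.0, (reps, n))

# Pooled (equal_var=True) t-test applied to each simulated pair of samples.
_, p = stats.ttest_ind(x, y, equal_var=True, axis=1)
rate = np.mean(p < alpha)
print(rate)  # should be close to the nominal alpha = 0.05
```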