Derivation: Degrees of freedom of a t-distribution.

Tags: confidence-interval, statistics

Today I learned about t-distributions, and I had one big question.

Suppose we are trying to find the CI of the difference between means $\mu_1-\mu_2$; we use the t-distribution if the sample sizes are small (we shall assume $n \le 29$).

We take samples of sizes $n_1$ and $n_2$, respectively. If the sample standard deviations are $s_1$ and $s_2$, and if we further assume that the true standard deviations are not equal, i.e., $\sigma_1 \ne \sigma_2$, then the degrees of freedom are

$$df = \left\lfloor \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}} \right\rfloor$$
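In code form, here is a minimal R sketch of that formula (the helper name welch_df is just for illustration, not standard):

welch_df <- function(s1, s2, n1, n2) {
  # v_i = s_i^2 / n_i is the estimated variance of the i-th sample mean
  v1 <- s1^2 / n1;  v2 <- s2^2 / n2
  floor((v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)))
}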

I was not able to follow my instructor's argument at all, and I could not find good explanations online either.

Is there anyone who could explain this?

Thank you.

Best Answer

To start, let's clear up two points of confusion:

(1) "[W]e use the t-distribution if the sample size is small."

Not exactly. If variances $\sigma_1^2,\, \sigma_2^2$ are unknown and estimated by $S_1^2,\, S_2^2,$ respectively, then you always use the t-distribution. (If sample sizes are large enough for degrees of freedom to exceed 30, then in some circumstances it is OK to use a normal approximation. But with modern software or printed t tables, the normal approximation is not necessary. The approximation works best for tests at the 5% level, not so well at 1%.)

(2) "[A]ssuming that the true standard deviations are not equal, ... then the degrees of freedom is given [by the Welch–Satterthwaite equation]."

No. This equation works whether or not $\sigma_1 = \sigma_2.$ However, if variances are not equal, you must use the Welch–Satterthwaite equation (not the pooled-variance equation with degrees of freedom $\nu = n_1 + n_2 - 2$).


Pooled 2-sample t test: If data are normal and population variances are equal, then the test statistic for testing $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \ne \mu_2$ is:

$$T = \frac{\bar X_1 - \bar X_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},$$ where $S_p^2 =\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}.$ If $H_0$ is true, then $T$ has Student's t distribution with degrees of freedom $\nu = n_1 + n_2 - 2.$
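As a sanity check, here is a minimal R sketch (the simulated data and variable names are my own illustration) computing $T$ by hand; it should agree with t.test when equal variances are requested:

# Pooled 2-sample t statistic by hand (illustrative sketch)
x1 <- rnorm(12, 100, 10);  x2 <- rnorm(15, 100, 10)
n1 <- length(x1);  n2 <- length(x2)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)  # pooled variance
T_pooled <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1/n1 + 1/n2))
T_pooled
t.test(x1, x2, var.equal = TRUE)$statistic   # should match T_pooled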

Welch 'separate variances' 2-sample t test: More generally, if $H_0$ is true, the test statistic

$$T^\prime = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} +\frac{S_2^2}{n_2}}}$$

is approximately distributed according to Student's t distribution with degrees of freedom $\nu$ given by the Welch-Satterthwaite equation. This is true whether or not the population variances are equal.
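Again as an illustrative R sketch (simulated data of my own), one can compute $T^\prime$ and the Welch–Satterthwaite $\nu$ by hand and compare with t.test, which performs the Welch test by default (note that R reports the unfloored $\nu$):

# Welch statistic and Welch–Satterthwaite df by hand (illustrative sketch)
x1 <- rnorm(10, 100, 25);  x2 <- rnorm(40, 100, 5)
v1 <- var(x1) / length(x1);  v2 <- var(x2) / length(x2)
T_prime <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
nu <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))
c(T_prime = T_prime, nu = nu)
t.test(x1, x2)   # t and df should match the hand computation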

One can show that the degrees of freedom $\nu$ according to the Welch–Satterthwaite equation satisfy $$\min(n_1 - 1, n_2 - 1) \le \nu \le n_1 + n_2 - 2.$$ So if the smaller of the two sample sizes exceeds 30, then $\nu \ge 30$ and (testing at the 5% level) it is OK to use a normal approximation for the distribution of $T^\prime.$
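For completeness, here is one way to verify those bounds. Write $u_i = S_i^2/n_i$ and $\nu_i = n_i - 1,$ so that $$\nu = \frac{(u_1 + u_2)^2}{u_1^2/\nu_1 + u_2^2/\nu_2}.$$ For the lower bound, $u_1^2/\nu_1 + u_2^2/\nu_2 \le \frac{u_1^2 + u_2^2}{\min(\nu_1, \nu_2)} \le \frac{(u_1 + u_2)^2}{\min(\nu_1, \nu_2)},$ so $\nu \ge \min(\nu_1, \nu_2).$ For the upper bound, the Cauchy–Schwarz inequality gives $(u_1 + u_2)^2 \le (\nu_1 + \nu_2)\left(u_1^2/\nu_1 + u_2^2/\nu_2\right),$ so $\nu \le \nu_1 + \nu_2 = n_1 + n_2 - 2.$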

Whatever the sample size, $T^\prime$ has very nearly Student's t distribution with the Welch–Satterthwaite degrees of freedom. (This is known from probability theory and from many simulation studies.)

Which to use? The bottom line is that most statisticians use the $T^\prime$-statistic and the Welch-Satterthwaite degrees of freedom to do 2-sample t tests unless they have very strong prior evidence that population variances are equal (rarely the case). Most modern statistical software packages use the Welch 2-sample t test by default. Some programs will use $T$ with the pooled SD $S_p$ if the user overrides the default.

Notes: (a) If $n_1 = n_2,$ then one can show that $T = T^\prime$ numerically, but one should still use the Welch-Satterthwaite degrees of freedom unless the population variances are known to be equal.
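(The equality of the statistics is quick algebra: with $n_1 = n_2 = n,$ $$S_p^2\left(\frac{1}{n} + \frac{1}{n}\right) = \frac{(n-1)S_1^2 + (n-1)S_2^2}{2n-2}\cdot\frac{2}{n} = \frac{S_1^2}{n} + \frac{S_2^2}{n},$$ so the denominators of $T$ and $T^\prime$ coincide, and the numerators are identical.)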

(b) If sample variances $S_1^2$ and $S_2^2$ are nearly equal, then the Welch–Satterthwaite $\nu$ is near $n_1 + n_2 - 2.$ If the sample variances are far apart, then $\nu$ may be considerably smaller, perhaps as small as $n_1 - 1$ or $n_2 - 1.$
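A quick numeric illustration of note (b), applying the degrees-of-freedom formula directly (the values are my own, chosen to make the contrast stark):

# Welch df for equal vs. very unequal sample variances (illustration)
wdf <- function(v1, v2, n1, n2) {   # v_i here denotes s_i^2
  u1 <- v1 / n1;  u2 <- v2 / n2
  (u1 + u2)^2 / (u1^2 / (n1 - 1) + u2^2 / (n2 - 1))
}
wdf(25, 25, 15, 15)    # equal variances: exactly n1 + n2 - 2 = 28
wdf(900, 1, 15, 15)    # very unequal: about min(n1, n2) - 1 = 14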

(c) Especially if $n_1 \ll n_2$ and $\sigma_2 \ll \sigma_1,$ then results from the pooled 2-sample test using $T$ and $S_p$ can be very misleading. (The notation $\ll$ means 'much smaller than'.)

(d) It is not a good idea to test whether $\sigma_1^2 = \sigma_2^2$ in order to decide whether to use $T$ or $T^\prime.$ The test for equal variances has poor power, and simulation studies have shown that the 'hybrid' test (using $T^\prime$ only if the equal-variances test rejects) can give misleading results.

Demonstration of note (c). Using R statistical software:

Small sample from $\mathsf{Norm}(\mu_1=150,\sigma_1=30);$ larger sample from $\mathsf{Norm}(\mu_2=150,\sigma_2=5).$ The null hypothesis is true, and so should not be rejected.

x1 = rnorm(10, 150, 30);  x2 = rnorm(50, 150, 5)   # small sample, large SD; large sample, small SD

mean(x1);  sd(x1)
[1] 139.3158
[1] 31.34551
mean(x2);  sd(x2)
[1] 150.1088
[1] 5.246149

Welch 2-sample test properly fails to reject:

t.test(x1, x2)

        Welch Two Sample t-test

data:  x1 and x2
t = -1.0858, df = 9.1011, p-value = 0.3055
alternative hypothesis: true difference in means is not equal to 0
sample estimates:
mean of x mean of y 
 139.3158  150.1088 

Pooled two-sample t test improperly rejects at the 5% level, 'finding' a difference in population means that does not actually exist. (The small sample with the large SD gives a misleading sample mean.)

t.test(x1, x2, var.equal=TRUE)

        Two Sample t-test

data:  x1 and x2
t = -2.3504, df = 58, p-value = 0.02217
alternative hypothesis: true difference in means is not equal to 0
sample estimates:
mean of x mean of y 
 139.3158  150.1088 
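To see that this is not just one unlucky sample, here is a small simulation sketch (the replicate count and seed are arbitrary choices of mine) estimating the actual rejection rates of the two tests when $H_0$ is true:

# Type I error rates of Welch vs. pooled test under H0 (illustrative sketch)
set.seed(2024)                        # arbitrary seed, for reproducibility
B <- 10000
rej <- replicate(B, {
  x1 <- rnorm(10, 150, 30);  x2 <- rnorm(50, 150, 5)
  c(welch  = t.test(x1, x2)$p.value < 0.05,
    pooled = t.test(x1, x2, var.equal = TRUE)$p.value < 0.05)
})
rowMeans(rej)   # Welch should be near the nominal 0.05; pooled well above it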