How are degrees of freedom used in Welch's t-test to determine the p-value?

p-value, statistical significance, t-test

I'm trying to understand how to use Welch's t-statistic. I'd like to compute the t-score between two samples with different sizes and different variances. The null hypothesis is that both means are equal.

Looking at the Wikipedia entry, it gives the computation for the t-score and the degrees of freedom, but not much else.

I've seen tables (linked to from here) of t-score vs. degrees of freedom (for Welch's test), but it is unclear how the degrees of freedom are used. For the typical Student's t-score (equal variance and sample size), presuming I have enough samples in each group, I simply integrate the area under the normal distribution for all values greater than $|t \cdot \sigma|$ and use this area as the p-value.

Question:

How are the degrees of freedom used to compute the statistical significance of the rejection of the null hypothesis (i.e. the p-value)?

Best Answer

The degrees of freedom $\nu$ in a Welch 2-sample t test depend on the sample sizes $n_1$ and $n_2$ and the sample variances $S_1^2$ and $S_2^2,$ as shown in your Wikipedia link.

The number $\nu$ of degrees of freedom for a Welch test satisfies $$\min(n_1 - 1, n_2 - 1) \le \nu \le n_1 + n_2 - 2.$$ Roughly speaking, $\nu$ is near its upper bound when the ratio $S_1^2/S_2^2$ is near $1$ and near its lower bound when this ratio is far from $1.$ [Note that $\nu = n_1 + n_2 - 2$ in the pooled two-sample t test where one assumes that $\sigma_1^2 = \sigma_2^2$ and hence the sample variances tend to be nearly equal.]
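
For reference, the Welch statistic and the Welch-Satterthwaite approximation for $\nu$ can be written in the notation used here as follows, where $\bar X_1$ and $\bar X_2$ denote the two sample means (symbols introduced only for this display):

$$T = \frac{\bar X_1 - \bar X_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad \nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}.$$

Because $\nu$ is computed from the sample variances, it is generally not an integer.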

Once you have the value of $\nu,$ then the P-value associated with the Welch t statistic is found in the same way as it is in the pooled t test. The only slight exception might be that many computer implementations of the Welch test allow non-integer values of $\nu,$ which do not occur in the case of the pooled t test.

Here is an example (in R) using relatively small $n_1 = 10$ and $n_2 = 11.$ Population means differ, so that we might hope to reject $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \ne \mu_2.$ Also, population variances differ, so that one should use the Welch test instead of the pooled test. However, with samples this small the test has limited power, and here it fails to reject $H_0,$ so the difference in means goes undetected. Boxplots of the two samples are shown below, followed by output from R for the Welch test.

set.seed(2019)
x1 = rnorm(10, 100, 10);  x2 = rnorm(11, 90, 15)

[Figure: boxplots of the two samples x1 and x2]

t.test(x1, x2)

        Welch Two Sample t-test

data:  x1 and x2
t = 1.5964, df = 16.2, p-value = 0.1297
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.145136 22.406901
sample estimates:
mean of x mean of y 
 96.88794  87.25706 

Notice that $\min(9, 10) = 9 \le \nu = 16.2 \le 19,$ according to the inequality displayed above. The sample variances are $S_1^2 = 96.85,\, S_2^2 = 293.80,$ which are far from equal, so it is not surprising that $\nu < n_1 + n_2 - 2 = 19.$
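
These values can be checked by hand from the summary statistics. The following is a minimal sketch, assuming the rounded means and variances quoted above; the variable names are illustrative and not part of the original output.

```r
# Summary statistics copied (rounded) from the output and text above
n1 <- 10; n2 <- 11
m1 <- 96.88794; m2 <- 87.25706
v1 <- 96.85;    v2 <- 293.80

# Each sample's contribution to the squared standard error of the difference
q1 <- v1 / n1
q2 <- v2 / n2

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch <- (m1 - m2) / sqrt(q1 + q2)
nu <- (q1 + q2)^2 / (q1^2 / (n1 - 1) + q2^2 / (n2 - 1))

# Two-sided P-value from Student's t distribution with nu degrees of freedom
p_welch <- 2 * pt(-abs(t_welch), df = nu)

round(c(t = t_welch, df = nu, p = p_welch), 4)
```

Up to rounding of the inputs, this should agree with the `t`, `df`, and `p-value` fields reported by `t.test`.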

The P-value is the sum of the areas beneath the density curve of Student's t distribution with $\nu = 16.2$ degrees of freedom to the left of $-1.5964$ and to the right of $1.5964$ (outside the vertical dotted lines in the figure below). A direct computation in R gives the same P-value as in the printout above, where it is shown to four decimal places.

2 * pt(-1.5964, 16.2)
[1] 0.1297201

[Figure: density curve of Student's t distribution with $\nu = 16.2,$ with vertical dotted lines at $\pm 1.5964$]
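
To connect this with the normal-curve approach described in the question: the degrees of freedom select which t density is integrated. The sketch below compares the two-sided tail area under Student's t with $\nu = 16.2$ against the tail area under the standard normal curve for the same statistic. The variable names are illustrative, and the very large `df` value is an arbitrary stand-in for "many degrees of freedom".

```r
t_obs <- 1.5964   # Welch t statistic from the output above
nu    <- 16.2     # Welch degrees of freedom from the output above

# Two-sided tail area under Student's t with nu degrees of freedom
p_t <- 2 * pt(-abs(t_obs), df = nu)

# Same tail area under the standard normal curve (ignores nu entirely)
p_norm <- 2 * pnorm(-abs(t_obs))

# With very many degrees of freedom the t result approaches the normal one
p_big_df <- 2 * pt(-abs(t_obs), df = 1e6)

c(t = p_t, normal = p_norm, large_df = p_big_df)
```

The t-based P-value is the larger of the first two because the t density has heavier tails than the normal, so for small $\nu$ the normal approximation overstates the significance.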

Notes: (1) For the population parameters and sample sizes used to generate the fake data in this example, the power of the Welch test is about 0.4, so it is not surprising we failed to reject. Power is simulated below:

set.seed(323)
p.val = replicate( 10^5, 
        t.test (rnorm(10,100,10), rnorm(11,90,15))$p.value  )
mean(p.val < .05)
[1] 0.39912

(2) An (inappropriate) pooled t test on these data yields $T=1.537,\; \nu = 19,$ and so a P-value about $0.14.$

2 * pt(-1.537, 19)
[1] 0.14078

The formulas for the Welch and pooled t statistics differ in general, but they give exactly the same numerical value if $n_1 = n_2.$ So in the 'balanced case', the essential difference between the Welch and pooled tests is $\nu.$
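
The balanced case can be illustrated with a short sketch. The data here are freshly simulated with an arbitrary seed and a hypothetical common sample size of 12, so they are not the samples used above.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(12, 100, 10)          # balanced design: n1 = n2 = 12
y <- rnorm(12,  90, 15)

welch  <- t.test(x, y)                    # Welch test (the default in R)
pooled <- t.test(x, y, var.equal = TRUE)  # pooled test

# The t statistics agree when the sample sizes are equal ...
same_t <- all.equal(unname(welch$statistic), unname(pooled$statistic))
same_t

# ... but the degrees of freedom, and hence the P-values, differ
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
c(welch = welch$p.value, pooled = pooled$p.value)
```

The pooled test always uses $\nu = n_1 + n_2 - 2 = 22$ here, while the Welch $\nu$ falls between $11$ and $22$ depending on how unequal the sample variances are.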