ANOVA – Equal Variance vs. Unequal Variance for Comparing Groups

anova, group-differences, heteroscedasticity, t-test, variance

I am a little bit confused about equal and unequal variances. I understand the definition and mathematics behind it, yet I don't know, for the purpose of my research, how to appropriately test that.

Example:

  1. I want to test the effects of exercise on the psychological well-being of two separate groups. Group 1 is exercising regularly and Group 2 isn't.

  2. I want to test the difference in well-being in the same group before and immediately after exercise.

I was thinking about doing an unpaired t-test for 1. But now I am reading that I have to know how the variances differ in each group. My question is: the variance of what? Of the dependent variable, which would be well-being? Of the sample size? Of the amount of exercise they are doing? Do I have to give all participants the questionnaire and then calculate and compare the variance of well-being of the separate groups, in order to know what type of statistical test I can do? Isn't well-being the variable I want to test? I am a little bit confused about that and I hope someone can help me out.

Best Answer

Of course, you could wait until you have Well-Being scores and then check to see if scores in the Exercise and No Exercise groups have equal variances. But that is not the best answer.

Instead, it is standard practice to avoid doing a test for equal variances and then branching to either a pooled 2-sample t test (which requires equal population variances) or a Welch 2-sample t test (which does not assume equal variances). One of several reasons for deprecating such a tandem-test procedure is that the variance test has poor power.

If you have no good reason to believe that the two groups have equal variances (perhaps from previous experience giving Well-Being scores to different groups), then you should automatically do the Welch two-sample t test to be safe.
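
To illustrate the poor-power point, here is a small simulation sketch. It assumes an F test (`var.test`) as the preliminary variance test, which this answer does not specify, and takes as illustrative assumptions group sizes 15 and 30 with population SDs 30 and 20. The names `p_var` and `power_var` are mine. The printed proportion estimates how often the preliminary test would detect, at the 5% level, that the variances differ.

```r
# Assumed setup: normal data, n1 = 15 with SD 30, n2 = 30 with SD 20.
set.seed(2022)

# P-values of the F test for equal variances over 10,000 simulated studies.
p_var <- replicate(10^4,
                   var.test(rnorm(15, 100, 30), rnorm(30, 100, 20))$p.value)

# Estimated power: share of studies in which unequal variances are detected.
power_var <- mean(p_var <= 0.05)
power_var
```

Whenever that preliminary test fails to reject, a tandem procedure would branch to the pooled test even though the population variances really do differ.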

Consider the following fictitious data, with $n_1 = 15$ No Exercise subjects $(\sigma = 30)$ and $n_2 = 30$ Exercise subjects $(\sigma = 20).$

set.seed(2022)
x1 = rnorm(15, 100, 30)
x2 = rnorm(30, 107, 20)

In R, a pooled 2-sample t test, which is inappropriate here because the population variances differ, finds a significant difference between the two groups with P-value about $0.009,$ so you'd reject the null hypothesis that the two groups have the same mean Well-Being score.

t.test(x1, x2, var.equal = TRUE)

        Two Sample t-test

data:  x1 and x2
t = -2.7451, df = 43, p-value = 0.008793
alternative hypothesis: 
 true difference in means is not equal to 0
95 percent confidence interval:
 -33.231122  -5.083494
sample estimates:
 mean of x mean of y 
  88.48454 107.64185 

By contrast, the correct Welch two-sample t test finds a significant difference at the 4% level with P-value $0.03875 < 0.04 = 4\%.$

t.test(x1, x2)

        Welch Two Sample t-test

data:  x1 and x2
t = -2.233, df = 17.654, p-value = 0.03875
alternative hypothesis: 
 true difference in means is not equal to 0
95 percent confidence interval:
 -37.206482  -1.108135
sample estimates:
 mean of x mean of y 
  88.48454 107.64185 
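
The Welch statistic and its fractional degrees of freedom can be reproduced by hand, which may help show what "not assuming equal variances" means in practice. The sketch below regenerates the same fictitious data and applies the Welch-Satterthwaite approximation; the variable names (`se2_1`, `df_welch`, and so on) are mine, not part of any R API.

```r
# Regenerate the fictitious data used above.
set.seed(2022)
x1 <- rnorm(15, 100, 30)
x2 <- rnorm(30, 107, 20)

# Squared standard error of each sample mean, using separate variances.
se2_1 <- var(x1) / length(x1)
se2_2 <- var(x2) / length(x2)

# Welch t statistic: no pooled variance estimate is used.
t_stat <- (mean(x1) - mean(x2)) / sqrt(se2_1 + se2_2)

# Welch-Satterthwaite approximate degrees of freedom.
df_welch <- (se2_1 + se2_2)^2 /
  (se2_1^2 / (length(x1) - 1) + se2_2^2 / (length(x2) - 1))

# Two-sided P-value from the t distribution with fractional df.
p_val <- 2 * pt(-abs(t_stat), df_welch)

c(t = t_stat, df = df_welch, p = p_val)
```

These three values should agree with the `t`, `df`, and `p-value` lines printed by `t.test(x1, x2)`.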

You might be happier with the smaller P-value, but the difficulty is that the pooled test can be unreliable when population variances aren't equal.

Specifically, for the variances of my fictitious data, the pooled test at the intended 5% level actually rejects over 8% of the time, with a considerable risk of 'false discovery', as shown in the simulation below, where there is no difference in mean scores (both population means $100):$

set.seed(1234)
pv = replicate(10^4, t.test(rnorm(15, 100, 30),
       rnorm(30, 100, 20), var.equal = TRUE)$p.value)
mean(pv <= .05)
[1] 0.0824
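
For comparison, the same simulation can be rerun with the Welch test, which is the default in `t.test`. This is only a sketch under the same assumed settings (normal data, both population means 100, SDs 30 and 20), and `pv_welch` is my own name. If the Welch test holds its level, the rejection rate should land near the nominal 5%.

```r
set.seed(1234)

# Same null scenario as above, but without var.equal = TRUE,
# so t.test() runs the Welch version.
pv_welch <- replicate(10^4,
                      t.test(rnorm(15, 100, 30), rnorm(30, 100, 20))$p.value)

# Proportion of false rejections at the 5% level.
mean(pv_welch <= 0.05)
```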