Solved – Apparent contradiction between t-test and 1-way ANOVA

Tags: anova, t-test

I am confused about an apparent contradiction between t-test and 1-way ANOVA in one particular case – please suggest a way to think about it.

Suppose I want to compare some parameter between 3 groups, but I am mostly interested in the comparison between groups 1 and 2. The collected data look like the graph below (dots: individual data points; lines: means ± 95% CI). A t-test of group 1 versus group 2 is highly significant (p < 0.01), but a one-way ANOVA across all 3 groups is non-significant (p = 0.10). Intuitively, one can see quite clearly that groups 1 and 2 are different, but the addition of group 3, with its high variability, obscures this fact.
I feel that, in this case, the statistics obscure common sense.

As an illustration, consider this thought experiment. Imagine that I first collected only groups 1 and 2, ran the t-test, and concluded that these populations have different means. Then I added group 3 (which is not even that important in the real experiment). Now the formally correct test is a one-way ANOVA, and the conclusion becomes "there is not enough evidence that the populations have different means". From the point of view of common sense, I don't understand how the addition of a third group can change the previously established conclusion that populations 1 and 2 have different means.
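The thought experiment is easy to reproduce in a quick simulation. The numbers below are hypothetical (means and spreads chosen to mimic the situation described, not the actual data): groups 1 and 2 are tight and clearly separated, while group 3 is noisy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(loc=10.0, scale=1.0, size=12)  # tight group
g2 = rng.normal(loc=12.0, scale=1.0, size=12)  # tight group, shifted mean
g3 = rng.normal(loc=11.0, scale=6.0, size=12)  # highly variable third group

# t-test compares groups 1 and 2 using only their own variation
t, p_t = stats.ttest_ind(g1, g2)

# one-way ANOVA pools the variation of all three groups
f, p_f = stats.f_oneway(g1, g2, g3)

print(f"t-test (1 vs 2): p = {p_t:.4f}")
print(f"one-way ANOVA:   p = {p_f:.4f}")
```

With parameters like these, the t-test p-value is typically far smaller than the ANOVA p-value, reproducing the apparent contradiction.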

Could you please suggest a way to reconcile the statistical and practical conclusions in this case?

Is there perhaps some justification for using a t-test instead of an ANOVA in such cases?

[Figure: sample data]

Best Answer

Both Student's t-test and one-way ANOVA work by evaluating the observed differences between means relative to the observed variation. The ANOVA pools the variation of all three groups into its error term, whereas the t-test uses only the variation of the two groups being compared. The two groups tested with the t-test have much lower variation than the third group, so the t-test yields a smaller p-value than the ANOVA.

Reconciling the statistical and practical conclusions is usually not something that can be accomplished using the dichotomous interpretation of significant/not significant. Instead, consider the p-values as continuous indices of the strength of evidence in the data about the null hypothesis and statistical model. If the p-value from the primary F-test of the ANOVA is larger than 0.05 then the p-value from the t-test is probably not very small. In that case you do not have very strong evidence against the null hypotheses in either case. Unless you have enough information from outside the experiment in hand to make a reasoned argument that backs up any conclusion that you want to make, you probably should defer any firm conclusion. It's rarely a mistake to run the experiment again!
