Solved – Apparent contradiction between t-test and 1-way ANOVA

Tags: anova, t-test

I am confused about an apparent contradiction between t-test and 1-way ANOVA in one particular case – please suggest a way to think about it.

Suppose I want to compare some parameter between 3 groups, but I am mostly interested in the comparison between groups 1 and 2. The collected data look like the graph below (dots: individual data points; lines: means ± 95% CI). A t-test of group 1 versus group 2 is highly significant (p < 0.01), but a one-way ANOVA across all 3 groups is non-significant (p = 0.10). Intuitively, one can see quite clearly that groups 1 and 2 are different, but the addition of group 3, with its high variability, obscures this fact.
I feel that, in this case, the statistics obscure common sense.

As an illustration, consider this thought experiment. Imagine that I first collected only groups 1 and 2, ran the t-test, and concluded that these populations have different means. Then I added group 3 (which is not even that important in the real experiment). Now the formally correct test is a one-way ANOVA, and the conclusion becomes "there is not enough evidence that the populations have different means". From the point of view of common sense, I don't understand how the addition of a third group can change the previously established conclusion that populations 1 and 2 have different means.
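The thought experiment is easy to reproduce in a quick simulation. The numbers below are hypothetical (means and spreads chosen to mimic the situation described, not the actual data): groups 1 and 2 are tight and clearly separated, while group 3 is noisy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(loc=10.0, scale=1.0, size=12)  # tight group
g2 = rng.normal(loc=12.0, scale=1.0, size=12)  # tight group, shifted mean
g3 = rng.normal(loc=11.0, scale=6.0, size=12)  # highly variable third group

# t-test compares groups 1 and 2 using only their own variation
t, p_t = stats.ttest_ind(g1, g2)

# one-way ANOVA pools the variation of all three groups
f, p_f = stats.f_oneway(g1, g2, g3)

print(f"t-test (1 vs 2): p = {p_t:.4f}")
print(f"one-way ANOVA:   p = {p_f:.4f}")
```

With parameters like these, the t-test p-value is typically far smaller than the ANOVA p-value, reproducing the apparent contradiction.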

Could you please suggest a way to reconcile the statistical and practical conclusions in this case?

Is there perhaps some justification for using a t-test instead of an ANOVA in such cases?

[Figure: sample data]

Best Answer

Both Student's t-test and one-way ANOVA work by evaluating the observed differences between means relative to the observed variation. The ANOVA pools the variation of all three groups into its error term, whereas the t-test uses only the variation of the two groups being compared. The two groups tested with the t-test have much lower variation than the third group, so the t-test yields a smaller p-value than the ANOVA.

Reconciling the statistical and practical conclusions is usually not something that can be accomplished using the dichotomous interpretation of significant/not significant. Instead, consider the p-values as continuous indices of the strength of evidence in the data about the null hypothesis and statistical model. If the p-value from the primary F-test of the ANOVA is larger than 0.05 then the p-value from the t-test is probably not very small. In that case you do not have very strong evidence against the null hypotheses in either case. Unless you have enough information from outside the experiment in hand to make a reasoned argument that backs up any conclusion that you want to make, you probably should defer any firm conclusion. It's rarely a mistake to run the experiment again!
