Even without Bonferroni corrections, a significant ANOVA does not guarantee that any two means are different. For example, a statistically decisive ANOVA result could come from two pairs of means differing from each other while no individual mean comparison is significant.
Consider why you run an ANOVA in the first place. You do it because running all of the comparisons across a categorical predictor variable would create a multiple comparisons problem. But then you go and run many of those comparisons anyway... why? A significant ANOVA means that the pattern of data you see is meaningful. Describe that pattern, in both a figure and text, and convey what your data mean. If you really wanted to run all of the multiple comparisons, then running the ANOVA was pointless. Also, keep in mind that "all of the comparisons" does not mean just the comparisons between individual means but all of the patterns and combinations you could test; the ANOVA is sensitive to those too.
In your particular case, you would write something like the following. There was a main effect of group, with higher scores in the experimental group, and a main effect of time, with the lowest score at the first time point, the next lowest at the last, and the highest at the intermediate time. However, each of these main effects was qualified by an interaction: the effect of time depended on which group you were in, being greater in the experimental group than in the control group.
That's what your ANOVA and summary statistics say. Unless there's something more than that you want to say there's no point in running comparisons.
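The verbal summary above maps directly onto a standard two-way ANOVA in R. The sketch below is illustrative only: it assumes a data frame `dat` with columns `score`, `group` (2 levels), and `time` (3 levels), and uses simulated scores in place of your real data.

```r
# Hypothetical sketch of the two-way design described above.
# Variable names (dat, score, group, time) are assumptions, not your data.
set.seed(1)
dat <- expand.grid(group = factor(c("control", "experimental")),
                   time  = factor(1:3))
dat <- dat[rep(seq_len(nrow(dat)), each = 10), ]  # 10 observations per cell
dat$score <- rnorm(nrow(dat))

m <- aov(score ~ group * time, data = dat)
summary(m)                 # F tests for the two main effects and the interaction
model.tables(m, "means")   # cell and marginal means to describe the pattern
```

The `summary()` and `model.tables()` output together give everything needed for the written description: the F tests establish which effects are present, and the table of means supplies the direction and ordering of the pattern.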
ASIDE: While the following is important, I consider it an aside because the primary question here is interpreting your ANOVA. Your experimental group's time-2 variance is so much higher than the others that you're violating the equal-variance assumption of the ANOVA. You could run simulations to see how much that affects alpha or power in your case. I did a quick one and it shows alpha is generally about 0.06 (if you selected 0.05) for each test; sample code below:
nsamp <- 2000    # number of simulated data sets
n <- 10          # observations per cell
# cell standard deviations taken from the observed data
sds <- rep(c(1.36, 1.57, 1.48, 1.14, 3.52, 1.78), n)
x1 <- factor(rep(1:2, times = n, each = 3))   # group
x2 <- factor(rep(1:3, 2 * n))                 # time
Y <- replicate(nsamp, {
    y <- rnorm(6 * n, 0, sds)   # null data with unequal cell variances
    #y <- rnorm(6 * n)  # comment out the line above and comment in this one to see what would happen if variances were equal
    m <- aov(y ~ x1 * x2)
    sm <- summary(m)
    ps <- sm[[1]]$'Pr(>F)'      # p-values: group, time, interaction (NA for residuals)
    ps
    #min(ps, na.rm = TRUE)
})
# empirical Type I error rate for each of the three tests
sum(Y[1, ] < 0.05) / nsamp
sum(Y[2, ] < 0.05) / nsamp
sum(Y[3, ] < 0.05) / nsamp
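If you want a formal check of the equal-variance assumption alongside the simulation, base R's `bartlett.test` can be run on the six cells. The data below are simulated purely for illustration, using the same cell standard deviations as above; in practice you would pass your own scores and a factor identifying the six group-by-time cells.

```r
# Illustrative check of the equal-variance assumption with Bartlett's test.
# y and g are simulated stand-ins for your scores and cell labels.
set.seed(2)
g <- factor(rep(1:6, each = 10))
y <- rnorm(60, 0, rep(c(1.36, 1.57, 1.48, 1.14, 3.52, 1.78), each = 10))
bartlett.test(y, g)
```

Note that Bartlett's test is itself sensitive to non-normality; with small cells, the simulation approach above is arguably the more informative diagnostic.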
Best Answer
Your ANOVA was significant, implying you either made a Type I error or the means are not all equal (in which case the null is false).
Since the chance of making a Type I error was (presumably) set fairly low, the second option becomes a relatively plausible explanation for the size of the test statistic.
In that sense, the research hypothesis you stated is indicated.
However, your multiple comparisons were unable to clearly identify any specific 'cause' of that difference - likely there are several small effects that together are enough for you to conclude there's a difference, even though none alone is large enough to 'stand out' by itself for you to say "this pair of groups differs on X".
(Such a thing happens not infrequently, especially when sample size calculations are based on only just achieving moderate power at some overall effect size. If the effect sizes are all a little smaller than that, you may be unlikely to find them individually.)
Edit: To address the specific phrasing of the research hypothesis being 'partially accepted' -
It depends on what you mean by "correct".
I would not use such a phrase - either accepting the alternative or 'partial' in reference to it. You rejected the null, and there was nothing partial about that.
I think the important thing is to convey exactly what null was rejected.
I'd also draw clear displays of means and (ANOVA-based) standard errors of the mean (likely along with the raw data on the same display) so that the effect sizes relative to the uncertainty are clear to the readership.
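The kind of display suggested above can be built in a few lines of base R. The sketch below is illustrative: it uses simulated data in place of your scores, and the "ANOVA-based" standard error is the pooled residual standard error of the fitted model divided by the square root of the (assumed equal) cell size.

```r
# Illustrative sketch: cell means with ANOVA-based standard error bars.
# Data are simulated; substitute your own score, group, and time variables.
set.seed(3)
dat <- data.frame(group = factor(rep(1:2, each = 30)),
                  time  = factor(rep(1:3, times = 20)),
                  score = rnorm(60))                 # 10 observations per cell
m   <- aov(score ~ group * time, data = dat)
se  <- sqrt(sum(residuals(m)^2) / df.residual(m) / 10)  # pooled SE of a cell mean
mns <- tapply(dat$score, list(dat$time, dat$group), mean)  # 3 x 2 matrix of means

matplot(1:3, mns, type = "b", pch = 19,
        xlab = "time", ylab = "mean score")
arrows(rep(1:3, 2), c(mns) - se, rep(1:3, 2), c(mns) + se,
       angle = 90, code = 3, length = 0.05)          # error bars
```

Overlaying the raw points (e.g. with `points()` and a little jitter) on the same axes, as suggested above, lets readers judge the effects against both the standard errors and the spread of the data.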
I certainly have never used such phrasing and don't imagine I ever will, but that doesn't make it objectively wrong. What matters most is that the audience of such a phrase clearly understand the intended meaning.