If I understand your question correctly, you are wondering why you got different p-values from your t-tests when they were carried out as post-hoc tests rather than as separate tests. Did you control the FWER in the second case (because that is what is done with the step-down Sidak-Holm method)? In the case of simple t-tests, the t-values won't change unless you use a different pooling method for the variance in the denominator, but the p-values of the unprotected tests will be lower than the corrected ones.
This is easily seen with the Bonferroni adjustment, since we multiply the observed p-value by the number of tests. With step-down methods like Holm-Sidak, the idea is instead to sort the null hypothesis tests by increasing p-value and correct the alpha level with the Sidak correction factor in a stepwise manner ($\alpha' = 1 - (1 - \alpha)^{1/k}$, with $k$ the number of remaining comparisons, updated after each step). Note that, in contrast to the Holm-Bonferroni method, control of the FWER is only guaranteed when the comparisons are independent. A more detailed description of the different kinds of corrections for multiple comparisons is available here: Pairwise Comparisons in SAS and SPSS.
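In R this can be sketched as follows. p.adjust() covers the Bonferroni and Holm adjustments, but the Sidak step has to be done by hand; the raw p-values below are invented for illustration:

p <- c(0.001, 0.010, 0.020, 0.040)      # hypothetical raw p-values
k <- length(p)
alpha <- 0.05
p.sorted <- sort(p)                      # sort tests by increasing p-value
crit <- 1 - (1 - alpha)^(1 / (k - seq_len(k) + 1))  # Sidak alpha at each step
reject <- p.sorted <= crit
reject[cumsum(!reject) > 0] <- FALSE     # stop at the first non-rejection
cbind(p = p.sorted, critical = crit, reject = reject)
# Compare with the single-step and step-down Bonferroni adjusted p-values:
p.adjust(p, method = "bonferroni")
p.adjust(p, method = "holm")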
Even without Bonferroni corrections, ANOVAs do not guarantee that any two means are different. For example, a statistically significant ANOVA result could come from a contrast between combinations of means (say, the average of two means differing from a third) while no individual pairwise comparison is significant.
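A quick simulation in R can show how often this happens; the group means, sample size, and number of replicates below are all made up:

set.seed(1)
n <- 5                                   # per-group sample size (invented)
means <- c(0, 0.9, 1.8)                  # invented population means
res <- replicate(5000, {
  g <- factor(rep(1:3, each = n))
  y <- rnorm(3 * n, mean = means[g])
  omnibus <- anova(lm(y ~ g))$'Pr(>F)'[1] < 0.05
  pw <- pairwise.t.test(y, g, p.adjust.method = "none")$p.value
  c(omnibus = omnibus, any.pair = any(pw < 0.05, na.rm = TRUE))
})
# Proportion of runs where the F test rejects but no pairwise t-test does:
mean(res["omnibus", ] & !res["any.pair", ])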
Consider why you run an ANOVA. You do it because, if you did all of the comparisons within a categorical predictor, you'd run into a multiple comparisons problem. But then you go and do many of the comparisons anyway... why? The ANOVA means that the pattern of data you see is meaningful. Describe the pattern of data, both in a figure and in text, and convey what your data mean. If you really wanted to run all of the multiple comparisons, then running the ANOVA was pointless. Also, keep in mind that "all of the comparisons" does not mean just the comparisons between individual means but all of the patterns and combinations you could test; the ANOVA is sensitive to them too.
In your particular case, you would write something like the following. There was a main effect of group, with higher scores in the experimental group, and a main effect of time, with the lowest score at the first time point, followed by the last time point, and the highest score at the intermediate time point. However, each of these main effects was qualified by an interaction: the effect of time depends on which group you are in, being greater in the experimental group than in the control group.
That's what your ANOVA and summary statistics say. Unless there's something more than that you want to say, there's no point in running the comparisons.
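If a figure helps, base R's interaction.plot() can show the group by time pattern directly; the data frame, column names, and cell means below are all invented for illustration:

set.seed(1)
# Toy 2 (group) x 3 (time) design with 10 observations per cell (all invented)
dat <- expand.grid(time = factor(1:3), gp = factor(c("control", "experimental")))
dat <- dat[rep(1:6, each = 10), ]
cellmeans <- c(1, 3, 2, 2, 6, 4)   # control t1-t3, then experimental t1-t3
dat$score <- rnorm(nrow(dat),
                   mean = cellmeans[as.integer(interaction(dat$time, dat$gp))])
with(dat, interaction.plot(time, gp, score, ylab = "mean score"))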
ASIDE: While the following is important, I consider it an aside because the primary question here is interpreting your ANOVA. The variance in your experimental group at time 2 is so much higher than the others that you're violating the assumptions of the ANOVA. You could run simulations to see how much that affects alpha or power in your case. I ran a quick one and it shows that alpha is generally about 0.06 (if you select 0.05) for each test; sample code below:
nsamp <- 2000                     # number of simulated datasets
n <- 10                           # observations per cell
# Cell standard deviations; note the large 3.52 for the experimental group at time 2
sds <- rep(c(1.36, 1.57, 1.48, 1.14, 3.52, 1.78), n)
x1 <- factor(rep(1:2, times = n, each = 3))   # group factor
x2 <- factor(rep(1:3, 2 * n))                 # time factor
Y <- replicate(nsamp, {
    # Null data: all true means equal, but cell variances unequal
    y <- rnorm(6 * n, 0, sds)
    #y <- rnorm(6 * n)  # swap in this line to see what happens with equal variances
    m <- aov(y ~ x1 * x2)
    sm <- summary(m)
    ps <- sm[[1]]$'Pr(>F)'        # p-values for x1, x2, and x1:x2
    ps
    #min(ps, na.rm = TRUE)
})
# Empirical Type I error rate for each of the three tests
sum(Y[1, ] < 0.05) / nsamp
sum(Y[2, ] < 0.05) / nsamp
sum(Y[3, ] < 0.05) / nsamp
Best Answer
The reason that the ANOVA rejects could be violations of other assumptions, such as homogeneity of variances/sphericity. (See assumption #5 here: https://statistics.laerd.com/spss-tutorials/one-way-anova-repeated-measures-using-spss-statistics.php)
If, for example, sphericity is violated, then you can conclude that the significant result from the ANOVA is likely due to violations of the ANOVA assumptions rather than a significant difference between two levels.
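As a rough sketch, homogeneity of variances can be checked in R with bartlett.test() from base R (the data below are invented; for sphericity in a repeated-measures design you would instead look at Mauchly's test):

set.seed(42)
g <- factor(rep(1:3, each = 10))
y <- rnorm(30, mean = 0, sd = c(1, 1, 3)[g])  # third group deliberately more variable
bartlett.test(y ~ g)   # a small p-value suggests unequal variances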
If the important ANOVA assumptions are met and you still run into this issue, then you can conclude that there is a possible difference between at least two of the levels, but that the significant ANOVA may be a false positive.
This page explains multiple comparisons: http://www.biostathandbook.com/multiplecomparisons.html
Another possible conclusion/explanation in your particular case, using Bonferroni adjustments, is that the Bonferroni method is very conservative, so even if a difference between the levels exists, multiple comparisons with Bonferroni adjustments may not have been able to detect it.
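You can get a feel for this conservatism with a small simulation; the effect size, sample size, and replicate count below are all invented:

set.seed(7)
hits <- replicate(2000, {
  g <- factor(rep(1:3, each = 10))
  y <- rnorm(30, mean = c(0, 0, 1)[g])   # one group truly 1 SD higher
  pw <- pairwise.t.test(y, g, p.adjust.method = "bonferroni")$p.value
  any(pw < 0.05, na.rm = TRUE)
})
mean(hits)   # power to detect at least one difference after Bonferroni
# Re-running with p.adjust.method = "none" shows how much power the correction costs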