T-Test vs ANOVA vs Regression – What’s the Difference?

analysis-of-means, anova, regression, t-test

I know this question has been asked in similar ways already, but I cannot find an answer that helps me understand it. I have three subsamples defined by programme participation (participants, drop-outs, and a comparison group), and for each pair of groups I want to test whether the difference in means is significantly different from 0. So overall I have three tests: $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$.

I read that a $t$-test and a regression would give the same results, but that with ANOVA there is a slight difference. Does somebody know more about this and could you suggest which one is best suited?

Thanks!

Best Answer

ANOVA vs $t$-tests

With ANOVA, you generally first perform an omnibus test. This is a test of the null hypothesis that all group means are equal ($\mu_1=\mu_2=\mu_3$).
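For example, here is a minimal sketch of the omnibus test in Python with SciPy; the group sizes and simulated values are only placeholders standing in for your three subsamples:

```python
# Omnibus one-way ANOVA on three simulated groups (illustrative data only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
participants = rng.normal(loc=10.0, scale=2.0, size=40)
dropouts     = rng.normal(loc=10.5, scale=2.0, size=30)
comparison   = rng.normal(loc=9.5,  scale=2.0, size=50)

# H0: all three group means are equal.
f_stat, p_value = stats.f_oneway(participants, dropouts, comparison)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```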

Only if there is sufficient evidence against this hypothesis do you run a post-hoc analysis, which is very similar to using three pairwise $t$-tests to check for individual differences. The most commonly used post-hoc procedure is Tukey's Honest Significant Difference (Tukey's HSD), and it differs from a series of $t$-tests in two important ways:

  • It uses the studentized range distribution instead of the $t$-distribution for $p$-values / confidence intervals;
  • It corrects for multiple testing by default.

The latter is the important part: Since you are testing three hypotheses, you have an inflated chance of at least one false positive. Multiple testing correction can also be applied to three $t$-tests, but with the ANOVA + Tukey's HSD, this is done by default.
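As a sketch of what that looks like in practice, assuming statsmodels is available (the simulated data again just stands in for your three subsamples):

```python
# Post-hoc Tukey HSD on illustrative data; group names mirror the question,
# the values themselves are simulated.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(10.0, 2.0, 40),   # participants
    rng.normal(10.5, 2.0, 30),   # drop-outs
    rng.normal(9.5,  2.0, 50),   # comparison
])
labels = ["participant"] * 40 + ["dropout"] * 30 + ["comparison"] * 50

# All three pairwise comparisons; family-wise error control is built in.
print(pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05).summary())
```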

A third difference from separate $t$-tests is that ANOVA uses all of your data at once, not one pair of groups at a time. This can be advantageous, as it allows for easier diagnostics of the residuals. However, it also means you may have to resort to alternatives to the standard ANOVA if the variances are not approximately equal across groups, or if another assumption is violated.

ANOVA vs Linear Regression

ANOVA is a linear regression with only offsets added to the intercept, no 'slopes' in the colloquial sense of the word. So when you use linear regression with dummy variables for each of your three categories, you will obtain identical parameter estimates.
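Here is a minimal sketch of that equivalence using statsmodels' formula interface; the group labels mirror your setup, everything else (data, column names) is made up for illustration:

```python
# Dummy-coded regression reproduces the group means from the ANOVA setup.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y": np.concatenate([rng.normal(10.0, 2.0, 40),
                         rng.normal(10.5, 2.0, 30),
                         rng.normal(9.5,  2.0, 50)]),
    "group": ["participant"] * 40 + ["dropout"] * 30 + ["comparison"] * 50,
})

# C(group) creates dummy variables; the alphabetically first level
# ("comparison") becomes the reference group, i.e. the intercept.
fit = smf.ols("y ~ C(group)", data=df).fit()
print(fit.params)                          # intercept + two offsets
print(df.groupby("group")["y"].mean())     # same information: the group means
```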

The difference is in the hypotheses you would usually test with a linear regression. Remember, in ANOVA, the tests are: omnibus, then pairwise comparisons. In linear regression you usually test whether:

  • $\beta_0 = 0$, testing whether the intercept is significantly non-zero;
  • $\beta_j = 0$, for each of your predictor variables $j$.

If you have only one variable (group), one of its categories becomes the intercept (i.e., the reference group). In that case, the tests performed by most statistical software are:

  • Is the estimate for the reference group significantly non-zero?
  • Is the estimate for $(\text{group 1}) - (\text{reference group})$ significantly non-zero?
  • Is the estimate for $(\text{group 2}) - (\text{reference group})$ significantly non-zero?

This is nice if you have a clear reference group, because you can then simply ignore the (usually meaningless) intercept $p$-value and only correct the other two for multiple testing. This saves you some power, because you only correct for two tests instead of three.
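If you went the regression route, correcting only the two non-intercept tests could look like the sketch below, which reuses the fitted model `fit` from the previous snippet and picks Holm's method as one possible correction:

```python
# Adjust only the two group-vs-reference p-values, ignoring the intercept.
from statsmodels.stats.multitest import multipletests

pvals = fit.pvalues.drop("Intercept")
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(list(zip(pvals.index, p_adj, reject)))
```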

So to summarize, if the group you call comparison is actually a control group, you might want to use linear regression instead of ANOVA. However, the three tests you say you want to do in your question resemble those of an ANOVA post-hoc analysis or three pairwise $t$-tests.
