Solved – Should I perform multiple t-test or an ANOVA

anovasample-sizet-test

My causal comparative study seeks to three questions related to the hypothesis: a) is there a statistically significant difference between the academic achievement of 6th grade mathematics students based on non-band music participation status; b) is there a statistically significant difference between the academic achievement of 6th grade males based on non-band music participation; and c) is there a statistically significant difference between the academic achievement of 6th grade females based on non-band music participation?

Am I correct in stating that I would perform three different t-test to compare means, or do I need to use an ANOVA?

Additionally, I have a sample size of approximately 765 students, is this large enough?

Best Answer

Since you have three planned questions, I think your 3 t tests are the best approach. I also think ANOVA is over-used. It just tests for the presence of some effect, somewhere, and should always be followed-up with a meaningful interpretation of the effects themselves.

As for sample size, I have two quick answers:

I don't know, due to insufficient information (e.g., target effect size, SD of the measurements, etc.)
My interpretation of your question is that the data are already collected. If so, it's too late to be asking about sample size.

Related Solutions

Solved – Problem with ANOVA repeated measures: “Error() model is singular”

Assuming your design is the following:

sex is a between-subjects IV (with two levels)
stimulus is a within-subjects IV (with 3 assumed levels)
condition is a within-subjects IV (with 2 levels)
all IVs are fully crossed

Then this is what you can do to run the full analysis, or to just test for a main effect of sex (generating some data first):

Nj        <- 10                               # number of subjects per sex
P         <- 2                                # number of levels for IV sex
Q         <- 3                                # number of levels for IV stimulus
R         <- 2                                # number of levels for IV condition
subject   <- factor(rep(1:(P*Nj), times=Q*R)) # subject id
sex       <- factor(rep(1:P, times=Q*R*Nj), labels=c("F", "M")) # IV sex
stimulus  <- factor(rep(1:Q, each=P*R*Nj))    # IV stimulus
condition <- factor(rep(rep(1:R, each=P*Nj), times=Q), labels=c("EXP1", "EXP2"))
DV_t11    <- round(rnorm(P*Nj,  8, 2), 2)     # responses for stimulus=1 and condition=1
DV_t21    <- round(rnorm(P*Nj, 13, 2), 2)     # responses for stimulus=2 and condition=1
DV_t31    <- round(rnorm(P*Nj, 13, 2), 2)
DV_t12    <- round(rnorm(P*Nj, 10, 2), 2)
DV_t22    <- round(rnorm(P*Nj, 15, 2), 2)
DV_t32    <- round(rnorm(P*Nj, 15, 2), 2)
response  <- c(DV_t11, DV_t12, DV_t21, DV_t22, DV_t31, DV_t32)       # all responses
dfL       <- data.frame(subject, sex, stimulus, condition, response) # long format

Now with the data set up, you can use aov(), but you won't get the $\hat{\epsilon}$ corrections for the within-effects.

> summary(aov(response ~ sex*stimulus*condition
+                        + Error(subject/(stimulus*condition)), data=dfL))
Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
sex        1  2.803  2.8030    0.51 0.4843   # ... snip ...

You can also use the Anova() function from the car package, which gives you the $\hat{\epsilon}$ corrections. However, it requires your data to be in wide format. You have to use multivariate notation for your model formula.

> sexW  <- factor(rep(1:P, Nj), labels=c("F", "M"))     # factor sex for wide format
> dfW   <- data.frame(sexW, DV_t11, DV_t21, DV_t31, DV_t12, DV_t22, DV_t32) # wide format
> # between-model in multivariate notation
> fit   <- lm(cbind(DV_t11, DV_t21, DV_t31, DV_t12, DV_t22, DV_t32) ~ sexW, data=dfW)
> # dataframe describing the columns of the data matrix
> intra <- expand.grid(stimulus=gl(Q, 1), condition=gl(R, 1))
> library(car)                    # for Anova()
> summary(Anova(fit, idata=intra, idesign=~stimulus*condition),
+         multivariate=FALSE, univariate=TRUE)
Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
                   SS num Df Error SS den Df         F    Pr(>F)    
(Intercept)   17934.1      1   98.930     18 3263.0403 < 2.2e-16 ***
sexW              2.8      1   98.930     18    0.5100 0.4843021  # ... snip ...

Using the ez package and the command suggested by @Mike Lawrence gives the same result:

> library(ez)              # for ezANOVA()
> ezANOVA(data=dfL, wid=.(subject), dv=.(response),
+         within=.(stimulus, condition), between=.(sex), observed=.(sex))
$ANOVA
     Effect DFn DFd          F            p p<.05         ges
2       sex   1  18  0.5099891 4.843021e-01       0.004660043      # ... snip ...

Finally, if the main effect for sex is really all you're interested in, it's equivalent to just average for each person across all the conditions created by the combinations of stimulus and condition, and then run a between-subjects ANOVA for the aggregated data.

# average per subject across all repeated measures
> mDf <- aggregate(response ~ subject + sex, data=dfL, FUN=mean)
> summary(aov(response ~ sex, data=mDf))     # ANOVA with just the between-effect
            Df  Sum Sq Mean Sq F value Pr(>F)
sex          1  0.4672 0.46716    0.51 0.4843
Residuals   18 16.4884 0.91602

Solved – Interpreting 2-way ANOVA results

The significance of the main effect for updates seems to support your view that there is a relationship between the number of updates and well-being. Additionally, you didn't find any evidence of an overall difference in happiness between men and women.

A p-value of .008 means that if people with different number of updates would report being equally happy on average, you would expect to observe a sample like yours or a more extreme one (i.e. one in which apparent differences are even stronger) 0.8% of the time. This is under the conventional threshold of 5%, so you would typically conclude that there is a difference. It's difficult to describe such results simply and it's easy to misinterpret p-value, so if you are not familiar with this you should probably try to read up on it.

Beyond that, there is also apparently an interaction effect, which is a little bit trickier to interpret. It could mean that the number of updates has a stronger association with happiness for men than for women (or the other way around), that the relationship only holds for one gender but not for the other or even that the relationship goes in opposite directions depending on gender (e.g. men updating their page frequently are happier than men who don't whereas women who update are less happy than women who don't). This result does suggest that it could make sense to retain the variable in the model but did you have a reason to believe gender has an effect in the first place?

One caveat is that the number of updates is presumably not under your control. If you learned statistics from books or courses oriented toward psychology, you will often find that they use causal language to describe significant effects but this is predicated upon the fact that the data come from a randomized experiment. You can run an ANOVA on variables like gender or updating frequency but what you have is in effect a correlation, not per se evidence that updating your Facebook page changes your level of happiness. Statistically, the technique is the same but observational data like yours and experimental data afford different conclusions.

A few other thoughts:

When there is a significant main effect for a factor with several levels, people often run post-hoc tests to find out more about where exactly the difference lies.
Number of updates apparently falls in four categories (0, 1 to 3, 4 to 6 updates, more than 6 updates?). If you have the original data, categorizing it is not generally recommended, it might be better to use the counts directly.
Plots are very useful to interpret your results. In your situation, you might want to look at the means in each cell of the design and at boxplots.
You need to think about the assumptions for the test and more generally about some potential pitfalls in interpreting differences in means.
It's important to think about the size of the effect, as you apparently did. If the decrease you observe seems very small (you wrote that it “barely decreases”), then that's the most important conclusion, even if this difference is significant. Standardized effect sizes like eta squared provide useful information but it's perfectly fine to also look at unstandardized effect sizes (i.e. the mean difference in happiness score).

Best Answer

Related Solutions

Solved – Problem with ANOVA repeated measures: “Error() model is singular”

Solved – Interpreting 2-way ANOVA results

Related Question