R – How to Specify Mixed ANOVA with Multiple Repeated Measures and Covariate in R

Tags: anova, mixed model, predictor, r, repeated measures

Working in R, how can I specify a mixed ANOVA with multiple between- and within-subjects factors in such a way that it's amenable to adding a covariate in a subsequent analysis? Also, ideally, I would like to use type 3 SS because that's what I'm used to.

I originally did my analysis (without the covariate) using ezANOVA, and found a predicted 3-way interaction of two between-subjects factors (pretrain and training, below) and one within-subjects factor (section, below). A possible explanation is that the two between-subjects factors affect another quantity (study, below), which in turn causes the above interaction. I am hoping this explanation is NOT correct, but in order to eliminate it, I'd like to add the other quantity ('study') to my model as a covariate and see whether the original 3-way interaction is still significant. Is this an acceptable approach?

Assuming yes, my issue is that I can't find a good way to add the covariate.

  • (A) It can be added using "between_covariates" in ezANOVA, but there's a warning that this feature is under development, and the results look strange, so I don't trust them without other verification.

  • (B) Using aov, I couldn't exactly replicate my ezANOVA results without the covariate, and also couldn't figure out how to add a covariate.

  • (C) Using lm, I couldn't figure out how to get the repeated measures working.

  • (D) Using lme from nlme, I did get a working model with the repeated measures (shown below), but the results without the covariate were different from those of ezANOVA, so I don't think it's an equivalent model.

I'd like to create a model equivalent to the original one, and then see whether my effect stays significant when the covariate is added. How can I do this?

Reproducible code:

# This data comes from an experiment designed to investigate effects of
# different types of instruction on learning. Instruction is manipulated
# between subjects using factorial combinations of "pretraining" (3 levels)
# and "training" (2 levels). Learning is assessed as change in performance
# from a pretest, administered before pretraining & training, to a posttest,
# administered after. Test accuracy is my dv and test section (pre vs post)
# is treated as a within-subjects factor. I also have an additional
# within-subjects factor called "problem type", which is used to classify
# different types of problems on the pretest and posttest. Some problem
# types have more problems than others, but the number of problems of a
# given type is the same for pretest and posttest.

# Sample data:
set.seed(1)  # so the sample() calls below give reproducible data
nsubj = 215; nsec  = 2; nprob = 6
D = data.frame(
    subjid   = rep( 1:nsubj, each=nsec*nprob ),
    pretrain = rep( sample( c('a','b','c'), nsubj, replace=TRUE ), each=nsec*nprob ),
    training = rep( sample( c('j','k'), nsubj, replace=TRUE ), each=nsec*nprob ),
    study    = rep( sample( 1:6, nsubj, replace=TRUE ), each=nsec*nprob ),
    section  = rep( rep( c( 'pretest', 'posttest' ), each=nprob ), nsubj ),
    probtype = rep( c( 'v', 'w', 'x', 'y', 'z', 'z' ), nsec*nsubj ),
    accuracy = sample( c( 0.0, 0.5, 1.0 ), nsubj * nsec * nprob, replace=TRUE ) )

# Model:
library( ez )
options( contrasts=c("contr.sum","contr.poly") )
ezANOVA( data=D, wid=.(subjid), dv=.(accuracy), within=.(section,probtype), between=.(pretrain,training), type=3 )

# nlme version:
library( nlme )
fit = lme( accuracy ~ section*probtype*pretrain*training, random=~1|subjid, method='REML', data=D )
anova.lme( fit, type='marginal' )
# Results in similar but not identical F and p values compared to the ezANOVA model; the denominator dfs are quite different.

Best Answer

First of all, you really need to make sure that adding a covariate makes sense. Adding a covariate is only appropriate if the covariate is not correlated with the independent variables (i.e., if there is no a priori reason to assume that being in one of the pretrain or training conditions affects study). To make sure you really understand this, I highly recommend reading the following brief paper:

Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110(1), 40–48. doi:10.1037/0021-843X.110.1.40
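
To make that point concrete, here is a minimal sketch of one way to check it with the sample data from the question (Dsubj is a name introduced here for illustration; study, pretrain, and training are constant within subject in that data):

# One row per subject, since the covariate and the between-subjects
# factors do not vary within subject
Dsubj = unique( D[ , c("subjid", "pretrain", "training", "study") ] )
# Does the covariate differ across the between-subjects cells?
summary( aov( study ~ pretrain * training, data=Dsubj ) )
# A clear effect here would mean study is confounded with the manipulations,
# which is exactly the situation Miller & Chapman (2001) warn about.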

Then, to run the ANOVA and compare it with an ANCOVA, you can use aov.car or ez.glm from afex (disclaimer: I am the author of afex), as in the following:

require(afex)
#ANOVA
ez.glm("subjid", "accuracy", D, within=c("section","probtype"), 
       between= c("pretrain","training"))

#ANCOVA
ez.glm("subjid", "accuracy", D, within=c("section","probtype"), 
       between= c("pretrain","training"), covariate= "study",
       factorize = FALSE)
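
Note that in more recent afex versions, aov.car and ez.glm have been replaced by aov_car and aov_ez; as a sketch (check ?aov_ez in your installed version), the equivalent calls would look like this:

require(afex)
#ANOVA
aov_ez("subjid", "accuracy", D, within=c("section","probtype"),
       between=c("pretrain","training"))

#ANCOVA
aov_ez("subjid", "accuracy", D, within=c("section","probtype"),
       between=c("pretrain","training"), covariate="study",
       factorize=FALSE)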

A few more things to note:

  • Your covariate is not centered on 0, which makes afex throw a warning. As the covariate is not involved in a between-subjects interaction, you should be able to safely ignore that warning (a short centering sketch follows this list if you prefer to avoid it).

  • You need to set factorize = FALSE when including a numerical covariate, as the default behavior of ez.glm and aov.car is to automatically factorize all variables.

  • The results from ezANOVA with between_covariates should be very similar; however, the last time I checked, its calculation of the df was wrong (ezANOVA incorrectly does not use one df per numerical covariate; instead, the covariates do not change the df at all).

  • Probably, ANOVA is NOT the correct tool for your analysis. You should definitely read the following excellent paper on why accuracy data should not be analyzed with ANOVA (a minimal logit mixed-model sketch follows this list):
    Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. doi:10.1016/j.jml.2007.11.007
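
Regarding the first note above, a minimal sketch of centering the covariate before fitting, in case you prefer to silence the warning (study_c is a name introduced here):

# Center the covariate on 0; this affects only the warning/intercept,
# not the tests of the factors
D$study_c = D$study - mean(D$study)

You would then pass covariate = "study_c" instead of "study".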
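
And regarding the last note, a minimal sketch of the kind of logit mixed model Jaeger (2008) recommends, fit with lme4::glmer. It assumes a column n_trials giving the number of problems behind each accuracy proportion; that column is not in the question's sample data, so the value below is only a placeholder:

library(lme4)
# Placeholder consistent with accuracy values of 0, 0.5, 1 in the sample data;
# replace with the real number of problems per subject x section x probtype cell
D$n_trials = 2
# Binomial (logit) mixed model with a random intercept per subject;
# probtype and the covariate study could be added to the fixed effects as needed
fit_glmm = glmer(accuracy ~ section * pretrain * training + (1 | subjid),
                 family=binomial, weights=n_trials, data=D)
summary(fit_glmm)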
