R – How to Specify Mixed ANOVA with Multiple Repeated Measures and Covariate in R

Tags: anova, mixed model, predictor, r, repeated measures

Working in R, how can I specify a mixed ANOVA with multiple between- and within-subjects factors in such a way that it's amenable to adding a covariate in a subsequent analysis? Also, ideally, I would like to use type 3 SS because that's what I'm used to.

I originally did my analysis (without the covariate) using ezANOVA, and found a predicted 3-way interaction of two between-subjects factors (pretrain and training, below) and one within-subjects factor (section, below). A possible explanation is that the two between-subjects factors affect another quantity (study, below), which in turn causes the above interaction. I am hoping this explanation is NOT correct, but in order to eliminate it, I'd like to add the other quantity ('study') to my model as a covariate and see whether the original 3-way interaction is still significant. Is this an acceptable approach?

Assuming yes, my issue is that I can't find a good way to add the covariate.

  • (A) It can be added using "between_covariates" in ezANOVA, but there's a warning that this feature is under development, and the results look strange, so I don't trust them without other verification.

  • (B) Using aov, I couldn't exactly replicate my ezANOVA results without the covariate, and also couldn't figure out how to add a covariate.

  • (C) Using lm, I couldn't figure out how to get the repeated measures working.

  • (D) Using lme from nlme, I did get a working model with the repeated measures (shown below), but the results without the covariate were different from those of ezANOVA, so I don't think it's an equivalent model.

I'd like to create a model equivalent to the original one, and then see whether my effect stays significant when the covariate is added. How can I do this?

Reproducible code:

# This data comes from an experiment designed to investigate effects of
# different types of instruction on learning. Instruction is manipulated
# between subjects using factorial combinations of "pretraining" (3 levels)
# and "training" (2 levels). Learning is assessed as change in performance
# from a pretest, administered before pretraining & training, to a posttest,
# administered after. Test accuracy is my dv and test section (pre vs post)
# is treated as a within-subjects factor. I also have an additional
# within-subjects factor called "problem type", which is used to classify
# different types of problems on the pretest and posttest. Some problem
# types have more problems than others, but the number of problems of a
# given type is the same for pretest and posttest.

# Sample data:
set.seed(1)  # so the sample() calls below give reproducible data
nsubj = 215; nsec  = 2; nprob = 6
D = data.frame(
    subjid   = rep( 1:nsubj, each=nsec*nprob ),
    pretrain = rep( sample( c('a','b','c'), nsubj, replace=TRUE ), each=nsec*nprob ),
    training = rep( sample( c('j','k'), nsubj, replace=TRUE ), each=nsec*nprob ),
    study    = rep( sample( 1:6, nsubj, replace=TRUE ), each=nsec*nprob ),
    section  = rep( rep( c( 'pretest', 'posttest' ), each=nprob ), nsubj ),
    probtype = rep( c( 'v', 'w', 'x', 'y', 'z', 'z' ), nsec*nsubj ),
    accuracy = sample( c( 0.0, 0.5, 1.0 ), nsubj * nsec * nprob, replace=TRUE ) )

# Model:
library( ez )
options( contrasts=c("contr.sum","contr.poly") )
ezANOVA( data=D, wid=.(subjid), dv=.(accuracy), within=.(section,probtype), between=.(pretrain,training), type=3 )

# nlme version:
library( nlme )
fit = lme( accuracy ~ section*probtype*pretrain*training, random=~1|subjid, method='REML', data=D )
anova.lme( fit, type='marginal' )
# Results in similar but not identical F and p values compared to the ezANOVA model; the denominator dfs are quite different.

Best Answer

First of all, you really need to make sure that adding a covariate makes sense. Adding a covariate is only appropriate if the covariate is not correlated with the independent variables (i.e., if there is no a priori reason to assume that being in one of the pretrain or training conditions affects study). To make sure you really understand this, I highly recommend reading the following brief paper:

Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110(1), 40–48. doi:10.1037/0021-843X.110.1.40
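
To make that point concrete, here is a minimal sketch of one way to check it with the sample data from the question (Dsubj is a name introduced here for illustration; study, pretrain, and training are constant within subject in that data):

# One row per subject, since the covariate and the between-subjects
# factors do not vary within subject
Dsubj = unique( D[ , c("subjid", "pretrain", "training", "study") ] )
# Does the covariate differ across the between-subjects cells?
summary( aov( study ~ pretrain * training, data=Dsubj ) )
# A clear effect here would mean study is confounded with the manipulations,
# which is exactly the situation Miller & Chapman (2001) warn about.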

Then, to run the ANOVA and compare it with an ANCOVA, you can use aov.car or ez.glm from afex (disclaimer: I am the author of afex), as in the following:

require(afex)
#ANOVA
ez.glm("subjid", "accuracy", D, within=c("section","probtype"), 
       between= c("pretrain","training"))

#ANCOVA
ez.glm("subjid", "accuracy", D, within=c("section","probtype"), 
       between= c("pretrain","training"), covariate= "study",
       factorize = FALSE)
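
Note that in more recent afex versions, aov.car and ez.glm have been replaced by aov_car and aov_ez; as a sketch (check ?aov_ez in your installed version), the equivalent calls would look like this:

require(afex)
#ANOVA
aov_ez("subjid", "accuracy", D, within=c("section","probtype"),
       between=c("pretrain","training"))

#ANCOVA
aov_ez("subjid", "accuracy", D, within=c("section","probtype"),
       between=c("pretrain","training"), covariate="study",
       factorize=FALSE)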

A few more things to note:

  • Your covariate is not centered on 0, which makes afex throw a warning. As the covariate is not involved in a between-subjects interaction, you should be able to safely ignore that warning (a short centering sketch follows this list if you prefer to avoid it).

  • You need to set factorize = FALSE when including a numerical covariate, as the default behavior of ez.glm and aov.car is to automatically factorize all variables.

  • The results from ezANOVA with between_covariates should be very similar; however, the last time I checked, its calculation of the df was wrong (ezANOVA incorrectly does not use one df per numerical covariate; instead, the covariates do not change the df at all).

  • Probably, ANOVA is NOT the correct tool for your analysis. You should definitely read the following excellent paper on why accuracy data should not be analyzed with ANOVA (a minimal logit mixed-model sketch follows this list):
    Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. doi:10.1016/j.jml.2007.11.007
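
Regarding the first note above, a minimal sketch of centering the covariate before fitting, in case you prefer to silence the warning (study_c is a name introduced here):

# Center the covariate on 0; this affects only the warning/intercept,
# not the tests of the factors
D$study_c = D$study - mean(D$study)

You would then pass covariate = "study_c" instead of "study".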
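
And regarding the last note, a minimal sketch of the kind of logit mixed model Jaeger (2008) recommends, fit with lme4::glmer. It assumes a column n_trials giving the number of problems behind each accuracy proportion; that column is not in the question's sample data, so the value below is only a placeholder:

library(lme4)
# Placeholder consistent with accuracy values of 0, 0.5, 1 in the sample data;
# replace with the real number of problems per subject x section x probtype cell
D$n_trials = 2
# Binomial (logit) mixed model with a random intercept per subject;
# probtype and the covariate study could be added to the fixed effects as needed
fit_glmm = glmer(accuracy ~ section * pretrain * training + (1 | subjid),
                 family=binomial, weights=n_trials, data=D)
summary(fit_glmm)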
