Factorial Experiment – How to Deal With Repeated Measurements in the Same Condition of a Factorial Experiment

anova, experiment-design, lme4-nlme, repeated-measures

I am in Psychology and trying to explore the utility of mixed modeling for analyzing my repeated-measures data in a factorial experiment. The primary reason for using mixed models is that I would like to avoid the common practice of averaging data collected in the same experimental condition. My understanding is that it's typically required for repeated-measures ANOVA that there is only one observation per condition per subject. What if you have several replications of the same condition for the same subject?

To be more concrete, I have two factors: a between-subjects factor A (2 levels) and a within-subjects factor B (3 levels). There are 4 repetitions of each level of B, for a total of 12 trials per subject, presented in randomized order. Usually, I would simply average across these 4 repetitions to get an estimate of the subject's performance in each condition and then run a repeated-measures ANOVA, but it seems that this way I'm throwing away valuable information about variability. How should I deal with such data using mixed modeling in R (I've been using the lmer function)? Maybe including trial number as another variable would work? I tried including trial number as a random factor together with subject, but its estimated variance is very low compared to the error and subject variances.

Best Answer

Multi-level models (also known as mixed models, among other names) are designed to deal with the case where you have multiple measures on one person.

In the typical 2x2 design (not repeated measures) you have multiple observations in each cell, but these come from different and unrelated subjects (people or whatever). They are therefore independent, and ANOVA or regression (both are forms of the general linear model) is fine, provided the other assumptions are met.

If you have repeated measures on each subject, those data are not independent. There are various ways to deal with this. One way is to average the data for each person, but this isn't a very good way, because it discards the trial-to-trial variability within each subject. Much better methods are multi-level models or generalized estimating equations (GEE).
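To make concrete what averaging does in a design like the one in the question, here is a minimal base-R sketch on simulated data. The data frame `d`, its column names, the number of subjects (20) and the placeholder response are all assumptions made up for illustration; nothing here comes from the asker's actual data.

```r
set.seed(42)
n_subj <- 20  # assumed: 10 subjects in each level of A

# One row per trial: 20 subjects x 3 levels of B x 4 repetitions = 240 rows
d <- expand.grid(rep = 1:4,
                 B = factor(c("b1", "b2", "b3")),
                 subject = factor(1:n_subj))
d$A <- factor(ifelse(as.integer(d$subject) <= n_subj / 2, "a1", "a2"))
d$y <- rnorm(nrow(d), mean = 10, sd = 2)  # placeholder response, not real data

# Averaging: one mean per subject x B cell, so 240 rows collapse to 60
cell_means <- aggregate(y ~ subject + A + B, data = d, FUN = mean)

# The within-cell spread that the averaged data no longer carry
cell_sds <- aggregate(y ~ subject + A + B, data = d, FUN = sd)

nrow(d)           # trial-level rows
nrow(cell_means)  # rows left after averaging
```

A repeated-measures ANOVA on `cell_means` sees only the 60 cell means; the information in `cell_sds` never enters the analysis.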

Unfortunately, the terminology here can get very confusing. Better to write equations.

The general linear model (regular ANOVA or regression):

$Y = X\beta + \epsilon $

where Y is a vector of the dependent variable, X a matrix of independent variables, $\beta$ a vector of parameters to be estimated and $\epsilon$ is error. This assumes that

$\epsilon \sim \text{iid } \mathcal{N}(0, \sigma^2) $

Multi-level:

$Y = X\beta + Z\gamma + \epsilon$

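where Z is the (known) design matrix for the random effects and $\gamma$ is a vector of random effects. This assumes $\gamma \sim \mathcal{N}(0, G)$, where $G$ is the covariance matrix of the random effects (separate from the residual variance), and that the covariance between $\gamma$ and $\epsilon$ is 0.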
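For the design in the question, one possible way to fit such a model in R is a random intercept per subject, using `lmer` from the lme4 package on the trial-level data (no averaging). This is only a sketch under stated assumptions: the data are simulated, the variable names (`d`, `y`, `A`, `B`, `subject`) and the effect sizes are hypothetical, and the random-intercept structure is one reasonable starting point rather than the only defensible model.

```r
# Assumes the lme4 package is installed; all names and numbers are hypothetical.
library(lme4)

set.seed(1)
n_subj <- 20  # assumed: 10 subjects per level of A

# Trial-level data: 4 repetitions x 3 levels of B x 20 subjects = 240 rows
d <- expand.grid(rep = 1:4,
                 B = factor(c("b1", "b2", "b3")),
                 subject = factor(1:n_subj))
d$A <- factor(ifelse(as.integer(d$subject) <= n_subj / 2, "a1", "a2"))

# Simulated response: effect of B + subject-specific intercept + trial noise
subj_int <- rnorm(n_subj, sd = 1)
d$y <- 10 + 0.5 * (as.integer(d$B) - 1) +
  subj_int[as.integer(d$subject)] + rnorm(nrow(d), sd = 1)

# X*beta: fixed effects of A, B and their interaction
# Z*gamma: one random intercept per subject
fit <- lmer(y ~ A * B + (1 | subject), data = d)

summary(fit)
VarCorr(fit)  # between-subject variance and residual (trial-level) variance
```

In this formulation the four repetitions within a cell are treated as exchangeable, so their spread goes into the residual variance instead of a separate random effect for trial. With replicates in every cell, a by-subject random slope for B, written `(1 + B | subject)`, is also estimable in principle; whether it is warranted depends on the data.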
