If you use Type III sums of squares for ANOVAs in R, it is critical that you set the contrasts to effect coding (i.e., "contr.sum").
The default contrast in R is dummy coding (or, in R parlance, treatment coding), in which the first factor level serves as the reference. This doesn't make much sense in the presence of interactions, as explained on the page I linked to.
To set effect coding, run the following:
options(contrasts=c("contr.sum","contr.poly"))
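To see what this changes, compare the two contrast matrices for a 3-level factor (this sketch is generic and not tied to any particular dataset):

```r
# A 3-level factor using R's default treatment (dummy) coding:
f <- factor(c("a", "b", "c"))
contrasts(f)
#   b c
# a 0 0
# b 1 0
# c 0 1

# Switch this factor to effect (sum) coding:
contrasts(f) <- contr.sum(3)
contrasts(f)
#   [,1] [,2]
# a    1    0
# b    0    1
# c   -1   -1
```

With sum coding, the columns sum to zero, so main effects are estimated relative to the grand mean rather than to the first level; this is what makes Type III tests of main effects interpretable in the presence of interactions.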
Alternatively, you can use the afex package, which has goals similar to ez, with the difference that it automatically sets the contrasts to effect coding and uses Type III sums of squares by default.
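A minimal sketch of this (the data frame and its columns id, dv, A, and B are hypothetical, simulated here just so the call is runnable):

```r
library(afex)

# Simulated long-format data: 20 subjects, between factor A, within factor B
set.seed(1)
d <- expand.grid(id = factor(1:20), B = factor(c("b1", "b2")))
d$A <- factor(ifelse(as.integer(d$id) <= 10, "a1", "a2"))
d$dv <- rnorm(nrow(d))

# aov_ez applies effect coding and Type III sums of squares automatically,
# so no call to options(contrasts = ...) is needed beforehand
fit <- aov_ez(id = "id", dv = "dv", data = d, between = "A", within = "B")
```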
In mixed models, the treatment of factors as either fixed or random, particularly in conjunction with whether they are crossed, partially crossed or nested, can lead to a lot of confusion. Also, there appear to be differences in terminology between what is meant by nesting in the anova/designed-experiments world and in the mixed/multilevel-models world.
I don't profess to know all the answers, and my answer won't be complete (and may produce further questions) but I will try to address some of the issues here:
Does it make sense for a fixed effect to be nested within a random one, or how to code repeated measures in R (aov and lmer)?
(the question title)
No, I don't believe this makes sense. When we are dealing with repeated measures, usually whatever the thing is that the measures are repeated on will be random; let's just call it Subject, and in lme4 we will want to include Subject on the right side of one or more | in the random part of the formula. If we have other random effects, then these are either crossed, partially crossed or nested - and my answer to this question addresses that.
The issue with these anova-type designed experiments seems to be how to deal with factors that would normally be thought of as fixed, in a repeated measures situation, and the questions in the body of the OP speak to this:
Why Error(subject/A) and not Error(subject)?
I do not usually use aov(), so I could be missing something, but to me Error(subject/A) is very misleading in the case of the linked question. Error(subject) in fact leads to exactly the same results.
Is it (1|subject) or (1|subject)+(1|A:subject) or simply (1|A:subject)?
This relates to this question. In this case, all the following random effects formulations lead to exactly the same result:
(1|subject)
(1|A:subject)
(1|subject) + (1|A:subject)
(1|subject) + (1|A:subject) + (1|B:subject)
However, this is because the simulated dataset in the question has no variation within anything; it was just created with Y = rnorm(48). If we take a real dataset such as the cake dataset in lme4, we find that this will not generally be the case. From the documentation, here is the experimental setup:
Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that
the replicate numbering represents temporal ordering.
A data frame with 270 observations on the following 5 variables.
replicate
a factor with levels 1 to 15
recipe
a factor with levels A, B and C
temperature
an ordered factor with levels 175 < 185 < 195 < 205 < 215 < 225
temp
numeric value of the baking temperature (degrees F).
angle
a numeric vector giving the angle at which the cake broke.
So, we have repeated measures within replicate, and we are also interested in the fixed factors recipe and temperature (we can ignore temp since this is just a different coding of temperature). We can visualise the situation using xtabs:
> xtabs(~recipe+replicate,data=cake)
replicate
recipe 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
B 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
C 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
If recipe were a random effect, we would say that these are crossed random effects: in no way does recipe A belong to replicate 1 or to any other replicate.
> xtabs(~temp+replicate,data=cake)
replicate
temp 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
175 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
185 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
195 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
205 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
215 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
225 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Similarly for temp.
So the first model we might fit is:
> lmm1 <- lmer(angle ~ recipe * temperature + (1|replicate), cake, REML= FALSE)
This will treat each replicate as the only source of random variation (other than the residual, of course). But there could be random differences between recipes, so we might be tempted to include recipe as another (crossed) random effect. That would be ill-advised, however, because we have only 3 levels of recipe, so we can't expect the model to estimate the variance component well. Instead we can use replicate:recipe as the grouping variable, which enables us to treat each combination of replicate and recipe as a separate grouping factor. So whereas the above model has 15 random intercepts for the levels of replicate, we will now have 45 random intercepts, one for each separate combination:
lmm3 <- lmer(angle ~ recipe * temperature + (1|replicate:recipe) , cake, REML= FALSE)
Note that we now have (very slightly) different results indicating that there is some random variability due to recipe, but not a great deal.
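One way to see the difference is to compare the two fits directly (a sketch; both models were fitted with ML, so their log-likelihoods and AICs are comparable, but since they estimate the same number of parameters a likelihood-ratio test is not meaningful):

```r
library(lme4)

# The two grouping structures discussed above, fitted by ML on the cake data
lmm1 <- lmer(angle ~ recipe * temperature + (1|replicate),        cake, REML = FALSE)
lmm3 <- lmer(angle ~ recipe * temperature + (1|replicate:recipe), cake, REML = FALSE)

# Same number of parameters, so compare fit via AIC / log-likelihood
AIC(lmm1, lmm3)
logLik(lmm1)
logLik(lmm3)

# The estimated variance components also differ slightly between groupings
VarCorr(lmm1)
VarCorr(lmm3)
```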
We could likewise do the same thing with temperature.
Now, going back to your question, you also ask:

Why (1|subject) + (1|A:subject) and not (1|subject) + (0+A|subject), or even simply (A|subject)?
I'm not entirely sure where this (using random slopes) comes from - it doesn't seem to arise in the two linked questions - but my problem with (1|subject) + (1|A:subject) is that this is exactly the same as (1|subject/A), which means that A is nested within subject. That in turn means (to me) that each level of A occurs in one and only one level of subject, which is clearly not the case here.
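That equivalence can be checked by letting lme4 expand the formula itself; findbars() expands the / nesting shorthand into its component terms (the variable names here are placeholders, since only the formula is parsed):

```r
library(lme4)

# findbars() extracts the random-effects terms and expands the "/" shorthand
findbars(y ~ x + (1|subject/A))
# returns two terms: 1 | subject  and  1 | A:subject
```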
I will probably add to and/or edit this answer after I've thought about it some more, but I wanted to get my initial thoughts down.
Best Answer
Multi-level models (aka mixed models etc) are designed to deal with the case where you have multiple measures on one person.
In the typical 2x2 design (not repeated measures) you have multiple observations in each cell, but these are on different and unrelated subjects (people or whatever), thus, they are independent, and ANOVA or regression (both are the general linear model) are fine (provided other assumptions are met).
If you have repeated measures on each subject, those data are not independent. There are various ways to deal with this. One way is to average the data for each person, but this isn't a very good approach. Much better methods are multi-level models or generalized estimating equations (GEE).
Unfortunately, the terminology here can get very confusing. Better to write equations.
The general linear model (regular ANOVA or regression):
$Y = X\beta + \epsilon $
where $Y$ is a vector of the dependent variable, $X$ a matrix of independent variables, $\beta$ a vector of parameters to be estimated, and $\epsilon$ is the error. This assumes that
$\epsilon \sim \text{iid } \mathcal{N}(0, \sigma) $
Multi-level:
$Y = X\beta + Z\gamma + \epsilon$
where $Z$ is the (known) design matrix for the random effects and $\gamma$ is a vector of random-effect parameters. This assumes $\gamma \sim \mathcal{N} (0, \sigma_\gamma)$ and that the covariance between $\gamma$ and $\epsilon$ is 0.
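As a concrete illustration of this notation in R (a sketch using lme4's built-in sleepstudy data, where reaction times are measured repeatedly on each subject): the fixed part of the lmer formula corresponds to $X\beta$ and the parenthesised part to $Z\gamma$.

```r
library(lme4)

# sleepstudy: repeated reaction-time measures per subject across days
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

fixef(fit)    # estimates of beta: fixed intercept and slope for Days
ranef(fit)    # predicted gamma: per-subject deviations in intercept and slope
VarCorr(fit)  # estimated variance components of gamma and epsilon
```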