R Mixed Model – Specifying an LME Model with More Than One Within-Subjects Factor

lme4-nlmemixed modelrrepeated measures

The data

Suppose we have a dataset d with two between-subject factors (i.e., groups), group and condition, and two within-subject factors (i.e., repeated-measures factors), topic and problem (I uploaded the data to pastebin, so everybody should be able to obtain it):

> d <- read.table(url("http://pastebin.com/raw.php?i=4hRFyaRj"), colClasses = c(rep("factor", 6), "numeric"))
> str(d)
'data.frame':   2928 obs. of  6 variables:
  $ code     : Factor w/ 183 levels "A03U","A08C",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ group    : Factor w/ 2 levels "control","experimental": 2 2 2 2 2 2 2 2 2 2 ...
  $ condition: Factor w/ 3 levels "alternatives",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ topic    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
  $ problem  : Factor w/ 4 levels "AC","DA","MP",..: 3 4 1 2 3 4 1 2 3 4 ...
  $ mean     : num  94.5 94.5 86.5 84.5 80 46.5 73.5 43.5 51 39 ...

The data is from a behavioral experiment in which participants in six groups (2 levels of group times 3 levels of condition) worked on 16 tasks (for each of 4 topics 4 different problems). Allocation of participants to group/condition was fully random. Presentation of tasks was random insofar that problem was blocked within topic (i.e., for each topic all problems where presented sequentially), but order of problem and topic was random.
Update: The factor identifying the participant (in which topic and problem are nested) is code.

The Problem

How can I fit this dataset using lme?
(Sidenote: I would also consider using lme4, but I am kind of afraid of not having p-values, if there is something easily digestible as p-values, I would also consider lme4 an option).

So far I managed to fit an lme model with only one within-subject factor, but not two (see below).

What I tried

I can fit an lme model if I have just one within-subject factor:

require(nlme)
 m1 <- lme(mean ~ condition*group*problem, random = ~1|code/problem, 
           data = d, subset = topic == "1")

anova(m1)
                        numDF denDF F-value p-value
(Intercept)                 1   531   12101  <.0001
condition                   2   177      31  <.0001
group                       1   177       2  0.2178
problem                     3   531      35  <.0001
condition:group             2   177       1  0.3672
condition:problem           6   531      24  <.0001
group:problem               3   531       1  0.2180
condition:group:problem     6   531       2  0.0281

This (especially the df) nicely correspond with the results from an standard ANOVA (using
ez):

require(ez)
ezANOVA(subset(d, topic == "1"), dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem))$ANOVA

Warning: Data is unbalanced (unequal N per group). Make sure you specified a well-considered value for the type argument to ezANOVA().
                   Effect DFn DFd     F                             p p<.05     ges
2               condition   2 177 30.69 0.000000000003611248905859672     * 0.13079
3                   group   1 177  1.53 0.217821969825403999321267179       0.00374
5                 problem   3 531 34.85 0.000000000000000000014254103     * 0.10028
4         condition:group   2 177  1.01 0.367225806638525886782531416       0.00492
6       condition:problem   6 531 24.40 0.000000000000000000000000142     * 0.13503
7           group:problem   3 531  1.48 0.217959293081550348203379031       0.00472
8 condition:group:problem   6 531  2.38 0.028119961573665430004664856     * 0.01499

Trying to fit this data with two within-subject factors in lme fails (either per code, or per dfs):

m2 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/(problem*topic), data = d)
# fails: Error in getGroups.data.frame(dataMix, groups) : 
#  Invalid formula for groups

m3 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/problem/topic, data = d)
# the next model takes some time (probably already an indicator, that it is the wrong model)
# and produces wrong denominator df!

# with both factors as ANOVA
m4 <- ezANOVA(d, dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem, topic))

#effects are the same
all(row.names(anova(m3))[-1] == m4$ANOVA$Effect)

#denominator dfs are not:
anova(m3)$denDF[-1] == m4$ANOVA$DFd

# only for effects with topic:
row.names(anova(m3))[-1][!(anova(m3)$denDF[-1] == m4$ANOVA$DFd)]

UPDATE: As the precise error or nesting is somewhat unclear I here provide the equivalent aov call (this is the "standard" model via aov), which matches the results from ezANOVA. The critical error term is Error(code/(problem*topic)):

m5 <- aov(mean ~ (condition*group*problem*topic) + Error(code/(problem*topic)), d)
summary(m5)

Best Answer

I found an answer to my question on this thread: Repeated measures ANOVA with lme in R for two within-subject factors (somehow this thread was already one of my favorites, I must have forgotten about it). The specification is a little unhandy.

m6 <- lme(mean ~ condition*group*problem*topic, 
   random = list(code=pdBlocked(list(~1, pdIdent(~problem-1), pdIdent(~topic-1)))), data = d)
anova(m6)

However, the denominator dfs are still wrong, as noted in the thread and apparent in comparisons between the ANOVA and lme dfs.

data.frame(effect = rownames(anova(m6)), denDf= anova(m6)$denDF)

m4$ANOVA[,c("Effect", "DFd")]

As long as there are no other ideas, I think I will need to do the analysis in lme4, for which I wil need to post another question.