Solved – Nesting random effect within fixed effect using lmer() of lme4 in R

Tags: lme4-nlme, mixed-model, r, random-effects-model

Problem
I want to fit a model using the R lme4 lmer function, and I'm not sure how to specify a random effect that is nested within a fixed effect.

Setup
I apply a Treatment (fixed effect) to a subject, who is then prompted to speak a word that uses exactly one of the 4 Mandarin tones (Tone effect, fixed). The subject's response time, RT, is measured as the response variable.

Each tone is expressed 6 times, once via use of each of 6 different, predetermined words (Word effect, random). There is no overlap of words between tones (i.e. each word uses precisely one of the 4 tones, so there are a total of 24 words used).

Question
Below is the syntax I was thinking of for my model. Is this the right way to specify the described nesting of Word within Tone? In particular, do I need to specify the association Tone:Word as the random variable, or should I simply use Word by itself?

RT ~ 1 + Treatment*Tone + (1+ Treatment*Tone | Tone:Word)

Best Answer

I'm assuming that you have multiple subjects, that each subject gets exactly one treatment, and that each subject gets multiple words and tones. (Every subject getting all of the words and all of the tones is the cleanest design; a somewhat unbalanced design will lose power but should not mess things up too badly.)
There is no way to determine whether the effect of tone varies across words, since each word uses a single tone, but you can tell whether the effect of tone varies across subjects, since each subject gets multiple tones.
The opposite situation holds for treatments: each word is observed under multiple treatments, but each subject is observed under a single treatment.
Assuming there is a single measurement per Subject-Word combination, you don't need, and shouldn't use, a (1|Subject:Word) term: that variation is already absorbed by the residual variance term.
Since words are unique, you don't need to code Word as explicitly nested within Tone - see this answer.
Since response time is an intrinsically positive variable, you might want to consider log(RT) (for a linear model) or a GLMM with a Gamma response distribution. This depends on your data, though: the conditional distribution might be adequately Normal/homoscedastic.
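The point about unique word labels can be checked directly. The sketch below builds a hypothetical design table (the labels T1..T4 and w01..w24 are invented for illustration, not taken from the question) and compares the grouping implied by Word alone with the one implied by Tone:Word.

```r
# Hypothetical design table: 4 tones x 6 words per tone = 24 unique words.
# The labels are assumptions made up for this example.
words <- data.frame(
  Tone = factor(rep(paste0("T", 1:4), each = 6)),
  Word = factor(sprintf("w%02d", 1:24))
)

# Grouping factor implied by (1 | Tone:Word): one level per observed combination.
tone_word <- droplevels(interaction(words$Tone, words$Word))

n_word      <- nlevels(words$Word)
n_tone_word <- nlevels(tone_word)

# Same number of groups, and every word falls into exactly one Tone:Word group.
same_groups <- n_word == n_tone_word &&
  all(rowSums(table(words$Word, tone_word) > 0) == 1)

print(c(n_word = n_word, n_tone_word = n_tone_word))
print(same_groups)
```

When both factors define the same 24 groups, as here, (1 | Word) and (1 | Tone:Word) describe the same random effect. Explicit nesting would only matter if word labels were reused across tones.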

Thus I think the maximal model (allowing for all interactions that can be estimated from the design) is

RT ~ Treatment*Tone+(Treatment|Word)+(Tone|Subject)

(You can write out the implicit intercept terms, e.g. (1+Treatment|Word), if that is clearer for you, but you'll get the same model either way.)
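As a rough illustration, the model above can be fitted to simulated data to confirm that the implicit and explicit intercept forms agree. Everything about the data here is an assumption: 40 subjects, 2 treatment levels, full crossing of subjects and words, and invented effect sizes. Because no random slopes are simulated, lmer() will probably report a singular fit, which is expected for this toy data.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the experiment (all sizes and effects are invented).
subjects <- data.frame(
  Subject   = factor(sprintf("s%02d", 1:40)),
  Treatment = factor(rep(c("control", "treated"), each = 20))  # one treatment per subject
)
words <- data.frame(
  Word = factor(sprintf("w%02d", 1:24)),
  Tone = factor(rep(paste0("T", 1:4), each = 6))               # one tone per word
)
d <- merge(subjects, words)  # no shared columns, so this is the full crossing

subj_eff <- rnorm(40, sd = 40)[as.integer(d$Subject)]
word_eff <- rnorm(24, sd = 30)[as.integer(d$Word)]
d$RT <- 600 + 50 * (d$Treatment == "treated") + 20 * (as.integer(d$Tone) - 1) +
  subj_eff + word_eff + rnorm(nrow(d), sd = 60)

# Implicit vs. explicit intercepts: two spellings of the same model.
m_implicit <- lmer(RT ~ Treatment * Tone + (Treatment | Word) + (Tone | Subject),
                   data = d)
m_explicit <- lmer(RT ~ 1 + Treatment * Tone + (1 + Treatment | Word) +
                     (1 + Tone | Subject), data = d)

print(all.equal(logLik(m_implicit), logLik(m_explicit)))
print(VarCorr(m_implicit))
```

With real data, the same call would use the actual data frame in place of the simulated `d`, and possibly log(RT) on the left-hand side.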

Note that there is some controversy about whether you should start with the maximal model or cut the model down to something that is reasonable given the size of the experiment. Here the number of subjects is important: the variance-covariance matrix for the tone effects is 4x4 (10 parameters), so if you had fewer than 50 subjects you might not want to try that. Barr et al. (2013) argue that you should go with the maximal model; Bates et al. argue that you shouldn't.
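The parameter count quoted above follows from the size of each random-effects block: a q x q covariance matrix has q variances and q(q-1)/2 covariances. A small sketch, in which the helper name `n_vc_params`, the 2-level Treatment, and the reduced formula are assumptions added for illustration:

```r
# Number of free parameters in an unstructured q x q variance-covariance
# matrix: q variances plus q * (q - 1) / 2 covariances.
n_vc_params <- function(q) q * (q + 1) / 2

tone_block      <- n_vc_params(4)  # (Tone | Subject): intercept + 3 tone contrasts
treatment_block <- n_vc_params(2)  # (Treatment | Word), ASSUMING 2 treatment levels

print(c(tone_block = tone_block, treatment_block = treatment_block))

# One possible reduced specification if the maximal model is too ambitious:
# random intercepts only (1 variance parameter per grouping factor).
f_maximal <- RT ~ Treatment * Tone + (Treatment | Word) + (Tone | Subject)
f_reduced <- RT ~ Treatment * Tone + (1 | Word) + (1 | Subject)
```

Whether the reduced form is appropriate depends on the data and on which side of the maximal-model debate you find more convincing.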

Related Solutions

R – Using lmer for Repeated-Measures Linear Mixed-Effect Model

I think that your approach is correct. Model m1 specifies a separate intercept for each subject. Model m2 adds a separate slope for each subject. Your slope is across days, as subjects only participate in one treatment group. If you write model m2 as follows, it's more obvious that you model a separate intercept and slope for each subject:

m2 <- lmer(Obs ~ Treatment * Day + (1+Day|Subject), mydata)

This is equivalent to:

m2 <- lmer(Obs ~ Treatment + Day + Treatment:Day + (1+Day|Subject), mydata)

That is, the main effects of treatment and day plus the interaction between the two.

I think that you don't need to worry about nesting as long as you don't repeat subject IDs within treatment groups. Which model is correct really depends on your research question. Is there reason to believe that subjects' slopes vary in addition to the treatment effect? You could run both models and compare them with anova(m1,m2) to see which one the data support.
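A minimal sketch of that comparison on simulated data. The original `mydata` and the exact form of m1 are not shown here, so both are assumptions: m1 is taken to be the random-intercept version of m2, and the data are invented with 30 subjects, 2 treatment groups, and 5 days.

```r
library(lme4)

set.seed(2)
# Simulated stand-in for `mydata`; all numbers are invented.
mydata <- expand.grid(Subject = factor(1:30), Day = 0:4)
mydata$Treatment <- factor(ifelse(as.integer(mydata$Subject) <= 15, "A", "B"))

s  <- as.integer(mydata$Subject)
b0 <- rnorm(30, sd = 2)    # subject-specific intercept deviations
b1 <- rnorm(30, sd = 0.5)  # subject-specific slope deviations
mydata$Obs <- 10 + b0[s] +
  (1 + 0.5 * (mydata$Treatment == "B") + b1[s]) * mydata$Day +
  rnorm(nrow(mydata), sd = 1)

# m1: random intercept per subject (assumed form); m2: adds a random Day slope.
m1 <- lmer(Obs ~ Treatment * Day + (1 | Subject), mydata)
m2 <- lmer(Obs ~ Treatment * Day + (1 + Day | Subject), mydata)

# anova() refits both models with ML before the likelihood-ratio test.
cmp <- anova(m1, m2)
print(cmp)
```

Because the variance being tested sits on the boundary of its parameter space under the null, the p-value from this test tends to be conservative, so treat it as a rough guide.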

I'm not sure what you want to express with model m3. The nesting syntax uses a /, e.g. (1|group/subgroup).
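For reference, the slash is shorthand: (1|group/subgroup) stands for (1|group) + (1|group:subgroup). The sketch below fits both spellings to invented data (8 groups, subgroup labels "a" to "d" reused in every group, 5 observations each) and compares the log-likelihoods, which should agree up to numerical tolerance.

```r
library(lme4)

set.seed(3)
# Invented data: subgroup labels are reused across groups, which is the
# situation where explicit nesting is actually needed.
dat <- expand.grid(rep = 1:5,
                   subgroup = factor(letters[1:4]),
                   group = factor(1:8))
g  <- as.integer(dat$group)
gs <- as.integer(interaction(dat$group, dat$subgroup))  # 32 group-by-subgroup cells
dat$y <- rnorm(8)[g] + rnorm(32, sd = 0.7)[gs] + rnorm(nrow(dat), sd = 0.5)

m_slash    <- lmer(y ~ 1 + (1 | group/subgroup), data = dat)
m_expanded <- lmer(y ~ 1 + (1 | group) + (1 | group:subgroup), data = dat)

ll_diff <- abs(as.numeric(logLik(m_slash)) - as.numeric(logLik(m_expanded)))
print(ll_diff)
```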

I don't think that you need to worry about autocorrelation with such a small number of time points.
