Solved – Interactions between levels in lme4

Tags: interaction, lme4-nlme, multilevel-analysis, r

We are fitting multilevel models in lme4 and have a question about how to handle cross-level predictors. In our psychology experiment, individual participants come into the lab and complete multiple trials of the same task (e.g., judging how much they like a picture), so trials are nested within participants. Each trial also has a trial-level predictor (e.g., how happy the participant reported being before making the judgment), and we are interested in the relationship between happiness and liking (both rated on a 1-7 scale and treated as linear variables). A model with a random intercept for participant would be:

lmer(liking~happiness + (1|participant), data)

In these data, participants also come from three distinct racial groups (e.g., participants who self-identify as white-only, black-only, or hispanic-only). Each participant belongs to only one race, and each race contains multiple participants.

We hypothesize that trial-level happiness will interact with participant-level race to predict liking. To test this model, we believe lme4 will detect that race is a group-level factor (since only one value exists for each participant) and that we would run:

lmer(liking~happiness*race + (1|participant), data)

However, based on other reading, we're wondering whether race should instead enter as a nested random effect, possibly with a random slope. For instance, should we instead use one of the following?

lmer(liking~happiness*race + (1| race/participant), data)

lmer(liking~happiness*race + (1 + happiness | race/participant), data)

Again, we are interested in the interaction between race and happiness in predicting liking, and each participant only belongs to one race. Thank you in advance for your help!

PS: We have looked at Specifying Cross-Level Interactions in LMER but this seems to represent a different data structure.

Best Answer

The general principle here is that it only makes sense/is only possible to estimate within-level variation for factors that actually vary within that level in the course of the experiment/observation period. Since happiness might vary within individuals across trials, but race can't, the maximal model you can fit would be

lmer(liking~happiness*race + (happiness|participant), data)

In other words, the effect of happiness on liking may vary across individuals; you will get estimates for

(fixed) average effect of happiness on liking for the "baseline" race (whichever is the first level of your factor)
(fixed) average effect of race (differences in liking from the baseline race) at happiness zero (you might want to center happiness so that the 'zero' level of happiness is more meaningful, e.g. a baseline level of 4, or the mean happiness across the study population)
(fixed) race-happiness interaction (the average difference in the happiness-liking slope between the baseline and other races)
(random) among-individual intercept variation (difference in the expected liking of an individual at the baseline happiness from the expected liking for an individual of their race at the baseline happiness)
(random) among-individual happiness-slope variation (difference in the expected effect of 1 unit of happiness on liking for an individual from the average effect for an individual of their race)
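
As a rough illustration of the five quantities listed above, here is a minimal sketch on simulated data. Everything about the simulation (sample sizes, effect sizes, the data frame name sim, and the centered variable happiness_c) is an assumption made for the example, not something from the original study.

```r
library(lme4)

set.seed(1)
n_part  <- 60   # hypothetical number of participants
n_trial <- 20   # hypothetical number of trials per participant

participant <- factor(rep(seq_len(n_part), each = n_trial))
idx <- as.integer(participant)

# race is a participant-level attribute: one value per participant
race_by_part <- factor(rep(c("white", "black", "hispanic"), length.out = n_part))
race <- race_by_part[idx]

# trial-level happiness on a 1-7 scale, centered at the scale midpoint (4)
happiness   <- sample(1:7, n_part * n_trial, replace = TRUE)
happiness_c <- happiness - 4

# made-up population slopes per race, plus participant-level deviations
slope_by_race <- c(black = 0.2, hispanic = 0.4, white = 0.6)
u0 <- rnorm(n_part, 0, 0.8)  # random intercepts
u1 <- rnorm(n_part, 0, 0.3)  # random happiness slopes

liking <- 4 + u0[idx] +
  (unname(slope_by_race[as.character(race)]) + u1[idx]) * happiness_c +
  rnorm(n_part * n_trial, 0, 1)

sim <- data.frame(participant, race, happiness_c, liking)

# maximal model from the answer: random intercept and happiness slope
fit <- lmer(liking ~ happiness_c * race + (happiness_c | participant), data = sim)

fixef(fit)    # baseline slope, race differences, race-by-happiness interactions
VarCorr(fit)  # among-individual intercept and slope variation
```

Because factor levels sort alphabetically by default, "black" is the baseline race in this sketch, so the happiness_c coefficient is the slope for that group and the interaction terms are slope differences from it.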

You could change the interpretation of the fixed factors slightly by changing contrasts.
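
A small sketch of what changing contrasts could look like, using only base R. The toy race factor and the names race_ref and race_sum are hypothetical. With the default treatment contrasts the happiness coefficient is the slope for the baseline race; with sum-to-zero contrasts it becomes the unweighted average of the race-specific slopes.

```r
# toy factor standing in for the race column of the data
race <- factor(c("white", "black", "hispanic"))

# default treatment (dummy) coding: first level alphabetically is the baseline
contrasts(race)

# option 1: keep treatment coding but pick a different baseline level
race_ref <- relevel(race, ref = "white")
levels(race_ref)

# option 2: sum-to-zero coding, so main effects are averages over races
race_sum <- race
contrasts(race_sum) <- contr.sum(3)
contrasts(race_sum)

# the recoded factor would then replace race in the model formula, e.g.
# lmer(liking ~ happiness * race_sum + (happiness | participant), data)
```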

To stretch a point a little bit, it might be possible in principle that the effect of race, or the happiness by race interaction, could vary across individuals, but you can't measure it. (This discussion would make more sense if considered in terms of a characteristic that is more likely to vary within individuals over some reasonable time scale but doesn't vary within individuals within the scope of the experiment.)
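
One way to apply the general principle above before writing the random-effects formula is to check which predictors actually vary within a grouping factor. The helper varies_within and the toy data frame d below are hypothetical, written only for this sketch.

```r
# toy data: 3 participants, 4 trials each
d <- data.frame(
  participant = rep(1:3, each = 4),
  race        = rep(c("white", "black", "hispanic"), each = 4),
  happiness   = c(1, 3, 5, 7,  2, 2, 6, 4,  3, 4, 5, 6)
)

# TRUE if x takes more than one value inside at least one group
varies_within <- function(x, group) {
  any(tapply(x, group, function(v) length(unique(v)) > 1))
}

varies_within(d$happiness, d$participant)  # TRUE: a random slope is estimable
varies_within(d$race, d$participant)       # FALSE: no (race | participant) term
```

Relatedly, lme4 expands (1 | race/participant) to (1 | race) + (1 | race:participant), so the nested formulas in the question would add a random intercept for race on top of its fixed effect, with only three levels from which to estimate that variance.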

Related Solutions

r – Correct Specification of Longitudinal Model in lme4 for R

I'll imagine a concrete example, with more context, to make things easy. Assume you measure the test scores of 3,000 students in 200 schools, and you measure each student at 4 time points (say, each quarter). You have a student-level covariate that doesn't vary over time (like sex), which you called pred1.obs, and a school-level covariate that varies over time (say, the number of meetings between teachers and parents up to that moment). If this example resembles your study, then I think you have to set up a three-level model (individual level, group level, and time level for the groups), with indices i = 1, ..., 3000 (individuals), t = 1, ..., 4 (periods), and g = 1, ..., 200 (groups).

The model would be:

y_i ~ N(a + b_[groups_g] + b.ind*pred1.obs_i, sigma^2) # 1st level
b_g ~ N(gamma + gamma_[time] + gamma.g[time_t]*pred2.grp, sigma.b^2) # 2nd level
gamma.g_t ~ N(0, sigma.gamma^2) # 3rd level

Note that the slope at the second level (group level) varies by time, which makes sense, since you expect that the effect of schools on the performance of students may vary over time, depending on the value of the school-level covariate. I'm not sure how to estimate this with lmer (I know how to estimate a Bayesian model using WinBUGS or JAGS, calling them from R). In any case, here is my suggestion.

In lme4, I'd try the following. First, expand pred2.grp (the group-level covariate that varies over time) to the individual level, so that you have repeated measures for individuals at the group and time level. Then:

lmer(outcome ~ pred1.obs + pred2.grp + (1|group))
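
The "expand to the individual level" step can be sketched with merge() on simulated data. All sizes, effect values, and the data frame names students, group_time, and long are assumptions made for this example; only outcome, pred1.obs, pred2.grp, and group come from the answer above.

```r
library(lme4)

set.seed(2)
n_group     <- 20  # hypothetical number of schools
n_per_group <- 10  # hypothetical students per school
n_time      <- 4   # quarters

# one row per student, with a time-invariant covariate
students <- data.frame(
  id        = seq_len(n_group * n_per_group),
  group     = rep(seq_len(n_group), each = n_per_group),
  pred1.obs = rbinom(n_group * n_per_group, 1, 0.5)
)

# one row per school and quarter, with the time-varying school covariate
group_time <- expand.grid(group = seq_len(n_group), time = seq_len(n_time))
group_time$pred2.grp <- group_time$time + rpois(nrow(group_time), 1)

# student-by-time long format (merge with no shared columns is a cross join)
long <- merge(students, data.frame(time = seq_len(n_time)))

# expand the school-level covariate down to every student-time row
long <- merge(long, group_time, by = c("group", "time"))

# made-up outcome with a school-level random intercept
b_group <- rnorm(n_group, 0, 1)
long$outcome <- 50 + 2 * long$pred1.obs + 0.5 * long$pred2.grp +
  b_group[long$group] + rnorm(nrow(long))

fit <- lmer(outcome ~ pred1.obs + pred2.grp + (1 | group), data = long)
fixef(fit)
```

Since each student appears four times in the long data, one might also consider adding a student-level random intercept such as (1 | id); that term is not part of the suggestion above.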

R – Understanding GLMM: Between, Within, and Nested Data Structures

I don't see why Rating is nested within ID. That would mean that every unique Rating belongs to one and only one ID, which does not appear to be the case. If Rating is to be treated as random, then these should be crossed random effects. See the answer here for the difference between crossed and nested random effects and how to specify them.

That said, Rating should not be specified as a random intercept here because 1) there are only 4 levels, which is too few to estimate a variance reliably, and 2) it is a Likert-scale item, so the assumption of normally distributed random effects is unlikely to hold.

So a better model would be

model = glmer(Correct ~ (1|ID) + Memory * State * Rating, data, family=binomial)

Lastly, note that there are several ways that you can treat Rating as an independent variable; see here, here and here for more details.
