Specifying a linear mixed model in lmer with replications nested within a fully crossed design

lme4-nlme, many-categories, mixed-model, r, random-effects-model

I’m trying to specify a linear mixed model for a somewhat complicated, nested and crossed method comparison study with replicated measurements. The goal is to partition and compare variances. It’s more complicated than the typical introductory examples I have seen, and I’m not quite sure how to do it correctly.

The data I have collected come from a comparison of different laboratory measurement methods (4 levels), each evaluated for all combinations of parameter1 (10 levels) and parameter2 (40 levels). Measurements (repeated for each unique combination of method, parameter1 and parameter2) were done on 60 different subjects characterized by sex. Each measurement was replicated 800 times (almost 80 million rows of observations), and the difference (true value, determined beforehand by a reference method, minus measured value) was recorded.

Therefore: method × parameter1 × parameter2 are fully crossed (every combination of factor levels occurs, and every subject was measured under every combination), subjects are nested within sex, and replicated measurements are nested within subjects (800 per subject and combination).
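As a quick sanity check on the design size, here is a minimal base-R sketch that rebuilds the crossed condition grid and recomputes the number of rows. The level labels are placeholders (the real level names are not given); the counts of subjects and replicates are taken from the description above.

```r
# Placeholder factor levels: only the numbers of levels come from the question.
cells <- expand.grid(
  method = factor(1:4),   # 4 measurement methods
  param1 = factor(1:10),  # 10 levels of parameter1
  param2 = factor(1:40)   # 40 levels of parameter2
)

# Fully crossed: each method x param1 x param2 combination appears exactly once
stopifnot(anyDuplicated(cells) == 0)
n_cells <- nrow(cells)

n_subjects   <- 60L   # subjects, each measured under every combination
n_replicates <- 800L  # replicates per subject and combination

# One row per replicated measurement
n_rows <- n_cells * n_subjects * n_replicates
print(n_cells)
print(n_rows)
```

This gives 4 × 10 × 40 = 1,600 condition cells and 1,600 × 60 × 800 = 76,800,000 rows, which matches "almost 80 million".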

Following some advice (https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2011-February/000225.html, and Barr et al., 2013), I would like to initially specify a maximal random effects structure for this, i.e., to account for between-subject differences in the sensitivity to all of the conditions, and then strip down the model from there. But I’m having a really hard time wrapping my head around the formula syntax and the model logic at the same time.

I’m shooting in the dark here, but what about:

(1 + method*param1*param2*sex | sex/subj_id)  # which expands to
(1 + method*param1*param2*sex | sex) + (1 + method*param1*param2*sex | sex:subj_id)

I don’t think that’s correct, though. It seems strange to have sex on both sides of the | bar.

I would be very grateful for any help.

On a related issue: I intend to use the difference between the (known) true value and the measured value as the response: diff ~ …. Is there any reason to instead regress the measured value on the known true value: measured ~ true + …?

Best Answer

Since each subj_id is associated with one and only one value of sex, you need not mention sex in the random part of the model formula. The model using maximal random effects would be:

(1 + method*param1*param2 | subj_id)
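A minimal base-R sketch of why sex can be dropped from the grouping term, assuming a hypothetical subject table (the IDs and the sex assignment below are made up for illustration). When every subj_id belongs to exactly one sex, the grouping factor sex:subj_id defines exactly the same 60 groups as subj_id, so sex/subj_id only adds a grouping by sex itself, a factor with very few levels that is more naturally a fixed effect. The fixed part of the formula shown at the end is also an assumption, not something stated in the answer.

```r
# Hypothetical subject table: 60 unique IDs, each with exactly one sex.
set.seed(1)
subjects <- data.frame(
  subj_id = factor(sprintf("s%02d", 1:60)),
  sex     = factor(sample(c("F", "M"), 60, replace = TRUE))
)

# Nesting check: every subject maps to exactly one sex
sexes_per_subject <- tapply(subjects$sex, subjects$subj_id,
                            function(s) length(unique(s)))
stopifnot(all(sexes_per_subject == 1))

# With unique subject IDs, sex:subj_id has the same groups as subj_id
n_nested <- nlevels(interaction(subjects$sex, subjects$subj_id, drop = TRUE))
stopifnot(n_nested == nlevels(subjects$subj_id))

# Hypothetical full formula: sex appears only in the fixed part,
# subjects carry the maximal random-slope structure from the answer.
# It would be fitted with lme4::lmer(f, data = d), where d is the
# (hypothetical) long-format data frame with one row per replicate.
f <- diff ~ method * param1 * param2 * sex +
  (1 + method * param1 * param2 | subj_id)
print(all.vars(f))
```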

You might have some difficulty estimating the random effects for this model because of the large number of levels of your factors (which implies a very large correlation matrix).
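To put numbers on this: with any full-rank coding (e.g. R's default treatment contrasts), the term 1 + method*param1*param2 expands to one column per cell of the 4 × 10 × 40 design. Each subject therefore gets q random effects, and an unstructured variance-covariance matrix for them has q(q + 1)/2 free parameters, all to be estimated from only 60 subjects:

```latex
q = \underbrace{(1 + 3)}_{\text{method}}\,\underbrace{(1 + 9)}_{\text{param1}}\,\underbrace{(1 + 39)}_{\text{param2}} = 4 \times 10 \times 40 = 1600,
\qquad
\frac{q(q+1)}{2} = \frac{1600 \times 1601}{2} = 1\,280\,800 .
```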

Also, I can't think of any reason why using difference scores instead of the raw pairs of scores as your DV would cause a problem.
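One way to see how the two formulations relate, writing d = y^true − y^meas as in the question and using x and z for the fixed- and random-effects design rows: the difference model is the measured-on-true model with the slope on the true value fixed at 1 (and the remaining coefficients sign-flipped). Regressing measured ~ true + … simply frees that slope γ, which would additionally capture a proportional bias if one exists.

```latex
d_i = y^{\text{true}}_i - y^{\text{meas}}_i
    = \mathbf{x}_i^\top\boldsymbol\beta + \mathbf{z}_i^\top\mathbf{b} + \varepsilon_i
\;\Longleftrightarrow\;
y^{\text{meas}}_i = 1 \cdot y^{\text{true}}_i - \mathbf{x}_i^\top\boldsymbol\beta - \mathbf{z}_i^\top\mathbf{b} - \varepsilon_i ,
\qquad\text{versus}\qquad
y^{\text{meas}}_i = \gamma\, y^{\text{true}}_i + \mathbf{x}_i^\top\boldsymbol\beta^{*} + \mathbf{z}_i^\top\mathbf{b}^{*} + \varepsilon^{*}_i .
```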
