mixed-model – Random Effects in Linear Mixed Models (LME) with Repeated Replicates

lme4-nlmemixed modelrandom-effects-modelrepeated measures

My question in short

How to account for repeats of replicates of a full factorial design in lmer() of the lme4 package in R?

Data structure

I am extracting a substance from different plants (A, B, C) under various conditions. Each condition parameter has two levels (high & low temperature, use of enzyme OE & PL, extraction method VW & MJ, solution AS & ES). The substance was extracted from each plant under each possible combination of condition parameters twice (true replicates of a full factorial design) and the yield of the substance was then determined for every extract twice as well (duplicate measurements). Consequently, an example of my data would look something like this:

tibble [192 x 8] (S3: tbl_df/tbl/data.frame)
 $ extr.method: Factor w/ 2 levels "MJ","VW": 2 2 2 2 2 2 2 1 2 1 ...
 $ enzym      : Factor w/ 2 levels "OE","PL": 1 1 1 1 1 1 1 1 1 1 ...
 $ temp       : Factor w/ 2 levels "20","35": 1 1 1 1 1 1 2 1 2 1 ...
 $ solution   : Factor w/ 2 levels "AS","ES": 2 2 1 1 1 1 2 2 2 2 ...
 $ plant      : chr [1:192] "A" "A" "A" "A" ...
 $ replicate  : num [1:192] 2 2 2 2 1 1 2 2 2 2 ...
 $ repeat     : num [1:192] 1 2 1 2 2 1 1 1 2 2 ...
 $ yield      : num [1:192] 2.26 2.26 2.38 2.38 2.41 ...

My question

I am struggling to define the fixed and random effects in my model using the lmer() function of lme4 in R.

In [this][1] post a similar topic is discussed, as OP also had distinct groups (plants in my case) and replicates. If I also only had replicates, I believe an accurate representation of my data structure in `lmer would be:

lmer(yield ~ enzym*temp*solution + 
   (1 + enzym + temp + solution|plant/replicate), 
   data=df, REML=FALSE)

But I don't know how to define repeats as random effects as well and if I need to do this? Since I'm new to mixed models I'm also not 100% confident about my code in general, so if you have suggestions for improvement, I'd appreciate them!

Side questions:

I realized that my output is different depending on me defining temp as a factor (high&low) or leaving it as a numerical. Why is this the case and what is the correct approach in my case?
If I run the code as shown above I get the message

boundary (singular) fit: see help('isSingular')

From what I've read, this does not necessarily mean that the model is bad or wrong, just that there are some effects, which are close to zero. Am I correct, that I can still use this model, despite the message? What would I need to do to prevent such a message?

Some data

Here is a part of my data set:

structure(list(extr.method = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), levels = c("MJ", 
"VW"), class = "factor"), enzym = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("OE", 
"PL"), class = "factor"), temp = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), levels = c("20", 
"35"), class = "factor"), solution = structure(c(2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("AS", 
"ES"), class = "factor"), plant = c("A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C"), replicate = c(2, 2, 2, 2, 1, 1, 2, 
2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 
1, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 
2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 
1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 
2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 
2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 
1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 
1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1), `repeat` = c(1, 
2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 
1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 
2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 
1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 
2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 
1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1, 
1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 
1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 
2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 
2, 1), yield = c(2.2641840150101, 2.2641840150101, 2.38224337438327, 
2.38224337438327, 2.40724214205281, 2.5322359804005, 2.61893098155703, 
2.62758334659476, 2.68814990185887, 2.6968022668966, 2.74006409208525, 
2.7833259172739, 2.84100835085877, 2.85542895925498, 2.98809855650018, 
3.06885396351899, 3.08220886913035, 3.09470825296511, 3.20440768244343, 
3.20440768244343, 3.51300870212247, 3.57069113570734, 3.61030783614934, 
3.61030783614934, 3.65718052552973, 3.66030537148842, 3.67280475532319, 
3.7071780608688, 4.09241320133092, 4.10523151990534, 4.2462330242239, 
4.2462330242239, 4.37441620996805, 4.44491696212733, 4.44491696212733, 
4.45773528070175, 4.52182687357382, 4.54105435143544, 4.63078258145635, 
4.63078258145635, 4.66650577018733, 4.75400145703072, 4.81649837620456, 
4.81649837620456, 5.7301184285578, 5.85511226690549, 5.89338696103622, 
5.89338696103622, 5.98311519105712, 6.03913097336182, 6.08566173965244, 
6.21967762875293, 6.27175839473113, 6.3411994160354, 6.60507529699164, 
6.60507529699164, 6.6189635012525, 6.63285170551335, 6.76131759492626, 
6.76131759492626, 7.05644193546942, 7.26476499938223, 7.57030549312104, 
7.65363471868616, 3.93697458897124, 3.93697458897124, 4.51379892481992, 
4.52533541153689, 4.66954149549906, 4.68107798221603, 5.07331853059313, 
5.17714691104589, 5.24636583134773, 5.29251177821562, 5.61553340629088, 
5.67321583987574, 5.94143915604538, 5.95706715112725, 6.04526753649814, 
6.26446078412063, 6.26446078412063, 6.30704989850078, 6.33204866617032, 
6.86952217106539, 9.78500345052529, 10.1407368206686, 10.307374962136, 
10.4224720260985, 10.6724597027939, 10.7349566219677, 10.8099529249764, 
10.8099529249764, 10.9354725722823, 11.0252008023032, 11.0818145233826, 
11.1818095940607, 11.2474313591933, 11.3224276622019, 12.1067639978337, 
12.1935905403612, 12.21922717751, 12.2993416686001, 12.3121599871745, 
12.373047000403, 12.4115019561262, 12.4531614914931, 12.4916164472163, 
12.4942448967115, 16.2236594, 16.456921, 16.5598712, 16.8523694, 
17.6080051226537, 17.7329989610014, 17.9934027908924, 18.03569785, 
18.1123591, 18.2156140590661, 18.3698721, 18.4596352, 18.6183719826309, 
18.6183719826309, 19.0662665700434, 19.0662665700434, 19.7710929362818, 
20.1460744513249, 20.2849564939334, 20.3821739237594, 30.2659745, 
30.3321568, 30.2636974, 30.0026587, 22.5698745, 22.9875311, 22.3698541, 
22.2254136, 13.2942054621367, 15.0235383239827, 15.2542680583222, 
15.8397447592086, 15.8397447592086, 16.211796455831, 16.211796455831, 
16.5059768671138, 16.5751957874156, 16.9097539022079, 17.0712647162455, 
17.0828012029625, 17.385633979283, 17.3942863443207, 17.4440008952801, 
17.5932907401885, 19.506399228017, 19.87463098847, 19.9784593689228, 
20.3813560964508, 22.5411782095576, 22.7302484085302, 22.8968865499976, 
23.2045261957836, 23.217344514358, 23.2718223682993, 23.3134819036661, 
23.3647551779638, 23.7461001555526, 23.8262146466427, 23.8614650227223, 
24.0569443809822, 25.1717449511261, 25.6842196883516, 25.8342122943689, 
25.8467116782036, 26.1716956579076, 26.3716857992639, 26.8341630011504, 
26.8466623849852, 26.9810307612089, 27.1685215187305, 28.4309592860421, 
28.6934463465723, 31.3399670855737, 31.5899547622691, 32.2288121582684, 
32.3260295880943, 32.3399177923552, 32.4232470179203, 32.4232470179203, 
32.6315700818331, 32.7218434095287, 32.9440546777024, 33.8190115461362, 
34.2078812654401)), row.names = c(NA, -192L), class = c("tbl_df", 
"tbl", "data.frame"))

Best Answer

Some quick thoughts:

my guess it that the multiple extractions and measurements (replicates) are technical replicates, i.e. they're there to improve precision and make sure that the process is working reasonably well. It might make sense to quantify the variability at this level (researchers in your area might be interested), but you can do this directly rather than as part of your mixed model analysis. It will make your analysis easier/less error-prone if you simply average all four extraction/replicate combinations within each (enzyme × temperature × method × solution × plant) combination — your technical replication will still be serving its purposes of reducing the variation at the level of these means (see Murtaugh 2007 for an analogous argument).

You're still going to have trouble with the resulting random-effects term (1 + enzym + temp + solution|plant), because (while it is the technically correct maximal model), it allows not just for the variation in effects across plants, but also in covariation between any pair of effects, for example: "do plants which have a higher-than-average yield in the (OE/20/MJ) condition tend to have a lower-than-average yield in the (PL/35/VW) condition?". This means you'll have to estimate the elements of an 8×8 covariance matrix (a total of (8+1)*8/2 = 36 parameters) from measurements of only 24 plants ...

This is a difficult problem, and is why you're getting singular fits. Singular fits aren't inherently bad, but suggest that your model may be too complex for the data (!) (There is more discussion in the help page that the message points to, as well as in the GLMM FAQ.)

One possible shortcut: fit a nested model (1|plant/enzym/temp/solution): this estimates only four variance parameters, but makes two stronger assumptions, (1) compound symmetry (all the correlations between pairs at the same level, e.g. between different enzyme and temperature combinations, are identical) and (2) nestedness (i.e. the only variation in temperature effects is within enzyme treatments)
Another (different but equally parsimonious) shortcut: (1|plant) + (1|plant:enzym) + (1|plant:temp) + (1|plant:solution). This assumes independent variation across plants for each effect.

Because there are only two distinct values, treating temp as numeric or factor shouldn't make any difference to the overall fit of the model, but it will change the units of the estimate (from "difference between levels" to "difference per degree C") and, more importantly, it will change the reference level for the other effects to 0°C — probably not sensible. It is probably easiest to treat temperature as categorical and divide the estimate by 5 if you want effects per degree. (If you are going to try to estimate main effects or compute 'type 3' sums of squares/p-values, you should also set options(contrasts = c("contr.sum", "contr.poly")) to get sum-to-zero rather than treatment contrasts).

More on randomized complete block designs here and here, but they don't go much beyond the case with a single treatment evaluated within a block.

Murtaugh, Paul A. “Simplicity and Complexity in Ecological Data Analysis.” Ecology 88, no. 1 (2007): 56–62.

My question in short

Data structure

My question

Some data

Best Answer

Related Solutions

Solved – Linear mixed model with lme – time as fixed categorical variable and random slope

Solved – How to model repeated measure + random effects with SPSS mixed

Related Question