My question in short
How to account for repeats of replicates of a full factorial design in lmer()
of the lme4
package in R?
Data structure
I am extracting a substance from different plants (A, B, C) under various conditions. Each condition parameter has two levels (high & low temperature, use of enzyme OE & PL, extraction method VW & MJ, solution AS & ES). The substance was extracted from each plant under each possible combination of condition parameters twice (true replicates of a full factorial design) and the yield of the substance was then determined for every extract twice as well (duplicate measurements). Consequently, an example of my data would look something like this:
tibble [192 x 8] (S3: tbl_df/tbl/data.frame)
$ extr.method: Factor w/ 2 levels "MJ","VW": 2 2 2 2 2 2 2 1 2 1 ...
$ enzym : Factor w/ 2 levels "OE","PL": 1 1 1 1 1 1 1 1 1 1 ...
$ temp : Factor w/ 2 levels "20","35": 1 1 1 1 1 1 2 1 2 1 ...
$ solution : Factor w/ 2 levels "AS","ES": 2 2 1 1 1 1 2 2 2 2 ...
$ plant : chr [1:192] "A" "A" "A" "A" ...
$ replicate : num [1:192] 2 2 2 2 1 1 2 2 2 2 ...
$ repeat : num [1:192] 1 2 1 2 2 1 1 1 2 2 ...
$ yield : num [1:192] 2.26 2.26 2.38 2.38 2.41 ...
My question
I am struggling to define the fixed and random effects in my model using the lmer()
function of lme4
in R.
In [this][1] post a similar topic is discussed, as OP also had distinct groups (plants in my case) and replicates. If I also only had replicates, I believe an accurate representation of my data structure in `lmer would be:
lmer(yield ~ enzym*temp*solution +
(1 + enzym + temp + solution|plant/replicate),
data=df, REML=FALSE)
But I don't know how to define repeats as random effects as well and if I need to do this? Since I'm new to mixed models I'm also not 100% confident about my code in general, so if you have suggestions for improvement, I'd appreciate them!
Side questions:
-
I realized that my output is different depending on me defining temp as a factor (high&low) or leaving it as a numerical. Why is this the case and what is the correct approach in my case?
-
If I run the code as shown above I get the message
boundary (singular) fit: see help('isSingular')
From what I've read, this does not necessarily mean that the model is bad or wrong, just that there are some effects, which are close to zero. Am I correct, that I can still use this model, despite the message? What would I need to do to prevent such a message?
Some data
Here is a part of my data set:
structure(list(extr.method = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), levels = c("MJ",
"VW"), class = "factor"), enzym = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("OE",
"PL"), class = "factor"), temp = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), levels = c("20",
"35"), class = "factor"), solution = structure(c(2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("AS",
"ES"), class = "factor"), plant = c("A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C"), replicate = c(2, 2, 2, 2, 1, 1, 2,
2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2,
1, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1,
2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2,
1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1,
2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1,
1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1,
1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1), `repeat` = c(1,
2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2,
1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1,
1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1,
2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2,
1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1,
1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2,
1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1,
2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1,
2, 1), yield = c(2.2641840150101, 2.2641840150101, 2.38224337438327,
2.38224337438327, 2.40724214205281, 2.5322359804005, 2.61893098155703,
2.62758334659476, 2.68814990185887, 2.6968022668966, 2.74006409208525,
2.7833259172739, 2.84100835085877, 2.85542895925498, 2.98809855650018,
3.06885396351899, 3.08220886913035, 3.09470825296511, 3.20440768244343,
3.20440768244343, 3.51300870212247, 3.57069113570734, 3.61030783614934,
3.61030783614934, 3.65718052552973, 3.66030537148842, 3.67280475532319,
3.7071780608688, 4.09241320133092, 4.10523151990534, 4.2462330242239,
4.2462330242239, 4.37441620996805, 4.44491696212733, 4.44491696212733,
4.45773528070175, 4.52182687357382, 4.54105435143544, 4.63078258145635,
4.63078258145635, 4.66650577018733, 4.75400145703072, 4.81649837620456,
4.81649837620456, 5.7301184285578, 5.85511226690549, 5.89338696103622,
5.89338696103622, 5.98311519105712, 6.03913097336182, 6.08566173965244,
6.21967762875293, 6.27175839473113, 6.3411994160354, 6.60507529699164,
6.60507529699164, 6.6189635012525, 6.63285170551335, 6.76131759492626,
6.76131759492626, 7.05644193546942, 7.26476499938223, 7.57030549312104,
7.65363471868616, 3.93697458897124, 3.93697458897124, 4.51379892481992,
4.52533541153689, 4.66954149549906, 4.68107798221603, 5.07331853059313,
5.17714691104589, 5.24636583134773, 5.29251177821562, 5.61553340629088,
5.67321583987574, 5.94143915604538, 5.95706715112725, 6.04526753649814,
6.26446078412063, 6.26446078412063, 6.30704989850078, 6.33204866617032,
6.86952217106539, 9.78500345052529, 10.1407368206686, 10.307374962136,
10.4224720260985, 10.6724597027939, 10.7349566219677, 10.8099529249764,
10.8099529249764, 10.9354725722823, 11.0252008023032, 11.0818145233826,
11.1818095940607, 11.2474313591933, 11.3224276622019, 12.1067639978337,
12.1935905403612, 12.21922717751, 12.2993416686001, 12.3121599871745,
12.373047000403, 12.4115019561262, 12.4531614914931, 12.4916164472163,
12.4942448967115, 16.2236594, 16.456921, 16.5598712, 16.8523694,
17.6080051226537, 17.7329989610014, 17.9934027908924, 18.03569785,
18.1123591, 18.2156140590661, 18.3698721, 18.4596352, 18.6183719826309,
18.6183719826309, 19.0662665700434, 19.0662665700434, 19.7710929362818,
20.1460744513249, 20.2849564939334, 20.3821739237594, 30.2659745,
30.3321568, 30.2636974, 30.0026587, 22.5698745, 22.9875311, 22.3698541,
22.2254136, 13.2942054621367, 15.0235383239827, 15.2542680583222,
15.8397447592086, 15.8397447592086, 16.211796455831, 16.211796455831,
16.5059768671138, 16.5751957874156, 16.9097539022079, 17.0712647162455,
17.0828012029625, 17.385633979283, 17.3942863443207, 17.4440008952801,
17.5932907401885, 19.506399228017, 19.87463098847, 19.9784593689228,
20.3813560964508, 22.5411782095576, 22.7302484085302, 22.8968865499976,
23.2045261957836, 23.217344514358, 23.2718223682993, 23.3134819036661,
23.3647551779638, 23.7461001555526, 23.8262146466427, 23.8614650227223,
24.0569443809822, 25.1717449511261, 25.6842196883516, 25.8342122943689,
25.8467116782036, 26.1716956579076, 26.3716857992639, 26.8341630011504,
26.8466623849852, 26.9810307612089, 27.1685215187305, 28.4309592860421,
28.6934463465723, 31.3399670855737, 31.5899547622691, 32.2288121582684,
32.3260295880943, 32.3399177923552, 32.4232470179203, 32.4232470179203,
32.6315700818331, 32.7218434095287, 32.9440546777024, 33.8190115461362,
34.2078812654401)), row.names = c(NA, -192L), class = c("tbl_df",
"tbl", "data.frame"))
Best Answer
Some quick thoughts:
You're still going to have trouble with the resulting random-effects term
(1 + enzym + temp + solution|plant)
, because (while it is the technically correct maximal model), it allows not just for the variation in effects across plants, but also in covariation between any pair of effects, for example: "do plants which have a higher-than-average yield in the (OE/20/MJ) condition tend to have a lower-than-average yield in the (PL/35/VW) condition?". This means you'll have to estimate the elements of an 8×8 covariance matrix (a total of (8+1)*8/2 = 36 parameters) from measurements of only 24 plants ...This is a difficult problem, and is why you're getting singular fits. Singular fits aren't inherently bad, but suggest that your model may be too complex for the data (!) (There is more discussion in the help page that the message points to, as well as in the GLMM FAQ.)
(1|plant/enzym/temp/solution)
: this estimates only four variance parameters, but makes two stronger assumptions, (1) compound symmetry (all the correlations between pairs at the same level, e.g. between different enzyme and temperature combinations, are identical) and (2) nestedness (i.e. the only variation in temperature effects is within enzyme treatments)(1|plant) + (1|plant:enzym) + (1|plant:temp) + (1|plant:solution)
. This assumes independent variation across plants for each effect.Because there are only two distinct values, treating
temp
as numeric or factor shouldn't make any difference to the overall fit of the model, but it will change the units of the estimate (from "difference between levels" to "difference per degree C") and, more importantly, it will change the reference level for the other effects to 0°C — probably not sensible. It is probably easiest to treat temperature as categorical and divide the estimate by 5 if you want effects per degree. (If you are going to try to estimate main effects or compute 'type 3' sums of squares/p-values, you should also setoptions(contrasts = c("contr.sum", "contr.poly"))
to get sum-to-zero rather than treatment contrasts).More on randomized complete block designs here and here, but they don't go much beyond the case with a single treatment evaluated within a block.
Murtaugh, Paul A. “Simplicity and Complexity in Ecological Data Analysis.” Ecology 88, no. 1 (2007): 56–62.