Linear Mixed Effects Model – Implementation of 4-Level Repeated Measures

lme4-nlmemixed modelmultilevel-analysisregressionrepeated measures

Prior Searches

Prior to reading below, I have searched this forum for other posts of this nature. I have a found a few, but I have not found them to be helpful to my specific challenge. Particularly, Dr. Bolker's response to this post was helpful in understanding a double repeated measures LMM. However, it did not help me understand how to tackle this challenge as my repeated measures are nested in both the subjects and their groups

Intro

I'm conducting a linear fixed effects model on multilevel data (Groups:Subjects) and two repeated measurements (Condition and Study Session). Each group and their respective subjects are measured over multiple data collection sessions (Session_ID). Each data collection session consists of three study conditions with the same group and their respective subjects. I have 2 group level measurements (e.g., group performance) and two subject level measurements (e.g., subject demographics). My subsequent DV is also measured at the group level. The problem I am facing here is that I do not know how to account for the repeated measure nature of the data set for both my data colleciton session and my counterbalanced condition while considering the nested structure of my data. I've attempted to replicate my code, but it is beyond my knowledge of R. Instead, I am providing a dput() of a toy data set that replicates my actual data set structure and will further explain my actual data set below.

Data

My variables are as follows:

tibble [54 x 9] (S3: tbl_df/tbl/data.frame)
 $ Group         : Factor w/ 3 levels "1","2","3": Grouping Factor
 $ Subj          : Factor w/ 3 levels "1","2","3": Subject IDs, nested in Group
 $ CounterBalance: Factor w/ 3 levels "A","B","C": Counterbalance of first repeated measure (A = Condition A)
 $ Session_ID    : Factor w/ 2 levels "1","2": Measurement Session (3+ in real data). Each session consits of each counterbalance condition
 $ Group_IV1     : num [1:54] IV measured at the group level. Hence ach subj has the same value for the group
 $ Group_IV2     : num [1:54] Same as above but second IV
 $ Subj_IV1      : num [1:54] IV measured at the subject level. Each counterbalance has the same value
 $ Subj_IV2      : num [1:54] Same as above for second subject level ID
 $ Group_DV      : num [1:54] Group level DV. This DV was measured at the group level

Thus, as an example, my data looks like this:

   Group Subj  CounterBalance Session_ID Group_IV1 Group_IV2    Subj_IV1 Subj_IV2 Group_DV
   <fct> <fct> <fct>          <fct>          <dbl>     <dbl>      <dbl>    <dbl>    <dbl>
    1     1     A              1                 44         5       6        5       0.676
    1     1     B              1                 25        24       6        5       0.242
    1     1     C              1                 28        28       6        5       0.406
    1     2     A              1                 44         5       6        6       0.676
    1     2     B              1                 25        24       6        6       0.242
    1     2     C              1                 28        28       6        6       0.406
    1     3     A              1                 44         5       5        5       0.676
    1     3     B              1                 25        24       5        5       0.242
    1     3     C              1                 28        28       5        5       0.406
    2     1     A              1                 15        221      6        6       0.122
    2     1     B              1                 26        251      6        6       0.690
    2     1     C              1                 29        211      6        6       1.70 
    2     2     A              1                 15        221      6        6       0.122
    2     2     B              1                 26        251      6        6       0.690
    2     2     C              1                 29        211      6        6       1.70 
    2     3     A              1                 15        221      6        6       0.122
    2     3     B              1                 26        251      6        6       0.690
    2     3     C              1                 29        211      6        6       1.70 

*Notice the repeating data based on group, counterbalance, subj.

I am attempting to specify this type of model, fully, using a linear mixed model approach. However, I'm having a difficult time finding any material or sources that specify how to parameterize a multilevel model with two nested repeated measurements. Below, I will describe my though process that I have taken and try to explain where I'm confused.

Approach

So far, my approach has been to nest each repeated measure into their respective level and then nest each subj into their groups. This approach was derived from the logic presented in Raudenbush (2004) which states that repeated measures are nested within subjects. Thus, If I extend this approach to my hypothesized data structure I would say that my counterbalanced conditions (level 1) are nested within my subjects (level 2) who are nested within my data collection sessions (level 3) which are then nested within the groups (level 4):

CounterBalanced Condition > Subjects > Data Collection Session > Groups.

Thus, my syntax, I believe, would be:

lmer(Group_DV1 ~ Group_IV1 + Group_IV2 + SubjIV1 + Subj_IV2 + (1|Group) + (1|Group:Session_ID) + (1|Group:Session_ID:Subj) + (1|Group:Session_ID:Subj:CounterBalance), data = data))

However, this model specification leads to the following error:

Error: number of levels of each grouping factor must be < number of observations (problems: Group:Session_ID:Subj:CounterBalance)

Which I'm assuming to mean that I am nesting too many variable and have more neted levels than observations. But, since this is how the data is setup, I'm not sure how that could be the case?

In removing the counterbalance from the random effects, I am able to successfully run the model

lmer(Group_DV1 ~ Group_IV1 + Group_IV2 + SubjIV1 + Subj_IV2 + (1|Group) + (1|Group:Session_ID) + (1|Group:Session_ID:Subj), data = data))

But, I then run into a singularity as my random effects report that my Group:Session_ID:Subj has a variance of 0. Is this because there truly is no variance between subjects when accounting for the nesting or is because this model is specified incorrectly?

boundary (singular) fit: see ?isSingular
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: 
Group_DV ~ Group_IV1 + Group_IV2 + Subj_IV1 + Subj_IV2 + (1 |  
    Group) + (1 | Group:Session_ID) + (1 | Group:Session_ID:Subj)
   Data: data
REML criterion at convergence: 53.873
Random effects:
 Groups                Name        Std.Dev.
 Group:Session_ID:Subj (Intercept) 0.0000  
 Group:Session_ID      (Intercept) 0.2053  
 Group                 (Intercept) 0.3365  
 Residual                          0.2914  
Number of obs: 54, groups:  
Group:Session_ID:Subj, 18; Group:Session_ID, 6; Group, 3
Fixed Effects:
(Intercept)    Group_IV1    Group_IV2     Subj_IV1     Subj_IV2  
  -0.857832     0.046202     0.017655    -0.001838    -0.002266  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 

Any help in figuring out how to correctly specify this is appreciated.

*Replicatable Toy Dataset

My real dataset is roughly 1000 rows. I do not have the knowledge or skill level in R to replicate this data structure using code.

structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
"2", "3"), class = "factor"), Subj = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"), CounterBalance = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    Session_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("1", "2"), class = "factor"), Group_IV1 = c(44, 
    25, 28, 44, 25, 28, 44, 25, 28, 15, 26, 29, 15, 26, 29, 15, 
    26, 29, 31, 20, 24, 31, 20, 24, 31, 20, 24, 30, 26, 21, 30, 
    26, 21, 30, 26, 21, 14, 18, 25, 14, 18, 25, 14, 18, 25, 27, 
    24, 26, 27, 24, 26, 27, 24, 26), Group_IV2 = c(5, 24, 28, 
    5, 24, 28, 5, 24, 28, 22, 25, 21, 22, 25, 21, 22, 25, 21, 
    0, 27, 28, 0, 27, 28, 0, 27, 28, 25, 26, 25, 25, 26, 25, 
    25, 26, 25, 19, 30, 21, 19, 30, 21, 19, 30, 21, 18, 24, 25, 
    18, 24, 25, 18, 24, 25), Subj_IV1 = c(6, 6, 6, 6, 6, 6, 5, 
    5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 6, 6, 6, 7, 7, 
    7, 5, 5, 5, 6, 6, 6, 5, 5, 5, 6, 6, 6, 5, 5, 5, 7, 7, 7, 
    6, 6, 6, 6, 6, 6, 6, 6, 6), Subj_IV2 = c(5, 5, 5, 6, 6, 6, 
    5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 
    5, 5, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 5, 5, 
    5, 7, 7, 7, 6, 6, 6, 6, 6, 6), Group_DV = c(0.67619, 0.24212, 
    0.40619, 0.67619, 0.24212, 0.40619, 0.67619, 0.24212, 0.40619, 
    0.1216, 0.68978, 1.70481, 0.1216, 0.68978, 1.70481, 0.1216, 
    0.68978, 1.70481, 0.34172, 0.00387, 0.92827, 0.34172, 0.00387, 
    0.92827, 0.34172, 0.00387, 0.92827, 0.48737, 0.66727, 0.78267, 
    0.48737, 0.66727, 0.78267, 0.48737, 0.66727, 0.78267, 0.8836, 
    0.82926, 1.23826, 0.8836, 0.82926, 1.23826, 0.8836, 0.82926, 
    1.23826, 0.58837, 0.53445, 0.87934, 0.58837, 0.53445, 0.87934, 
    0.58837, 0.53445, 0.87934)), row.names = c(NA, -54L), class = c("tbl_df", 
"tbl", "data.frame"))

References

Raudenbush, S., Bryk, A., Cheong, Y. F., Congdon, R., & Toit, M. D. (n.d.). Conceptual and Statistical Background for Hierarchical Multivariate Linear Models. In HLM 6: Hierarchical linear and nonlinear modeling. essay, Scientific Software International.

Best Answer

It is important to think about how many data points the model is trying to fit, and also to remember which variables are fixed effects factors, which variables are random effects, and which variables are numeric covariates.

Data points. If the dependent variable is measured at group level, then the amount of data to fit is one value for each group, multiplied by the number of repeated measures sessions and the number of test conditions within each session. The data for regression should have this many rows of data.

Fixed effects. Factors are fixed effects if they span the full range of values relevant to the study. In an analysis of performance under three different study conditions, the study conditions are a fixed effect, lets say Study_Type. If session is to be treated as a factor rather than an ordinal session number, then Session_ID is a fixed effect. Within a session containing multiple study conditions we might treat the measurement order as a factor, with Order=1 (first performance measurement), Order=2 (second measurement with the session), etc.

  • Roughly, a mixed effects model will estimate the mean value of the dependent variable for each fixed effect cell of the regression model. In this study we will be particularly interested in whether the mean differs by Study_Type.

Random effects. Factors are random effects if the levels included in the study are a subset of a wider potential population, and are meant to be representative of that wider population. If we have a set of groups of subjects that are meant to be roughly indicative of a bigger universe of groups, then Group_ID is a random factor.

  • Roughly, a mixed effects model estimates how much variance is added to the dependent variable by a random factor as a whole, without bickering about 'oo killed 'oo or which random group had higher or lower performance.

Covariates. In a regression model that includes some categorical predictors (like Study_Type) and some numerical predictor variables (like Group_IV1), the numerical predictors may be called covariates. In this study some numerical predictors were measured at the level of subjects, with multiple subjects within each group. Each of these (Subj_IV1, Subj_IV2) can be aggregated to obtain a single composite measure for the group. I'll call these Avg_IV1, Avg_IV2.

Design and model

The experimental design has

  • three fixed effects - Study_Type, Order, Session_ID
  • one random effect - Group
  • four covariates - Group_IV1, Group_IV2, Avg_IV1, Avg_IV2
  • one dependent variable - Group_DV

Fixed effects. For sure we want to know how the dependent variable is affected by Study_Type. But the effect of Study_Type might vary across sessions (e.g. if novelty wanes as the experiment drags on, or if subjects learn to learn as they acquire more experience with each session). The effect of Study_Type might also vary depending on the order of measurement within each session. So we probably want to model interactions as well as main effects of the fixed effects.

~ Study_Type * Order * Session_ID

Random effect. Some groups might be better than others at everything. We could model this with a simple random effect:

~ (1|Group_ID)

But more than that, group differences might vary across the different study conditions, which we can model by allowing the "slope" of the group differences to vary with Study_Type:

~ (1+Study_Type|Group_ID)

In principle it is possible that group differences might further vary across the different sessions, or measurement orders, and so forth. If those nuisance effects are substantial, we would do well to include them in the model in order to more clearly measure the effects of interest (like the effect of Study_Type). But if some of the nuisance effects are probably small and we have only a finite amount of data because we tested a limited number of groups, then we might omit the nuisance effects from the model for the sake of simplicity and practicality. Here, I will opt for simplicity and leave out potential interactions of the nuisance fixed effects with groups.

Covariates. We want to test hypotheses about potential effects of the covariates (numerical predictors) on the dependent measure. A simple set of hypotheses would be that the effect of each covariate is the same regardless of what the other covariates are doing (that is, we assume the covariates do not interact with each other). Then our model can just include simple terms for the main effects of the covariates:

~ Group_IV1 + Group_IV2 + Avg_IV1 + Avg_IV2

Possibly the effect of a covariate could be attenuated or enhanced as the study proceeds across sessions. We might wonder if we need to treat session as an ordinal variable rather than a categorial factor. But if there are only two or three sessions we might forge ahead with the categorical Session_ID factor, and consider including interactions of Session_ID with the covariates:

~ Group_IV1 + Group_IV1:Session_ID + Group_IV2 + Group_IV2:Session_ID + ...

Similarly, the effect of a covariate might vary depending on Study_Type (or equivalently, the effect of some Study_Type might be enhanced or attenuated as a function of one of the covariates). In that case our model might want to include terms for interactions between covariates and Study_Type.

Putting it all together.

A plausible model (leaving out potential interactions between covariates and the fixed effects) might be:

Group_DV ~ Group_IV1 + Group_IV2 + Avg_IV1 + Avg_IV2
+ Study_Type * Order * Session_ID
+ (1+Study_Type|Group_ID)
Related Question