Mixed Model in R – Modeling Fixed Effects with Multiple Levels and Interactions

Tags: hypothesis-testing, interaction, mixed-model, r, regression

I am new to R and to mixed linear modelling. I have a dataset with variables from a cross-sectional study looking at fractional anisotropy (FA, a property of the brain's white matter) in 6 different white matter fibre tracts in the brain. For each fibre tract there are 2 measures (one from each hemisphere). There are 66 participants divided into two groups. We want to control the group comparisons for differences in age, the average FA across the brain (wholebrain FA) and tract volume.

I assume that Group (Patients/Controls), Tract (CB/SLF1/SLF2/SLF3/UF/OFST) and Hemisphere (Left/Right) are fixed effects and that Subject (n=66) is a random effect. I also assume that Age, Wholebrain FA and Tract volume should be modelled as fixed effects. For Age and Wholebrain FA there is one value for each participant, but for Volume there is one value for each observation. The attached picture presents the table in the long format. For each subject there are 12 observations.

[Image: table of the data in long format]

We hypothesized that there would be a group difference in each of the six fibre tracts. We had no a priori assumptions about the hemispheres but would like to explore this post hoc. We would also like to explore associations between Age and FA in different tracts.

My suggested model looks like this:

mixed.lmer <- lmer(FA ~ Age + Wholebrain_FA + Volume + Group * Tract * Hemisphere + (1 | Subjects), data = DTI)

Question 1: Given that Tract and Hemisphere are assumed to be fixed variables but are also within-subject variables, are they correctly modelled? I am having a hard time understanding how the model "understands" that these variables have multiple levels from the way it is written above.

Question 2: The Volume variable is a within-subject variable whereas Age is a between-subject variable. Should they then not be modelled differently?

Question 3: Whether or not to include a three-way interaction is a major debate in my research group. Some say that for practical purposes it's impossible to really make sense of it. Others say it can guide the decision of whether or not to test differences between groups for each hemisphere in each tract. Including a three-way interaction in the model is likely to change the results significantly, so it seems pretty important to get it right the first time. Any thoughts on this? Is it a crime to include it?

Best Answer

Question 1: Given that Tract and Hemisphere are assumed to be fixed variables but are also within-subject variables, are they correctly modelled? I am having a hard time understanding how the model "understands" that these variables have multiple levels from the way it is written above.

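In most software, such as lme4 or GLMMadaptive, it is not necessary to specify at which level a variable varies because, contrary to your understanding, the software really does "know". The level at which a variable varies is a property of the data, and it is easy to demonstrate with cross-tabulations: cross-tabulating Subjects against Group puts each subject in exactly one Group column, whereas cross-tabulating Subjects against Tract puts each subject in every Tract column.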
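The cross-tabulation idea can be sketched in base R. The data frame below is a small simulated stand-in (4 hypothetical subjects, 2 tracts, 2 hemispheres), not the real DTI data, and the name DTI_demo is an assumption made for illustration.

```r
# Simulated long-format data: 4 hypothetical subjects, 2 tracts x 2 hemispheres each.
DTI_demo <- expand.grid(
  Hemisphere = c("Left", "Right"),
  Tract      = c("CB", "UF"),
  Subjects   = paste0("S", 1:4)
)
# Group is constant within a subject, i.e. a between-subject variable.
DTI_demo$Group <- ifelse(DTI_demo$Subjects %in% c("S1", "S2"), "Patient", "Control")

# Between-subject: each subject has counts in exactly one Group column.
tab_group <- table(DTI_demo$Subjects, DTI_demo$Group)
print(tab_group)

# Within-subject: each subject has counts in every Tract column.
tab_tract <- table(DTI_demo$Subjects, DTI_demo$Tract)
print(tab_tract)

# Number of distinct levels of each variable observed per subject.
levels_per_subject_group <- rowSums(tab_group > 0)
levels_per_subject_tract <- rowSums(tab_tract > 0)
```

Running the same two table() calls on the real data (with Subjects, Group and Tract as in the question) should show the same pattern if the variables are coded as described.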

You may also want to allow a within-subject fixed effect to vary randomly across subjects in which case you can also specify it as a random slope. For example:

lmer(FA ~ Age + Wholebrain_FA + Volume + Hemisphere + (Hemisphere | Subjects), data = DTI)

will estimate a fixed effect for Hemisphere and also allow it to vary by subject. The software will estimate a variance for the "random slope" of Hemisphere.

The difference between the model without random slopes and the model with random slopes is that in the former, the "within-subject" variable is estimated to have a fixed effect which is the same for all subjects, whereas fitting random slopes allows each subject to have its own effect of that variable (a global fixed effect plus a random, subject-specific offset).
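A minimal sketch of that difference, assuming lme4 is installed and using simulated data rather than the real study data (the data frame sim, the variable Rep and all numeric values are invented for illustration):

```r
library(lme4)

set.seed(42)
n_subj <- 30
sim <- expand.grid(
  Rep        = 1:6,                       # stands in for the 6 tracts
  Hemisphere = c("Left", "Right"),
  Subjects   = factor(seq_len(n_subj))
)

# Simulate subject-specific intercepts (u0) and Hemisphere effects (u1).
u0    <- rnorm(n_subj, sd = 0.05)
u1    <- rnorm(n_subj, sd = 0.03)
right <- as.numeric(sim$Hemisphere == "Right")
subj  <- as.integer(sim$Subjects)
sim$FA <- 0.45 + u0[subj] + (0.02 + u1[subj]) * right +
  rnorm(nrow(sim), sd = 0.02)

# Random intercept only: one Hemisphere effect shared by all subjects.
m_int <- lmer(FA ~ Hemisphere + (1 | Subjects), data = sim)

# Random intercept and slope: each subject gets an offset from the global effect.
m_slope <- lmer(FA ~ Hemisphere + (Hemisphere | Subjects), data = sim)

fixef(m_slope)                 # global (fixed) intercept and Hemisphere effect
VarCorr(m_slope)               # estimated variances of random intercept and slope
head(coef(m_int)$Subjects)     # Hemisphere column is identical for every subject
head(coef(m_slope)$Subjects)   # Hemisphere column differs from subject to subject
anova(m_int, m_slope)          # likelihood-ratio comparison (models are refitted with ML)
```

In coef(), the per-subject value is the fixed effect plus that subject's random offset, which is why the Hemisphere column is constant in the first model and varies in the second.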

Question 2: The Volume variable is a within-subject variable whereas Age is a between-subject variable. Should they then not be modelled differently?

Fixed effects are estimated in the same way regardless of whether they vary within levels of a grouping variable (Subject in your case). The entries in the fixed-effects model matrix will look quite different for within- vs. between-subject variables (Age repeats the same value on all 12 rows of a subject, whereas Volume changes from row to row), but this is not something you need to worry about. These kinds of concerns often arise when people come from a traditional ANOVA background.
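To see what the model matrix looks like for the two kinds of variable, here is a small base-R sketch. The data frame demo and its Age and Volume values are made up for illustration.

```r
# Two hypothetical subjects with four observations each.
demo <- data.frame(
  Subjects = rep(c("S1", "S2"), each = 4),
  Age      = rep(c(25, 40), each = 4),                   # one value per subject
  Volume   = c(3.1, 2.9, 3.4, 3.0, 2.7, 2.8, 3.3, 3.2)   # one value per observation
)

# Fixed-effects model matrix: an intercept column plus one column per covariate.
X <- model.matrix(~ Age + Volume, data = demo)
print(X)

# Age does not vary within a subject; Volume does.
age_sd_within    <- tapply(demo$Age, demo$Subjects, sd)
volume_sd_within <- tapply(demo$Volume, demo$Subjects, sd)
```

Both covariates enter the formula in exactly the same way; the only difference is in the pattern of values in their columns.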

Question 3: Whether or not to include a three-way interaction is a major debate in my research group. Some say that for practical purposes it's impossible to really make sense of it. Others say it can guide the decision of whether or not to test differences between groups for each hemisphere in each tract. Including a three-way interaction in the model is likely to change the results significantly, so it seems pretty important to get it right the first time. Any thoughts on this? Is it a crime to include it?

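In general, statistical interactions are not hard to interpret: a three-way interaction says that a two-way interaction (for example, how the Group difference changes across Tracts) itself differs across the levels of the third variable (Hemisphere). Whether to include one in this particular analysis, however, is too broad to answer here. I would suggest posting a new question about this and including as much detail as possible.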
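As an illustration of the simplest case, a two-way interaction between two two-level factors is a difference of differences. The sketch below uses four hypothetical cell means (invented numbers, not study results) and an ordinary lm() with R's default treatment contrasts.

```r
# Hypothetical cell means for a 2 x 2 design (values are invented).
cells <- expand.grid(
  Group      = c("Control", "Patient"),
  Hemisphere = c("Left", "Right")
)
cells$FA <- c(0.50, 0.46, 0.52, 0.44)

# Saturated model: reproduces the four cell means exactly.
fit <- lm(FA ~ Group * Hemisphere, data = cells)
print(coef(fit))

# Group difference (Patient minus Control) within each hemisphere.
diff_left <- with(cells, FA[Group == "Patient" & Hemisphere == "Left"] -
                         FA[Group == "Control" & Hemisphere == "Left"])
diff_right <- with(cells, FA[Group == "Patient" & Hemisphere == "Right"] -
                          FA[Group == "Control" & Hemisphere == "Right"])

# The interaction coefficient equals the difference between those differences.
interaction_coef <- unname(coef(fit)["GroupPatient:HemisphereRight"])
print(c(interaction = interaction_coef, diff_of_diffs = diff_right - diff_left))
```

A three-way interaction extends the same logic by one step: it measures how this difference of differences changes across the levels of a third factor such as Tract.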
