Solved – Controlling for continuous variable in linear-mixed-effect model

lme4-nlme, mixed model, r, regression

I have a dataset df containing repeated scores (variable = V5) for each subject (coded in V1). Subjects belong to different groups (variable = V7) and differ in age (variable = V6). The dataset is composed of 6 variables:

V1: categorical variable, representing subject ID
V2: continuous variable
V4: factor with 75 levels
V5: dependent variable
V6: continuous variable (age)
V7: factor with two levels (groups)

I am fitting a linear mixed-effects model in R with the lme() function from the nlme package.

My goal is to estimate the effects of V7 and V4, and of their interaction, on V5. I am using a linear mixed-effects model so that I can include V1 as a random effect, with the following syntax:

my_lme = lme(V5 ~ V4*V7, data=df, random = ~ 1 | V1, na.action=na.omit)

However, I would first like to regress out the effect of V6 before applying the linear mixed-effects model. What is the most reasonable way to do this? Would it be OK to:

  • First, regress V6 out of V5 with a linear model, e.g.
    my_lm = lm(V5 ~ V6, data = df)
  • Then fit the linear mixed-effects model to my_lm$residuals, after adding them to my dataset as V9, hence:
    my_lme = lme(V9 ~ V4*V7, data=df, random = ~ 1 | V1, na.action=na.omit)

Is this a correct approach in your expert opinion? I am something of a newbie.

I have a second question:
If I then want to additionally estimate the effect of the continuous variable V2 on the prediction of V5 in my model, would the following syntax be correct?
my_lme = lme(V9 ~ V4*V7 + V2, data=df, random = ~ 1 | V1, na.action=na.omit)

Thank you in advance.

Best Answer

Turning my comments into an answer.

To estimate the parameters you're interested in, there's no need to fit models, extract residuals and run new models. It's easier and better to specify all of the terms together in one model:

#Using lme4 instead of nlme here because I'm more familiar with the syntax
library(lme4)
lmer(V5 ~ V4 * V7 + V6 + V2 + (1 | V1), data = df)
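If you would rather stay with nlme, as in the question, the same idea can be sketched with lme(). The example below is only an illustration on simulated data: the sample sizes, effect sizes and the correlation between age (V6) and group (V7) are all invented, and V4 is left out to keep the toy data small. It fits the single model with all terms and, for contrast, the two-step residual approach from the question. Because the two-step approach removes V6 from V5 but not from V7, the V7 estimate in a setup like this tends to be pulled toward zero whenever V6 and V7 are correlated.

```r
# Simulated stand-in for df; every number here is invented for illustration.
library(nlme)
set.seed(1)

n_subj <- 40
n_rep  <- 5
subj <- data.frame(
  V1 = factor(seq_len(n_subj)),
  V7 = factor(rep(c("a", "b"), each = n_subj / 2))
)
# Age (V6) differs between groups, so V6 and V7 are correlated.
subj$V6 <- rnorm(n_subj, mean = ifelse(subj$V7 == "b", 50, 40), sd = 5)
subj$u  <- rnorm(n_subj, sd = 1)  # subject-level random intercept

# Repeat each subject's row n_rep times to get repeated measures.
df <- subj[rep(seq_len(n_subj), each = n_rep), ]
df$V2 <- rnorm(nrow(df))
# Simulated true effects: V7 = 2, V6 = 0.3, V2 = 0.5
df$V5 <- 2 * (df$V7 == "b") + 0.3 * df$V6 + 0.5 * df$V2 + df$u +
  rnorm(nrow(df), sd = 0.5)

# One model with all terms (V4 omitted in this toy example).
fit_joint <- lme(V5 ~ V7 + V6 + V2, data = df, random = ~ 1 | V1,
                 na.action = na.omit)

# Two-step approach from the question, for comparison only.
df$V9 <- residuals(lm(V5 ~ V6, data = df))
fit_twostep <- lme(V9 ~ V7 + V2, data = df, random = ~ 1 | V1,
                   na.action = na.omit)

# Compare the estimated group effect under the two approaches.
fixef(fit_joint)["V7b"]
fixef(fit_twostep)["V7b"]
```

On your real data, summary(fit_joint) gives the estimate for each term adjusted for all the others, which is what "controlling for V6" usually means.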

However, based on your questions I would recommend that you read more about linear models (and mixed models in particular) so you understand exactly what each of those parameters is doing. For example, your question about age and subject ID is hard to answer unless we know more about subject ID. I've assumed that individual subjects were measured multiple times; if so, the model above should work well (though you could consider whether random slopes are also needed). If individuals were measured only once, then you don't need a random effect at all and the model would simply be:
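On the random slopes point, one way to check is to fit the model with and without a random slope and compare the two fits. The sketch below uses nlme on simulated data in which the V2 slope really does vary between subjects; the data, the choice of V2 as the slope variable and all numeric values are assumptions made purely for illustration. In lme4 syntax the corresponding random term would be (1 + V2 | V1).

```r
# Simulated data with subject-specific intercepts and V2 slopes (invented values).
library(nlme)
set.seed(2)

n_subj <- 30
n_rep  <- 8
df <- data.frame(V1 = factor(rep(seq_len(n_subj), each = n_rep)))
df$V2 <- rnorm(nrow(df))
u0 <- rnorm(n_subj, sd = 1)    # subject-specific intercept shifts
u1 <- rnorm(n_subj, sd = 0.7)  # subject-specific shifts in the V2 slope
id <- as.integer(df$V1)
df$V5 <- 1 + u0[id] + (0.5 + u1[id]) * df$V2 + rnorm(nrow(df), sd = 0.5)

# Random intercept only, as in the question.
fit_int <- lme(V5 ~ V2, data = df, random = ~ 1 | V1)

# Random intercept plus a random slope for V2.
fit_slope <- lme(V5 ~ V2, data = df, random = ~ 1 + V2 | V1)

# Both fits use REML with identical fixed effects, so the
# likelihood-ratio comparison of the random-effects structures is valid.
anova(fit_int, fit_slope)
```

If the comparison does not favour the more complex model, or if it fails to converge on your data, the random-intercept model is a reasonable default.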

lm(V5 ~ V4 * V7 + V6 + V2, data = df)

Perhaps more importantly, this assumes that the effects of the continuous predictors (V6 and V2) are linear. If they are not, you will need to modify the model.
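As one possible modification, a continuous predictor can enter through a flexible term such as a natural spline (ns() from the splines package) or a polynomial (poly()). The sketch below simulates a curved age effect and compares a linear term for V6 with a spline term; the simulated data and the choice of df = 3 for the spline are arbitrary assumptions for illustration, not a recommendation for your data.

```r
# Simulated data with a curved effect of age (V6); all values are invented.
library(nlme)
library(splines)
set.seed(3)

n_subj <- 40
n_rep  <- 5
subj <- data.frame(V1 = factor(seq_len(n_subj)),
                   V6 = runif(n_subj, min = 20, max = 70))
subj$u <- rnorm(n_subj, sd = 0.5)  # subject-level random intercept
df <- subj[rep(seq_len(n_subj), each = n_rep), ]
df$V5 <- 0.01 * (df$V6 - 45)^2 + df$u + rnorm(nrow(df), sd = 0.5)

# Models that differ in their fixed effects should be compared using ML, not REML.
fit_lin <- lme(V5 ~ V6, data = df, random = ~ 1 | V1, method = "ML")
fit_ns  <- lme(V5 ~ ns(V6, df = 3), data = df, random = ~ 1 | V1,
               method = "ML")

# Compare the linear and spline specifications of the age effect.
anova(fit_lin, fit_ns)
```

Plotting the residuals of the linear fit against V6 (and V2) is a quick first check of whether anything like this is needed.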