Solved – Controlling for continuous variable in linear-mixed-effect model

lme4-nlmemixed modelrregression

I have a dataset df describing repeated scores (variable = V5) for each subject (coded in V1). Subjects belong to different groups (variable = V7), and have different age (variable = V6). The dataset is composed of 6 variables:

V1: categorical variable, representing subject ID
V2: continuous variable
V4: factor with 75 levels
V5: dependent variable
V6: continuous variable (age)
V7: factor with two levels (groups)

I am fitting a linear mixed effect model in R from the nlme package (lme() function).

My goal is to estimate the effect of V7 and V4 and their interaction in predicting V5. I am using the linear mixed effect model to add V1 as random effect with the following syntax:

my_lme = lme(V5 ~ V4*V7, data=df, random = ~ 1 | V1, na.action=na.omit)

However, I am interested in first of all regressing out the effect of V6 before applying the linear mixed effect model. Which is the most reasonable approach to do so? Would be ok to:

First, regressing out V6 from V5 by applying a linear model, e.g.
my_lm = lm(V5 ~ V6, data = df)
Then apply the linear mixed effect model on the my_lm$residuals after putting them on my dataset naming them as V9, hence:
my_lme = lme(V9 ~ V4*V7, data=df, random = ~ 1 | V1, na.action=na.omit)

Is this a correct approach in your expert opinion? I am sort of a newby.

I have a second question:
If I then want to additionally estimate the effect of the continuous variable V2 on the prediction of V5 in my model, would the following syntax be correct?
my_lme = lme(V9 ~ V4*V7 + V2, data=df, random = ~ 1 | V1, na.action=na.omit)

Thank you in advance.

Best Answer

Turning my comments into an answer.

To estimate the parameters you're interested in, there's no need to fit models, extract residuals and run new models. It's easier and better to specify all of the terms together in one model:

#Using lme4 instead of nlme here because I'm more familiar with the syntax
library(lme4)
lmer(V5 ~ V4 * V7 + V6 + V2 + (1|V1))

However, based on your questions I would recommend that you read more about linear models (and mixed models in particular) so you understand exactly what each of those parameters is doing. For example, your question about age and subject ID is hard to answer unless we know more about subject ID. I've assumed that individual subjects were measured multiple times; if so, the model above should work well (though you could consider whether random slopes are also needed). If individuals were measured only once, then you don't need a random effect at all and the model would simply be:

lm(V5 ~ V4 * V7 + V6 + V2)

Perhaps more importantly, this is assuming no nonlinearity in the effect of any of the continuous parameters. If there are such nonlinearities, you will need to modify the model.

Related Solutions

Solved – Comparing a mixed model (subject as random effect) to a simple linear model (subject as a fixed effect)

This is to add to @ocram's answer because it is too long to post as a comment. I would treat A ~ B + C as your null model so you can assess the statistical significance of a D-level random intercept in a nested model setup. As ocram pointed out, regularity conditions are violated when $H_0: \sigma^2 = 0$, and the likelihood ratio test statistic (LRT) will not necessarily be asymptotically distributed $\chi^2$. The solution was I taught was to bootstrap the LRT (whose bootstrap distribution will likely not be $\chi^2$) parametrically and compute a bootstrap p-value like this:

library(lme4)
my_modelB <- lm(formula = A ~ B + C)
lme_model <- lmer(y ~ B + C + (1|D), data=my_data, REML=F)
lrt.observed <- as.numeric(2*(logLik(lme_model) - logLik(my_modelB)))
nsim <- 999
lrt.sim <- numeric(nsim)
for (i in 1:nsim) {
    y <- unlist(simulate(mymodlB))
    nullmod <- lm(y ~ B + C)
    altmod <- lmer(y ~ B + C + (1|D), data=my_data, REML=F)
    lrt.sim[i] <- as.numeric(2*(logLik(altmod) - logLik(nullmod)))
}
mean(lrt.sim > lrt.observed) #pvalue

The proportion of bootstrapped LRTs more extreme that the observed LRT is the p-value.

Paired t-test – Special Case of Linear Mixed-Effect Modeling

The equivalence of the models can be observed by calculating the correlation between two observations from the same individual, as follows:

As in your notation, let $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where $\beta_j \sim N(0, \sigma_p^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$. Then $Cov(y_{ik}, y_{jk}) = Cov(\mu + \alpha_i + \beta_k + \epsilon_{ik}, \mu + \alpha_j + \beta_k + \epsilon_{jk}) = Cov(\beta_k, \beta_k) = \sigma_p^2$, because all other terms are independent or fixed, and $Var(y_{ik}) = Var(y_{jk}) = \sigma_p^2 + \sigma^2$, so the correlation is $\sigma_p^2/(\sigma_p^2 + \sigma^2)$.

Note that the models however are not quite equivalent as the random effect model forces the correlation to be positive. The CS model and the t-test/anova model do not.

EDIT: There are two other differences as well. First, the CS and random effect models assume normality for the random effect, but the t-test/anova model does not. Secondly, the CS and random effect models are fit using maximum likelihood, while the anova is fit using mean squares; when everything is balanced they will agree, but not necessarily in more complex situations. Finally, I'd be wary of using F/df/p values from the various fits as measures of how much the models agree; see Doug Bates's famous screed on df's for more details. (END EDIT)

The problem with your R code is that you're not specifying the correlation structure properly. You need to use gls with the corCompSymm correlation structure.

Generate data so that there is a subject effect:

set.seed(5)
x <- rnorm(10)
x1<-x+rnorm(10)
x2<-x+1 + rnorm(10)
myDat <- data.frame(c(x1,x2), c(rep("x1", 10), rep("x2", 10)), 
                    rep(paste("S", seq(1,10), sep=""), 2))
names(myDat) <- c("y", "x", "subj")

Then here's how you'd fit the random effects and the compound symmetry models.

library(nlme)
fm1 <- lme(y ~ x, random=~1|subj, data=myDat)
fm2 <- gls(y ~ x, correlation=corCompSymm(form=~1|subj), data=myDat)

The standard errors from the random effects model are:

m1.varp <- 0.5453527^2
m1.vare <- 1.084408^2

And the correlation and residual variance from the CS model is:

m2.rho <- 0.2018595
m2.var <- 1.213816^2

And they're equal to what is expected:

> m1.varp/(m1.varp+m1.vare)
[1] 0.2018594
> sqrt(m1.varp + m1.vare)
[1] 1.213816

Other correlation structures are usually not fit with random effects but simply by specifying the desired structure; one common exception is the AR(1) + random effect model, which has a random effect and AR(1) correlation between observations on the same random effect.

EDIT2: When I fit the three options, I get exactly the same results except that gls doesn't try to guess the df for the term of interest.

> summary(fm1)
...
Fixed effects: y ~ x 
                 Value Std.Error DF   t-value p-value
(Intercept) -0.5611156 0.3838423  9 -1.461839  0.1778
xx2          2.0772757 0.4849618  9  4.283380  0.0020

> summary(fm2)
...
                 Value Std.Error   t-value p-value
(Intercept) -0.5611156 0.3838423 -1.461839  0.1610
xx2          2.0772757 0.4849618  4.283380  0.0004

> m1 <- lm(y~ x + subj, data=myDat)
> summary(m1)
...
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.3154     0.8042  -0.392  0.70403   
xx2           2.0773     0.4850   4.283  0.00204 **

(The intercept is different here because with the default coding, it's not the mean of all subjects but instead the mean of the first subject.)

It's also of interest to note that the newer lme4 package gives the same results but doesn't even try to compute a p-value.

> mm1 <- lmer(y ~ x + (1|subj), data=myDat)
> summary(mm1)
...
            Estimate Std. Error t value
(Intercept)  -0.5611     0.3838  -1.462
xx2           2.0773     0.4850   4.283

Best Answer

Related Solutions

Solved – Comparing a mixed model (subject as random effect) to a simple linear model (subject as a fixed effect)

Paired t-test – Special Case of Linear Mixed-Effect Modeling

Related Question