Solved – Stepwise introduction of predictors to mixed-effects models

mixed model, r, stepwise regression

As the title says, what I'd like to do is stepwise introduction of predictor variables to a mixed-effects model. I'm going to first say what I'd be doing if it were stepwise linear regression, just to make sure I've got that part right, and then describe the full model to which I want to apply an analogous approach.

I have a student population who took a pretest, then a tutorial, then a posttest. The tutorial involved doing problems from several categories with feedback, and the students could control which category the next problem would come from and when to stop the tutorial.

I want to create a model that will account for posttest performance using pretest score and some measures of behavior during the tutorial, including total number of problems done, accuracy, and probability of switching category. The last of these is of greatest theoretical interest. There are other variables I'm not mentioning for simplicity.

For the linear regression approach, I first fit a baseline regression using posttest score as the DV and the main effects (only) of pretest score, tutorial accuracy, and number of problems as predictors. Then I added probability of switching as an additional predictor and compared the resulting model to the previous one to see whether it had significantly better explanatory power (it did). The R code I used is below.

# Baseline model: main effects only
lm1 <- lm( posttestScore ~ pretestScore + practiceAccuracy + practiceNumTrials, data=subj.data )
# Add the category-switching predictor
lm2 <- lm( posttestScore ~ pretestScore + practiceAccuracy + practiceNumTrials + probCategorySame, data=subj.data )
# F-test comparing the nested models
anova( lm1, lm2 )

So far so good? OK, next, I switched to a mixed model in order to include a binary within-subjects factor, 'test question type'. Both pretest and posttest have values for each level of this factor for every subject. (It's unrelated to the 'problem category' I mentioned for the tutorial.) The other predictors, however, only have one value for each participant. My models then became:

library( nlme )
# Random intercept per subject (sid); REML is the default estimation method
lm1 <- lme( posttestScore ~ pretestScore + questionType + practiceAccuracy + practiceNumTrials, random=~1|sid, method="REML", data=D )
# Same model plus the category-switching predictor
lm2 <- lme( posttestScore ~ pretestScore + questionType + practiceAccuracy + practiceNumTrials + probCategorySame, random=~1|sid, method="REML", data=D )

However, I don't know how to test whether the second model is a significant improvement over the first. Is that the right question to be asking and, if so, how should I do it?

Best Answer

You can also perform a likelihood-ratio test (LRT) with mixed models, again with anova(model1, model2). But there are a few things to keep in mind. To use anova on lme fits estimated by REML (restricted maximum likelihood; the default), the models must contain the same fixed effects. To compare models with different fixed effects via a likelihood-ratio test, you have to fit both models by maximum likelihood (ML) instead, by specifying method="ML" in the lme call. After that, anova(lme.mod1, lme.mod2) computes the LR test. If the test is significant, you have evidence that the model including probCategorySame is an improvement over the model without that variable.
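Concretely, reusing the model formulas and data frame D from the question, the refit-and-compare step would look like this (a sketch; the object names lme.mod1 and lme.mod2 are just placeholders):

# Refit both models by ML so models with different fixed effects can be compared
lme.mod1 <- lme( posttestScore ~ pretestScore + questionType + practiceAccuracy + practiceNumTrials, random=~1|sid, method="ML", data=D )
lme.mod2 <- lme( posttestScore ~ pretestScore + questionType + practiceAccuracy + practiceNumTrials + probCategorySame, random=~1|sid, method="ML", data=D )
# Likelihood-ratio test of the nested ML fits
anova( lme.mod1, lme.mod2 )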

As an alternative, you can also assess the likelihood-ratio test by simulation, with the following syntax (lm1 is the simpler model and lm2 is the alternative model):

# Simulate the null distribution of the likelihood-ratio statistic
sim.lme <- simulate( lm1, nsim=1000, m2=lm2, method="ML" )
# Plot simulated vs. nominal p-values
plot( sim.lme )

This produces a graphic like the following:

[Figure: simulation of p-values for an lme model]

The key is that the simulated empirical $p$-values should roughly match the nominal $p$-values (the blue line should be diagonal). If that is the case, you can trust the nominal $p$-value from the likelihood-ratio test.

In their book Mixed-Effects Models in S and S-PLUS (pages 87-92), Pinheiro and Bates discourage the use of likelihood-ratio tests for assessing the significance of fixed effects. They recommend the conditional $F$-test instead, which you get by calling anova on a single model. In your case, that would be anova(lm2).
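As a sketch, using the fit from the question (REML is fine here, since only one model is involved):

# Conditional F-tests for each fixed effect in the model
anova( lm2 )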
