Consider the following data from a two-way within-subjects design:
df <- "http://personality-project.org/r/datasets/R.appendix4.data"
df <- read.table(df,header=T)
head(df)
Observation Subject Task Valence Recall
1 1 Jim Free Neg 8
2 2 Jim Free Neu 9
3 3 Jim Free Pos 5
4 4 Jim Cued Neg 7
5 5 Jim Cued Neu 9
6 6 Jim Cued Pos 10
I would like to analyze this using linear mixed models. Considering all possible fixed- and random-effects structures, there are multiple possible models:
# different fixed effects with random intercept
library(lme4)
a0 <- lmer(Recall ~ 1 + (1|Subject), data=df, REML=FALSE)
a1 <- lmer(Recall ~ Task + (1|Subject), data=df, REML=FALSE)
a2 <- lmer(Recall ~ Valence + (1|Subject), data=df, REML=FALSE)
a3 <- lmer(Recall ~ Task+Valence + (1|Subject), data=df, REML=FALSE)
a4 <- lmer(Recall ~ Task*Valence + (1|Subject), data=df, REML=FALSE)
# different fixed effects with random intercept and uncorrelated random slopes
b0 <- lmer(Recall ~ 1 + (1|Subject) + (0+Task|Subject) + (0+Valence|Subject), data=df, REML=FALSE)
b1 <- lmer(Recall ~ Task + (1|Subject) + (0+Task|Subject) + (0+Valence|Subject), data=df, REML=FALSE)
b2 <- lmer(Recall ~ Valence + (1|Subject) + (0+Task|Subject) + (0+Valence|Subject), data=df, REML=FALSE)
b3 <- lmer(Recall ~ Task+Valence + (1|Subject) + (0+Task|Subject) + (0+Valence|Subject), data=df, REML=FALSE)
b4 <- lmer(Recall ~ Task*Valence + (1|Subject) + (0+Task|Subject) + (0+Valence|Subject), data=df, REML=FALSE)
# different fixed effects with random intercept and slopes, including the full variance-covariance matrix
c0 <- lmer(Recall ~ 1 + (1 + Valence + Task|Subject), data=df, REML=FALSE)
c1 <- lmer(Recall ~ Task + (1 + Valence + Task|Subject), data=df, REML=FALSE)
c2 <- lmer(Recall ~ Valence + (1 + Valence + Task|Subject), data=df, REML=FALSE)
c3 <- lmer(Recall ~ Task+Valence + (1 + Valence + Task|Subject), data=df, REML=FALSE)
c4 <- lmer(Recall ~ Task*Valence + (1 + Valence + Task|Subject), data=df, REML=FALSE)
1. What is the recommended way to select the best-fitting model in this context? When using log-likelihood ratio tests, what is the recommended procedure: generating models upwards (from the null model to the most complex model) or downwards (from the most complex model to the null model)? Stepwise inclusion or exclusion? Or is it recommended to put all models in one log-likelihood ratio test and select the model with the lowest p-value? And how do I compare models that are not nested?
2. Is it recommended to first find the appropriate fixed-effects structure and then the appropriate random-effects structure, or the other way round? (I have found references for both options…)
3. What is the recommended way of reporting results? Reporting the p-value from the log-likelihood ratio test comparing the full mixed model (with the effect in question) to the reduced model (without the effect in question)? Or is it better to use log-likelihood ratio tests to find the best-fitting model, and then use lmerTest to report p-values for the effects in that best-fitting model?
Best Answer
I'm not sure there's really a canonical answer to this, but I'll give it a shot.
1. It depends what your goals are.
2. I don't think anyone knows. See the previous answer about model selection more generally. If you could define your goals sufficiently clearly (which few people do), the question might be answerable. If you have references for both options, it would be useful to edit your question to include them. (For what it's worth, this example (already quoted above) uses information criteria to select the random-effects part, then eschews selection on the fixed-effect part of the model.)
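As a purely mechanical illustration (not an endorsement of any particular selection path), nested ML fits can be compared with anova(), which performs the likelihood-ratio test; for non-nested fits, information criteria such as AIC/BIC are a common fallback. This sketch assumes the models a0–a4 from the question have been fitted with lme4 using REML=FALSE:

```r
## Nested comparison: anova() on lmer fits refits with ML if needed
## and reports chi-squared likelihood-ratio tests between successive
## models (null -> main effects -> interaction here).
anova(a0, a3, a4)

## a1 (Task only) and a2 (Valence only) are NOT nested in each other,
## so no LRT applies; information criteria can still be compared.
AIC(a1, a2)
BIC(a1, a2)
```

Note that p-values from such tests on the random-effects part are conservative, because the null hypothesis (a variance of zero) lies on the boundary of the parameter space.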
3. This is (alas) another difficult question. If you report the marginal effects as given by lmerTest, you have to worry about marginality (e.g., whether the estimates of the main effects of A and B are meaningful when there is an A-by-B interaction in the model); this is a huge can of worms, but it is somewhat mitigated if you use contrasts="sum", as recommended by afex::mixed(). Balanced designs help a little bit too. If you really want to paper over all these cracks, I think I would recommend afex::mixed, which gives you output similar to lmerTest's but tries to deal with these issues.
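To show what that recommendation looks like in practice, here is a minimal sketch (assuming the afex package is installed; mixed() applies sum-to-zero contrasts by default before fitting):

```r
library(afex)  # assumed installed; wraps lme4

## With method = "LRT", mixed() tests each fixed effect by a
## likelihood-ratio test against the model with that effect removed,
## respecting marginality, and prints one test per effect
## (Task, Valence, Task:Valence).
m <- mixed(Recall ~ Task * Valence + (1 | Subject),
           data = df, method = "LRT")
m
```

This reproduces the "compare full model to reduced model" logic from your third question, but automated per effect and with the contrasts handled for you.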