I know that there are already a host of questions about nested designs but many of them haven't been answered or come from biological domains which I sometimes find hard to transfer to my domain.
I am currently trying to analyse the data from a psychological study. Participants read four statements and indicated how appropriate they found each on a 5-point Likert-scale. But I'll just call the outcome variable "out" and so the content will not distract from the statistical problem at hand.
There are two IVs, one that was varied on three levels between subjects (let's call it "between" and the levels 1, 2 and 3) and one on two levels within subjects ("within" with its levels a and b). The within subjects factor is realised in such a manner that two of the four statements correspond to one of the levels and the other two to the other.
Correspondingly, for every participant, I now have four "out" data points, two for the a statements and two for the b statements, like this (in long format):
subj between within item out
123 1 a a1 3
123 1 a a2 4
123 1 b b1 1
123 1 b b2 2
124 2 a a1 5
124 2 a a2 4
124 2 b b1 2
124 2 b b2 3
125 3 a a1 1
125 3 a a2 1
125 3 b b1 2
125 3 b b2 3
and so on
What is the right way to analyse the effect and interaction of between and within? I assume that the two different a and the two different b statements do not differ systematically but are just two instances of the same thing. Or do I need an additional factor that tells R which of the four statements it is in order to allow for that error? Anyway, I have tried these options:
m1 <- aov(out~between*within+Error(subj/within))
summary(m1)
m2 <- lm(out~between*within+within/subj)
anova(m2)
m3 <- lmer(out~between*within+(between|subj))
Anova(m3)
They produce different results but I'm not entirely sure about which one is right or what the differences are. Could anyone enlighten me on this? I assume this is about fixed and random effects which always gets my head in a twist. I have read other posts about nested designs and I think this one comes nearest. But unfortunately the question ID being nested groups wasn't answered. Any help would be very much appreciated!
Edit:
I greatly appreciate the answers so far! The comments prompted me to read some more tutorials on mixed effects models as well as answers to similar questions. This enables me to clarify my question: I know that I will need to specify random intercepts at least for subjects and probably also the four different items representing the two levels of the within factors. However, I am unsure about the specific fixed/random effects structure I would have to model, and specifically what the maximal model in my case would look like.
Right now, I have tried some code which roughly looks like this:
m1 <- lmer(out ~ between * within + (between|subj) + (within|subj) + (between|within/item))
This is probably wrong on several levels so any feedback would be greatly appreciated! Just for clarity: The item factor has no intrinsic meaning, but both levels of the within factor is realised by two items each (which every participant sees) and I am unsure whether this is relevant to the model.
Apart from the model specification, I am still getting confused about the concepts nested and crossed and which one applies to my setup. Lastly, most of my models fail to converge but that is probably a matter for another question (and has been discussed here in length.)
Best Answer
As discussed in the comments, here is my Stan model for an Ordered Logistic Regression. It takes long format data (one observation per row) and estimates the following parameters: coefficients for dummy-coded predictors (you can for example use Rs
model.matrix
) from the matrixpreds
as fixed effects, and also "maximal" (intercept + slope for all terms plus all interaction) random subject and item effects. It does not estimate or disallow correlated random effects.It should give you the coefficients for your predictor matrix (in beta) and the cutoff points.
After you've specified all values asked for in the
data
part of the code in a listdat
, you can run it, e.g. viafit <- stan(model_code = code, data=dat, chains = 0, iter = 10)
, and print the values viaprint(fit,"beta")
.I can't promise this works as intended, but I've compared it with simple ordinal regression using
clmm
and "maximal" linear regression usinglme4
and the results are qualitatively very similar!