This mainly pertains to your first question.
You want to specify a "Dx interaction with any of the above variables", and seem to think this is done with `Dx*var`. But it is not: `Dx*var` expands to `Dx + var + Dx:var`. The `Dx:var` term specifies the interaction.
So let's change your model to
Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
(1 | Trial) + (1 | SubID)
(I'm not sure what lmer made out of your model specification.)
So this fits a model with the fixed predictors you asked for, plus it allows for an individual intercept for each Trial, and additionally for each SubID.
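A minimal sketch of what fitting this model would look like in lme4, assuming your data are in a data frame called `dat` (a hypothetical name), and ignoring for the moment that a binary outcome calls for a logistic model (see below):

```r
library(lme4)

## fit the model with all Dx interactions plus random intercepts
## for Trial and SubID (data frame name 'dat' is assumed)
m1 <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
             Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
             (1 | Trial) + (1 | SubID),
           data = dat)
summary(m1)
```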
> Because I want to also examine a continuous variable, the total Cartesian Distance between stimuli per trial (divided by number of stimuli to control for varying numbers), I have opted to use a mixed linear model with repeated measures.
I don't get that. So you would like to include a continuous predictor - why does that make it a repeated measures design?
Do you measure the same subject on the same tasks more than once? On differing tasks more than once?
> 3) If I attempt to reduce the model, I should favor the model with the lower REML, yes?
There are various ways to compare two models. The simplest is to set `REML=FALSE` for both, and compare them via a likelihood ratio test by `anova(mod1, mod2)`. Another way is to use information criteria such as the AIC (Akaike information criterion) or BIC (Bayesian information criterion); those try to balance model fit with model complexity, and a lower number indicates a "better" model. I don't think it's a good idea to use the REML criterion itself as a model comparison metric; as far as I know this is, at least in general, not valid (perhaps someone more knowledgeable can provide an explanation for that).
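For example (a sketch, assuming two nested fitted model objects `mod1` and `mod2` on the same data):

```r
## refit both models with maximum likelihood (REML = FALSE), so the
## likelihood ratio test of fixed effects is valid
mod1_ml <- update(mod1, REML = FALSE)
mod2_ml <- update(mod2, REML = FALSE)

## likelihood ratio test (only meaningful for nested models)
anova(mod1_ml, mod2_ml)

## information criteria: lower is "better"
AIC(mod1_ml, mod2_ml)
BIC(mod1_ml, mod2_ml)
```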
Update after ADDENDUM
Remove `(1 | Trial)` from the random effects specification (unless you intend to model temporal correlation between trial runs; but even then you would need something other than the running trial index number as a grouping variable). It's pointless to include this as a grouping variable, since you only have one observation per level:
Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
(1 | SubID)
This has the minimal random effects structure justified by your experiment: It fits a model with the fixed effects predictors, and adds an additional "random" (read: individual) intercept per individual subject, effectively shifting the regression line determined by the fixed effects predictors up and down by a subject-specific amount. The fact that subjects were repeatedly measured is thereby accounted for.
You can build upon this model significantly. The maximal random effects structure would allow for correlated random intercepts and slopes of all within-subject predictors, grouped by subject. Barr et al. (2013) recommend, for confirmatory hypothesis testing of fixed effects, including correlated random intercepts and slopes per subject for all tested fixed effects.
Presuming you intend to test all your within-subjects variables (number of stimuli, trial type, probe location, and distance), and omit any interactions between them, this would be
Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
(No_of_Stim + Trial_Type + Probe_Loc + Dist | SubID)
where intercepts and slopes for No_of_Stim, Trial_Type, Probe_Loc, and Dist are allowed to correlate. A simpler model, imposing independence between intercept and slope, would be
Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
(1 | SubID) + (0 + No_of_Stim | SubID) + (0 + Trial_Type | SubID) +
(0 + Probe_Loc | SubID) + (0 + Dist | SubID)
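To decide between the maximal (correlated) and the simpler (uncorrelated) structures, you could fit both and compare them; a sketch, again assuming a hypothetical data frame `dat`:

```r
library(lme4)

## maximal structure: correlated random intercepts and slopes
m_max <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
                (No_of_Stim + Trial_Type + Probe_Loc + Dist | SubID),
              data = dat)

## simpler structure: independent intercepts and slopes
m_ind <- lmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
                (1 | SubID) + (0 + No_of_Stim | SubID) +
                (0 + Trial_Type | SubID) + (0 + Probe_Loc | SubID) +
                (0 + Dist | SubID),
              data = dat)

anova(m_max, m_ind)  # same fixed effects, nested random-effects structures
```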
Relevant google terms for finding "the best" model are model building, top-down strategy, bottom-up strategy, but there is a lot of controversy regarding the best strategy.
Regarding interpretation of the coefficients in the output
By default, R uses treatment contrasts (see `help(contr.treatment)` and http://talklab.psy.gla.ac.uk/tvw/catpred/). This means that one level of a factor is chosen as the reference level, and the coefficients indicate the contrast to this reference level.
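A quick illustration of how the reference level works (the factor and its levels here are made up for the example):

```r
## a two-level factor; under treatment contrasts the first level
## (alphabetically, "ctrl") becomes the reference level
f <- factor(c("ctrl", "ctrl", "tx", "tx"))
contrasts(f)

## change the reference level, e.g. to "tx"
f2 <- relevel(f, ref = "tx")
contrasts(f2)
```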
If you are instead asking about the interpretation of the coefficients of fixed within-subject predictors that are additionally allowed to vary randomly per subject: It's essentially the same as in a simple linear model. They tell you about whether there's any systematic effect across subjects of this predictor on the outcome, after accounting for random variation by subjects.
You need to use a logistic regression model!
The elephant in the room, so to speak, is that you are trying to predict a binary outcome: correct/incorrect. You need to use a logistic mixed regression model (try `glmer` and look at http://www.ats.ucla.edu/stat/r/dae/melogit.htm).
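With a binary `Correct` (coded 0/1), the model above would become something like this (a sketch, data frame name `dat` assumed):

```r
library(lme4)

## logistic mixed model: family = binomial gives a logit link
m_logit <- glmer(Correct ~ Dx + No_of_Stim + Trial_Type + Probe_Loc + Dist +
                   Dx:No_of_Stim + Dx:Trial_Type + Dx:Probe_Loc + Dx:Dist +
                   (1 | SubID),
                 data = dat, family = binomial)
summary(m_logit)  # fixed-effects coefficients are on the log-odds scale
```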
So, I hope this helped you a bit. Maybe also have a look at http://glmm.wikidot.com/faq and R's lmer cheat-sheet.
(I am by no means an expert on this, and in a state of learning as well, so if there's something to disagree, please do so freely.)
Before jumping into a (partial, non-rigorous) explanation, let me point out that all of your denominator dfs are so large that there should be no practical difference between them. For any df greater than about 50, there's little difference between the $t$ distribution and the corresponding Normal distribution, or between the $F$ distribution and the corresponding (scaled) $\chi^2$ distribution -- unless you are doing something like estimating 99.999% confidence intervals. For example, `qnorm(0.975)` (the upper tail cutoff of a symmetric 95% interval for a standard Normal) is 1.959964, while `qt(0.975, df=1000)` is 1.962339; changing the df to 10000 will hardly affect the result. (The comparison for $F$ vs. $\chi^2$ is almost identical, but slightly more annoying to compute because of the difference in scaling.)
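You can check this yourself:

```r
## upper tail cutoffs of a symmetric 95% interval
qnorm(0.975)           # standard Normal: 1.959964
qt(0.975, df = 1000)   # t with 1000 df:  1.962339
qt(0.975, df = 10000)  # barely different from the Normal value
```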
You can play with `lme` to get the classical estimate of the degrees of freedom (be aware that `lme`'s df-calculation heuristic can fail badly for random-slope models ...):
## simulate a balanced design: 67 subjects x 50 stimuli x 3 durations
dd <- expand.grid(subj=1:67, stim=1:50, dur=1:3)
dd$y <- rnorm(nrow(dd))  # pure-noise response
library(nlme)
anova(lme(y~1, random=~1|subj/dur/stim, data=dd))
## numDF denDF F-value p-value
## (Intercept) 1 9849 2.520658 0.1124
The df value here is 67*3*49 ... one important thing you haven't told us is what fixed effects you are actually testing, and at what level they vary. I would guess that they don't actually vary at the lowest level, so that in the classical case you would essentially be testing over a denominator based on the subject $\times$ duration SS.
I think the fact that the lowest-level variance (`cd:dur:subj`) is zero in the high-df model is actually a good clue. No variance among stimuli within duration essentially means that the responses are more or less independent, so you get closer to the full possible df for the model ...
It can't be a coincidence that the other df you are getting are near 67*27=1809, but I'm not sure where the 27 would come from ...
I would welcome more rigorous/better-informed answers, especially about what the df would be in the classical case ...
I would say `(duration|subject)` would probably be a little better. (The simpler `(1|duration:subject)` model is not necessarily wrong, but might be oversimplified. If I were a peer reviewer of this work I would certainly ask for a justification of the simpler model ...) The `(duration|subject)` model is a "random-slopes" model, more or less (although if you have coded `duration` as a categorical (factor or ordered factor) variable, the thing that varies randomly among subjects is not a slope per se, but a between-duration difference). The specification you have (`(1|subject:duration)`) assumes all subject-duration effects are drawn from a single (iid) Normal distribution; `(duration|subject)` assumes that the duration effects for a single individual are drawn from a $3 \times 3$ multivariate Normal distribution.
More precisely: comparing the random effect specifications, `(1|subject:duration)` gives the model for the conditional modes/BLUPs of subject $s$ for duration $d$ (or duration effect $d$, depending on how the model is parameterized) $$ b_{sd} \sim \textrm{Normal}(0,\sigma_{sd}^2) $$ whereas `(duration|subject)` gives $$ \begin{split} b_{s\cdot} & \sim \textrm{MVN}( \mathbf 0,\Sigma) \\ \Sigma & = \left( \begin{array}{ccc} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma^2_{2} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 \\ \end{array} \right) \end{split} $$ i.e., the different duration levels each have different among-subjects variances, and the among-subject variation in different duration levels is correlated ($\Sigma$ is a general symmetric positive (semi)definite matrix). To get back to the previous model you would need to restrict $\sigma_1^2=\sigma_2^2=\sigma_3^2=\sigma_{sd}^2$, and all of the off-diagonal elements would be zero.
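In lme4 syntax, the two specifications could be fitted side by side and their estimated variance components compared; a sketch, with `y`, `duration`, and `subject` as in the discussion and the data assumed to be in a data frame `dat`:

```r
library(lme4)

## single iid variance for all subject-duration combinations
m_iid <- lmer(y ~ duration + (1 | subject:duration), data = dat)

## 3x3 covariance matrix of duration effects among subjects
m_mvn <- lmer(y ~ duration + (duration | subject), data = dat)

VarCorr(m_iid)  # one variance component
VarCorr(m_mvn)  # variances per duration level plus correlations
```

Since the simpler model is (approximately) a constrained version of the richer one, `anova(m_mvn, m_iid)` gives a rough comparison, though likelihood ratio tests of variance parameters on the boundary should be interpreted with care.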