Solved – Mixed models and backward elimination

lme4-nlme, mixed model, model selection

Let's say I have data like this, and I'm trying to build a mixed model.

studentId   | courseId | courseName | year | courseGroup | timespent | count | mark
stud1       | 19       | M101       | 2008 | F           | 12.3      | 23    | 3.7
stud1       | 21       | E102       | 2008 | C           | 2.3       | 15    | 4
stud1       | 109      | H300       | 2008 | E           | 22.3      | 5     | 3
stud2       | 19       | M101       | 2008 | F           | 3.3       | 45    | 3
stud2       | 21       | E102       | 2008 | C           | 12.3      | 56    | 3.3
stud3       | 200      | M101       | 2009 | F           | 12.3      | 21    | 3.7

The full model would be:

library(lmerTest)  # lmer() from lmerTest adds the Satterthwaite p-values shown below
lmer.model.full <- lmer(mark ~ courseGroup + timespent + count + courseGroup:(timespent + count) + (1 | studentId) + (1 | courseName/courseId), data = dat)  # dat = the data frame sketched above

Fitting the model gives the following summary:

Fixed effects:
                         Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)             3.909e+00  1.282e-01  8.089e+02  30.491  < 2e-16 ***
courseGroupE           -2.404e-01  1.835e-01  6.417e+02  -1.311  0.19049    
courseGroupF           -1.105e-01  1.493e-01  2.246e+02  -0.740  0.46008    
timespent              -1.552e-02  5.065e-02  1.872e+03  -0.306  0.75932    
count                  -5.244e-02  5.409e-02  1.869e+03  -0.969  0.33244    
courseGroupE:timespent  8.740e-02  5.184e-02  1.823e+03   1.686  0.09196 .  
courseGroupF:timespent  2.350e-03  3.992e-02  1.881e+03   0.059  0.95308    
courseGroupE:count     -6.546e-02  2.673e-02  1.158e+03  -2.449  0.01446 *  
courseGroupF:count     -7.015e-02  2.470e-02  1.373e+03  -2.840  0.00457 ** 

To proceed with model selection using backward elimination, I should continue by removing the term with the highest p-value, which (among the main effects) is timespent. But the interaction between timespent and courseGroup E is marginally significant and might become significant in later iterations, while the p-value for the interaction between the F group and timespent is 0.95. What should I do in this case, and how should I proceed?
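
For reference, this kind of backward elimination can be automated with lmerTest's step(), which respects marginality: a main effect is only considered for removal after every interaction containing it has been dropped. A minimal sketch, assuming lmer.model.full is the fitted model above:

library(lmerTest)

# step() drops non-significant terms one at a time; by default it also tries
# to simplify the random effects, so reduce.random = FALSE keeps them fixed
elim <- step(lmer.model.full, reduce.random = FALSE)
elim                      # table of eliminated fixed-effect terms and their p-values
final <- get_model(elim)  # the model that remains after elimination
summary(final)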

One more thing puzzles me. What should I do in a case like the following?

 Fixed effects:
                             Estimate Std. Error         df t value Pr(>|t|)    
    (Intercept)             3.909e+00  1.282e-01  8.089e+02  30.491  < 2e-16 ***
    courseGroupE           -2.404e-01  1.835e-01  6.417e+02  -1.311  0.16049    
    courseGroupF           -1.105e-01  1.493e-01  2.246e+02  -0.740  0.96008    
    timespent              -1.552e-02  5.065e-02  1.872e+03  -0.306  0.25932    
    count                  -5.244e-02  5.409e-02  1.869e+03  -0.969  0.33244    
    courseGroupE:timespent  8.740e-02  5.184e-02  1.823e+03   1.686  0.09196 .  
    courseGroupF:timespent  2.350e-03  3.992e-02  1.881e+03   0.059  0.25308    
    courseGroupE:count     -6.546e-02  2.673e-02  1.158e+03  -2.449  0.01446 *  
    courseGroupF:count     -7.015e-02  2.470e-02  1.373e+03  -2.840  0.00457 ** 

Should I remove courseGroup from the model, leaving only timespent and count, or should I remove count? What would be the best indicator that courseGroup should be excluded from the model?

Best Answer

The $p$-values you are getting are based on Satterthwaite's approximation of the degrees of freedom. Such an approximation is needed because the degrees of freedom (DFs) are not obvious in the context of a mixed-effects model. Other people might advocate a different approximation (e.g. the Kenward-Roger approximation); realistically, given the DFs you are looking at (200+ most of the time), the difference between one approximation and another will be quite insignificant, and with DFs >= 500 your $t$-values are essentially $z$-values anyway. To state the obvious though: a $p$-value is not the probability of making a mistake by rejecting a true null hypothesis, so a small value does not mandate that the associated variable be included.
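
If you want to see how much the choice of approximation matters for your data, lmerTest lets you request either one directly. A short sketch, assuming lmer.model.full is the fitted model from the question (Kenward-Roger additionally requires the pbkrtest package):

library(lmerTest)

summary(lmer.model.full, ddf = "Satterthwaite")   # the default, shown above
summary(lmer.model.full, ddf = "Kenward-Roger")   # alternative approximation; with
                                                  # hundreds of DFs the two agree closely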

Having said the above, please see the thread Algorithms for automatic model selection; you are in exactly the same situation. If you really want to go down the road of a stepwise (linear mixed-effects) regression, please use AIC, which at least penalizes your model's complexity to some extent. Some standard caveats about the use of AIC with mixed-effects models can be found in the FAQ at http://glmm.wikidot.com. You should not use the "vanilla" version of AIC directly; see Greven and Kneib, 2010 regarding this. They present a corrected, conditional AIC (cAIC) and also provide an R package implementing it.
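
If you do go the information-criterion route, refit the candidate models with ML (REML = FALSE) before comparing their fixed effects. A rough sketch under those assumptions, with model names of my own choosing, dat as the data frame from the question, and the conditional AIC taken from the cAIC4 package (one implementation of the Greven and Kneib correction):

library(lmerTest)

m_full <- lmer(mark ~ courseGroup + timespent + count + courseGroup:(timespent + count) +
                 (1 | studentId) + (1 | courseName/courseId),
               data = dat, REML = FALSE)   # ML fit so the AICs are comparable
m_red  <- lmer(mark ~ courseGroup + timespent + count + courseGroup:count +
                 (1 | studentId) + (1 | courseName/courseId),
               data = dat, REML = FALSE)   # same model without courseGroup:timespent

AIC(m_full, m_red)    # marginal AIC, with the caveats noted above
anova(m_full, m_red)  # likelihood-ratio test of the dropped interaction

library(cAIC4)        # conditional AIC of Greven & Kneib (2010)
cAIC(m_full)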

In general, I would suggest you look into cross-validation techniques and/or bootstrapping your model. Check the functionality around lme4's bootMer and inspect the confidence intervals of your parameters; that should give you a better idea of what is worth including in your final model. I would argue that even jack-knifing is better than $p$-value-based selection.
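
For example, parametric-bootstrap confidence intervals for the fixed effects can be obtained through confint(), which calls bootMer under the hood. A sketch, where lmer.model.full is the fitted model from the question and nsim is kept small purely for illustration:

library(lme4)

set.seed(1)
ci_boot <- confint(lmer.model.full, method = "boot", nsim = 500,
                   parm = "beta_")      # "beta_" = fixed effects only
ci_boot                                 # intervals straddling 0 are weak candidates to keep

# Or call bootMer directly and summarise the bootstrap draws yourself
bt <- bootMer(lmer.model.full, FUN = fixef, nsim = 500)
t(apply(bt$t, 2, quantile, probs = c(0.025, 0.975)))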

Finally, while I "get" the idea of variable selection in terms of fixed effects, in terms of random effects the whole idea strikes me as a simple data-dredging technique. The random-effects structure is supposed to be based on the research question; otherwise one simply cherry-picks an error structure in an attempt to "squeeze more significance out of the remaining terms" (glmm.wikidot's FAQ once more).
