R Mixed Models – Using pvals.fnc() for p-values in lmer() while Handling Random Factor Correlations

hypothesis-testing, lme4-nlme, mixed-model, p-value, r

My problem can be summarized very simply: I'm fitting a linear mixed-effects model and trying to get p-values using pvals.fnc(). The problem is that this function seems to have trouble estimating p-values from the t-values associated with the model coefficients (Baayen et al., 2008), and I don't know what is going wrong with my specification, since according to what I have read it should work. So, I'm explaining my model below; if you can point out what I am doing wrong and suggest changes, I would really appreciate it!

DESCRIPTION: I have a 2 by 2 within subjects design, fully crossing two categorical factors, "Gram" and "Number", each with two levels. This is the command I used to run the model:

>m <- lmer(RT ~ Gram*Number + (1|Subject) + (0+Gram+Number|Subject) + (1|Item), data = data)

If I understand this code correctly, I am getting coefficients for the two fixed effects (Gram and Number) and their interaction, and I am fitting a model with by-subject intercepts, by-subject slopes for the two fixed effects, and a by-item intercept. Following Barr et al. (2013), I thought this code gets rid of the correlation parameters. I don't want to estimate the correlations because I want to get the p-values using pvals.fnc(), and I read that this function doesn't work if there are correlations in the model.

The command seems to work:

>m
Linear mixed model fit by REML 
Formula: RT ~ Gram * Number + (1 | Subject) + (0 + Gram + Number | Subject) + (1 |Item) 
   Data: mverb[mverb$Region == "06v1", ] 
   AIC   BIC logLik deviance REMLdev
 20134 20204 -10054    20138   20108
Random effects:
 Groups      Name        Variance  Std.Dev. Corr          
 Item       (Intercept)   273.508  16.5381               
 Subject     Gramgram        0.000   0.0000               
             Gramungram   3717.213  60.9689    NaN        
             Number1        59.361   7.7046    NaN -1.000 
 Subject     (Intercept) 14075.240 118.6391               
 Residual                35758.311 189.0987               
Number of obs: 1502, groups: Item, 48; Subject, 32

Fixed effects:
             Estimate Std. Error  t value
(Intercept)    402.520     22.321  18.033
Gram1          -57.788     14.545  -3.973
Number1         -4.191      9.858  -0.425
Gram1:Number1   15.693     19.527   0.804

Correlation of Fixed Effects:
            (Intr) Gram1  Numbr1
Gram1       -0.181              
Number1     -0.034  0.104       
Gram1:Nmbr1  0.000 -0.002 -0.011

However, when I try to calculate the p-values I still get an error message:

>pvals.fnc(m, withMCMC=T)$fixed
Error in pvals.fnc(m, withMCMC = T) : 
MCMC sampling is not implemented in recent versions of lme4
  for models with random correlation parameters

Am I making a mistake when I specify my model? Shouldn't pvals.fnc() work if I removed the correlations?

Best Answer


You are making a 'mistake' in your model specification given what you say you want.

Random effects:
 Groups      Name        Variance  Std.Dev. Corr          
 Item       (Intercept)   273.508  16.5381               
 Subject     Gramgram        0.000   0.0000               
             Gramungram   3717.213  60.9689    NaN        
             Number1        59.361   7.7046    NaN -1.000 

You see the numbers there under Corr for Subject? They show that you are estimating by-subject correlations between the random slopes for Gramungram and Gramgram, Number1 and Gramgram, and Number1 and Gramungram. If Gram were numeric, you could eliminate the random correlation between Gram and Number1 with a model specified by:

m <- lmer(RT ~ Gram*Number + (1|Subject) + (0+Gram|Subject) + (0+Number|Subject) + (1|Item), data = data)

You'll notice that any random effects specified within the same set of parentheses yield a random-effect correlation. At least that is true for models without the / symbol; I'm not really familiar with the notation for lmer when that is in the mix.
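As a concrete illustration of splitting terms across parentheses, here is a sketch using lme4's built-in sleepstudy data, where the slope variable (Days) is numeric, so the trick really does drop the correlation:

```r
library(lme4)

# One term: intercept and slope share a parenthesis, so their
# correlation is estimated.
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Separate terms: intercept and slope are in different parentheses,
# so no correlation parameter is estimated (Days is numeric here,
# which is why this works).
m2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)

print(VarCorr(m1))  # shows a Corr column for Subject
print(VarCorr(m2))  # no correlation estimated
```

The second model has two separate Subject variance-component blocks rather than one joint block, which is exactly what removes the correlation parameter.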

However, given what we saw from a model where you estimated this parameter, I'd advise caution. Moreover, you'll probably note that my code above doesn't work for you.

EDIT

For those of you just now joining our program: for these examples I'll refer to primingHeid, as the OP did in the comments; this dataset can be found in languageR.

library(languageR)
library(lme4)
data(primingHeid)

Why doesn't my code work? It doesn't work because Gram is a factor. Think about it for a little bit, and look at your fixed effects. If a factor has two levels, how many parameters must you estimate to explain its effects? Two. One of those parameters is, of course, the intercept. The interpretation of the intercept depends on how your factors are coded: in treatment coding (the default in R), the intercept represents the value for a case when all factors are at their reference level (cf. a regression textbook for details of other contrasts). Regardless of your contrasts, two parameters are estimated for the two levels of a factor.

I think what is happening is that when you fail to specify an intercept, R is protecting you from yourself and estimating two parameters anyway. Try summary(lm(RT ~ 0 + Condition, data = primingHeid)) and you'll see that it went right ahead and estimated two parameters. So, back to the context of lmer: if you have a factor with two levels, R will gladly estimate two parameters and then correlate them under the hood. Back again to your comments: fit lmer(RT ~ Condition + (0 + Condition | Subject), data = primingHeid), look at the ranef() of that model, and you'll see yet again that this is exactly what R has done.
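The factor-coding point can be seen in base R alone: model.matrix() shows that a two-level factor costs two parameters whether or not the intercept is suppressed (the column names below follow R's defaults).

```r
# A two-level factor always expands to two columns: either an
# intercept plus one dummy, or (with the intercept suppressed)
# one indicator column per level.
f <- factor(c("gram", "ungram", "gram", "ungram"))

with_intercept    <- model.matrix(~ f)      # (Intercept), fungram
without_intercept <- model.matrix(~ 0 + f)  # fgram, fungram

ncol(with_intercept)     # 2 columns
ncol(without_intercept)  # still 2 columns
```

This is the lm()-level version of what lmer is doing under the hood with (0 + Condition | Subject): two random-effect columns, hence a correlation between them.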

If you wanted to force R to stop that, you'd have to do the factor coding manually by converting Condition into a numeric variable. The assumption you'd then have to make about the mean value of RT when Condition is at the level you coded as 0 is likely untenable (i.e. that RT is really 0 there). I won't exclude the possibility that with some careful thought, transformation of the DV (centering on the mean of the condition you are setting equal to 0?), and good model specification you might work your way somewhere that made some sense... but that would be an entirely different question and I can't speak to it at the moment.
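One common version of that careful thought is a centered numeric contrast. This is only a sketch of the recoding step, not an endorsement of the whole approach:

```r
# Recode a two-level factor as -0.5 / +0.5. The resulting numeric
# predictor has mean zero in a balanced design, so the intercept
# stays interpretable as the grand mean rather than as "RT when
# the factor equals 0".
g <- factor(c("gram", "ungram", "gram", "ungram"))
g_num <- ifelse(g == "gram", -0.5, 0.5)
mean(g_num)  # 0 in this balanced toy example
```

With a single numeric predictor like g_num, a term such as (0 + g_num | Subject) estimates one random slope and no correlation, which is what the earlier model specification was aiming for.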

/EDIT

I think you probably should step back and think about your model structure a little more (which really is one of the great messages in Barr et al., 2013). Are items crossed with gram and number? How many items occur within a unique arrangement of gram and number per subject?
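Those two design questions can be checked mechanically. Here is a sketch using a toy fully-crossed data frame standing in for the OP's data (the column names and sizes are assumptions):

```r
# Toy stand-in for the OP's design: 4 subjects x 6 items x 2 x 2 factors.
dat <- expand.grid(Subject = 1:4, Item = 1:6,
                   Gram = c("gram", "ungram"),
                   Number = c("sg", "pl"))

# Are Gram and Number fully crossed, and how many observations per cell?
xtabs(~ Gram + Number, dat)  # 24 observations in every cell here

# How many distinct items occur in each Gram x Number cell?
items_per_cell <- tapply(dat$Item, interaction(dat$Gram, dat$Number),
                         function(x) length(unique(x)))
items_per_cell  # 6 items per cell in this toy example
```

An empty or unbalanced cell in tables like these is often the first sign that a maximal random-effects structure will not be supported by the data.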

More general issues now...

I have a huge amount of respect for Barr (no surprises there). However, he is not entirely mainstream on issues related to fitting this type of model. That isn't a bad thing... but time will tell whether his approach to these models becomes the next big thing. I have little doubt that 'keeping it maximal' is great if your data will tolerate it. But sometimes your data won't. The backwards selection procedure he published, involving the use of non-converged models, is a bit unexpected. However, I have to admit that now that I've seen his appendices I'm a little less sour on the idea than I was when I first read it. All the same, I'd like to see it vetted a bit more.

You'll note Barr specifically does not use pvals.fnc() for models that have random correlations. So, having only skimmed the published version of his paper, I'd guess you can use it under his approach only if you can backwards-step to a point where you don't have any.
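For completeness, and as a different technique from pvals.fnc() rather than anything Barr uses: the lmerTest package provides an lmer() wrapper that attaches Satterthwaite-approximated degrees of freedom, so summary() reports p-values even when the model contains random correlations.

```r
# Alternative to pvals.fnc(): lmerTest's lmer() adds Satterthwaite
# df and p-values to the coefficient table in summary().
library(lmerTest)  # masks lme4::lmer and attaches lme4 (for sleepstudy)

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
coef(summary(m))  # now includes a "Pr(>|t|)" column
```

Whether these approximate p-values are any more defensible than MCMC-based ones is, of course, part of the same debate.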

Going back now to my training with other stats gurus, I feel compelled to say that almost all of this worry is an exercise in p-value fetishism that may be entirely misplaced, especially if you consider that this level of nested decision making yields a test whose properties are difficult even to define.