Solved – How to determine Degrees of Freedom in Linear (Mixed Effect) Regression

degrees-of-freedom, linear-model, lme4-nlme, mixed-model

In a statistics class our final homework was to work with the data set lexdec from languageR (full script at the end). The task was to check whether several effects influence subjects' reading times (RT). We started with a simple model ("Is the RT affected by the number of trials?") and extended it ("Does NativeLanguage have any impact?", etc.), until we ended up with this:

model4 = lmer(RT ~ Trial + NativeLanguage + (1|Word) + (1 + Trial|Subject), lexdec)

This was correct (and since this is not an actual analysis but a toy example, we can ignore the warnings about the model not converging for now…). My team partner and I, however, wanted to know more: how do we read and interpret the output to check whether the effect we try to predict with the model is significant?

One method we learned is the t-test. For the simplest model, the one without mixed effects, we get the results for free:

summary(lm(RT ~ Trial, lexdec))

yields the interpretable

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.4172041  0.0144549 443.948   <2e-16 ***
Trial       -0.0003060  0.0001256  -2.435    0.015 *  

Sweet! If we fix our alpha value at the standard .05 (.025 per tail, since it's a two-sided test), the Trial effect is indeed significant.

But lmer does not provide p-values; we only get the t values. So here's our script to check whether those values are significant:

ttest <- function (model, effect, df, alpha=0.05) {
    # Performs a two-sided t-test for a given alpha value on the given model.
    t <- abs(coef(summary(model))[effect, 't value'])
    q <- qt(1 - alpha / 2, df=df)
    paste('|t| =', round(t, 3), '>', round(q, 3), '?', t > q)
}
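As a quick sanity check (my own addition, not part of the homework), the p-value that lm reports can be recovered from the t value and the residual df via pt, the CDF of the t distribution; this assumes languageR is loaded so that lexdec is available:

```r
library(languageR)  # provides the lexdec data set

m <- lm(RT ~ Trial, lexdec)
t_val <- coef(summary(m))['Trial', 't value']
# Two-sided p-value from the t distribution with the residual df:
2 * pt(-abs(t_val), df = df.residual(m))  # ~0.015, as summary(m) reports
```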

Now the tricky part: how do we come up with the degrees of freedom (df)? I always understood df as the number of variables I can vary independently (sometimes constraining others, e.g. robotic joints).

Very often I am presented with definitions like $N-k-1$ (see e.g. Multiple linear regression degrees of freedom), or of course the df formulae for Student's and Welch's t-tests. All of them result in different numbers of df, but measured against infinitely many they wouldn't matter much anyway (the Wikipedia t-value table nicely shows that from about 100 df upwards the exact value is almost unimportant, even for "our" small behavioral experiments).
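That "against infinitely many" point can be checked directly with qt; the two-sided 5% critical value stabilizes quickly as df grows:

```r
# Two-sided critical values at alpha = .05 for increasing df:
qt(1 - 0.05 / 2, df = c(5, 10, 30, 100, 1000, Inf))
# approx. 2.57 2.23 2.04 1.98 1.96 1.96
```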

That said, it seems plausible to assume that the authors of such libraries have already solved this problem, and indeed, the summary (of the lm!) also reports a number of df: F-statistic: 5.931 on 1 and 1657 DF, p-value: 0.01498. Wait, 1657? That seems like a lot. But it is the number of samples ($N=1659$) minus two, i.e. $N-k-1$ with $k=1$ for the only predictor, Trial.

So I assume the library's best guess is based on the number of samples. But: if I write down my model cleanly before testing my subjects, I would not know how many df I will end up with (will I have 30 subjects, or only 29 because one is ill?), since that number depends on $N$. So a "variable" must be something in my model, and of course I can find someone who agrees with me (Degrees of Freedom Calculation in Linear Mixed Model):

In the […] model you have a parameter for the intercept, a variance parameter for each random effect term and a variance parameter for the observations given the random effects

If I count those I come up with seven: Intercept, Trial-Variance, NativeLanguage-Variance, Word-Intercept-Variance, Subject-Intercept-Variance, Subject-Trial-Variance, Residual.
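As a side note (my own addition, not from the cited answer): lme4's own parameter count for a fitted model can be read off its log-likelihood object. It counts every fixed-effect coefficient, every variance and correlation parameter, and the residual variance, so it need not match a hand count like the one above:

```r
# Number of parameters lme4 itself attributes to the fit
# (fixed effects + variance/correlation parameters + residual variance):
attr(logLik(model4), "df")
```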

This looks much cleaner, but unfortunately it overturns many of our significant results, even for the simple model. While the built-in lm tells us Pr(>|t|) = 0.015, we now find |t| = 2.435 > 4.303 ? FALSE, because the df are now far too small.

Additionally, for the complex model we get multiple t values. Which ones should we consider? Is something significant iff all of them are? Just one? I have heard a lot about corrections for multiple testing; is such a correction already needed when I test multiple parameters of one model?

Can someone shed some light on this significance-and-df darkness I have fallen into? Is my approach even correct this way? Or is there no simple answer, so that I always have to weigh many different things as df? Or can I just assume infinitely many by using enough samples and participants? (I sometimes hear that having too many is not really good either, without explanation, but I assume that is due to overfitting?)

A similar question is How should I determine degrees of freedom for t-test in ordinal regression?, which is unanswered, and I enjoyed How to get the degree of freedom when doing multiple linear regression?, which unfortunately compares nested models, something I am not interested in at this time.

Here is a full working example code to see what I am talking about, followed by its output:

library('languageR')
library('lme4')

ttest <- function (model, effect, df, alpha=0.05) {
    # Performs a two-sided t-test for a given alpha value on the given model.
    t = abs(coef(summary(model))[effect, 't value'])
    q = qt(1 - alpha / 2, df=df)
    paste('|t| =', round(t, 3), '>', round(q, 3), '?', t > q)
}

model1 = lm(RT ~ Trial, lexdec)
summary(model1)
print(paste("Is reaction time affected by the number of trials only?", ttest(model1, 'Trial', 2)))

# snip models 2-3

model4 = lmer(RT ~ Trial + (1 + Trial|Subject) + (1|Word) + NativeLanguage, lexdec)
summary(model4)
print(paste("Is reaction time still affected by trials if considering native language and by-subject variance?", ttest(model4, 'Trial', 7)))
print(paste("Is reaction time still affected by native language if considering by-subject variance?", ttest(model4, 'NativeLanguageOther', 7)))

Output:

> library('languageR')
> library('lme4')
> 
> ttest <- function (model, effect, df, alpha=0.05) {
+   # Performs a two-sided t-test for a given alpha value on the given model.
+   t = abs(coef(summary(model))[effect, 't value'])
+   q = qt(1 - alpha / 2, df=df)
+   paste('|t| =', round(t, 3), '>', round(q, 3), '?', t > q)
+ }
> 
> model1 = lm(RT ~ Trial, lexdec)
> summary(model1)

Call:
lm(formula = RT ~ Trial, data = lexdec)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.53961 -0.16943 -0.04366  0.11594  1.18357 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.4172041  0.0144549 443.948   <2e-16 ***
Trial       -0.0003060  0.0001256  -2.435    0.015 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2412 on 1657 degrees of freedom
Multiple R-squared:  0.003567,  Adjusted R-squared:  0.002965 
F-statistic: 5.931 on 1 and 1657 DF,  p-value: 0.01498

> print(paste("Is reaction time affected by the number of trials only?", ttest(model1, 'Trial', 2)))
[1] "Is reaction time affected by the number of trials only? |t| = 2.435 > 4.303 ? FALSE"
> 
> # snip models 2-3
> 
> model4 = lmer(RT ~ Trial + (1 + Trial|Subject) + (1|Word) + NativeLanguage, lexdec)
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00759802 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
> summary(model4)
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ Trial + (1 + Trial | Subject) + (1 | Word) + NativeLanguage
   Data: lexdec

REML criterion at convergence: -922.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3256 -0.6159 -0.0954  0.4647  6.0361 

Random effects:
 Groups   Name        Variance  Std.Dev.  Corr 
 Word     (Intercept) 5.942e-03 0.0770873      
 Subject  (Intercept) 3.179e-02 0.1783087      
          Trial       4.937e-07 0.0007026 -0.72
 Residual             2.869e-02 0.1693814      
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                      Estimate Std. Error t value
(Intercept)          6.3403360  0.0477379  132.82
Trial               -0.0002435  0.0001778   -1.37
NativeLanguageOther  0.1640987  0.0563356    2.91

Correlation of Fixed Effects:
            (Intr) Trial 
Trial       -0.609       
NtvLnggOthr -0.506  0.001
convergence code: 0
Model failed to converge with max|grad| = 0.00759802 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

> print(paste("Is reaction time still affected by trials if considering native language and by-subject variance?", ttest(model4, 'Trial', 7)))
[1] "Is reaction time still affected by trials if considering native language and by-subject variance? |t| = 1.369 > 2.365 ? FALSE"
> print(paste("Is reaction time still affected by native language if considering by-subject variance?", ttest(model4, 'NativeLanguageOther', 7)))
[1] "Is reaction time still affected by native language if considering by-subject variance? |t| = 2.913 > 2.365 ? TRUE"

Best Answer

There is an extensive discussion of this topic under the heading Why doesn’t lme4 display denominator degrees of freedom/p values? What other options do I have? on the GLMM FAQ GitHub page.

In particular, because of the crossed random grouping factors (Word and Subject) in your linear mixed model, the following applies:

When the data are not classical (crossed, unbalanced, R-side effects), we might still guess that the deviances etc. are approximately F-distributed but that we don’t know the real degrees of freedom – this is what the Satterthwaite, Kenward-Roger, Fai-Cornelius, etc. approximations are supposed to do.

We can use the package lmerTest to compute the Satterthwaite and Kenward-Roger approximations for the degrees of freedom:

library("lmerTest")

model4 <- lmer(RT ~ Trial + (1 + Trial | Subject) + (1 | Word) + NativeLanguage, lexdec)

summary(model4, ddf = "Satterthwaite")
# [...]
# Fixed effects:
#                       Estimate Std. Error         df t value Pr(>|t|)    
# (Intercept)          6.3403355  0.0477400 26.0766153 132.810  < 2e-16 ***
# Trial               -0.0002435  0.0001778 19.7606718  -1.369  0.18627    
# NativeLanguageOther  0.1640999  0.0563369 19.0229754   2.913  0.00892 ** 
# [...]

summary(model4, ddf = "Kenward-Roger")
# [...]
# Fixed effects:
#                       Estimate Std. Error         df t value Pr(>|t|)    
# (Intercept)          6.3403355  0.0483794 25.7103491 131.054   <2e-16 ***
# Trial               -0.0002435  0.0001778 19.8844304  -1.369   0.1862    
# NativeLanguageOther  0.1640999  0.0592290 19.0094714   2.771   0.0122 * 
# [...]
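If you prefer to sidestep the df question altogether, plain lme4 (without lmerTest) can produce likelihood-profile confidence intervals for the fixed effects; if an interval excludes zero, the effect is significant at the corresponding level. Profiling can be slow; method = "Wald" is a fast but cruder alternative:

```r
# Profile confidence intervals for the fixed effects only ("beta_"):
confint(model4, parm = "beta_", method = "profile", level = 0.95)
```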