Solved – How to determine the effect of random factors and slopes and their variance in a mixed model

mixed-model, r, random-effects-model, variance

I would like to determine the variance explained by random factors and slopes in a mixed model, but I am unsure whether the analysis I use and my interpretation are correct. Furthermore, comparing models and analysing a mixed model with random slopes seem to give opposite conclusions, so I would also like to know when to include random slopes. Below I give an overview of the analysis.

I tested three groups of 10 individuals twice in the same task. As I expect individuals to differ in their response across the two test stages, I also include random slopes. The analysis I run is:

library(nlme)
fm2 <- lme(behaviour ~ stage * group, random = ~ stage | ID, data = data, method = "ML")

Part of the output I get is the following:

Linear mixed-effects model fit by maximum likelihood
Data: data 
    AIC       BIC   logLik
   -72.07494 -48.50785 46.03747

 Random effects:
  Formula: ~stage | ID
  Structure: General positive-definite, Log-Cholesky parametrization
             StdDev     Corr  
 (Intercept) 0.12646601 (Intr)
 stage2      0.12662159 -0.455
 Residual    0.05714907       

To calculate the variances I extract the standard deviations of the random intercept (ID), the random slope, and the residual as follows:

# residual SD times the relative std devs gives the random-effect SDs
SD.ID <- (fm2$sigma * attr(corMatrix(fm2$modelStruct[[1]])[[1]], "stdDev"))[[1]]    # random intercept
SD.slope <- (fm2$sigma * attr(corMatrix(fm2$modelStruct[[1]])[[1]], "stdDev"))[[2]] # random slope (stage2)
SD.residual <- fm2$sigma                                                            # residual SD
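
For reference, the same standard deviations can be read off directly with VarCorr(); a minimal check, assuming the fitted object fm2 from above:

vc <- VarCorr(fm2)          # character matrix with Variance and StdDev columns
as.numeric(vc[, "StdDev"])  # same order as above: intercept, slope (stage2), residual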

And then calculate the percentage of variance explained:

(SD.ID/(SD.ID+SD.slope+SD.residual))*100
(SD.slope/(SD.ID+SD.slope+SD.residual))*100

In this case the output seems to suggest: "individual ID and random slopes explained 40.8% and 40.8%, respectively, of the variance of behaviour".

Although this seems to suggest that the random slopes explain a large part of the variance, a simpler model without slopes perhaps seems more appropriate:

fm1 <- lme(behaviour ~ stage * group, random = ~ stage|ID, data=data, method="ML")
fm0 <- lme(behaviour ~ stage * group, random = ~ 1|ID, data=data, method="ML")
anova(fm0,fm1)

since I get the following output:

    Model df       AIC       BIC   logLik   Test    L.Ratio p-value
fm0     1  8 -76.00947 -57.15580 46.00473                          
fm1     2 10 -72.07494 -48.50785 46.03747 1 vs 2 0.06547599  0.9678

To me this suggests that the model with the random slope does not fit the data significantly better. This contrasts with the roughly 40% of the variance that the slope seems to explain, as calculated above.

Furthermore, if I correlate the random-effect estimates from model fm1, i.e. the per-individual intercepts with the slopes:

cor.test(fm1$coefficients[[2]][[1]][,1], fm1$coefficients[[2]][[1]][,2]) # per-ID intercept vs. slope

I get the following output:

Pearson's product-moment correlation

data:  fm1$coefficients[[2]][[1]][, 1] and fm1$coefficients[[2]][[1]][, 2]
t = -2.6802, df = 37, p-value = 0.01092
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6376166 -0.1004855
sample estimates:
       cor 
-0.4032186 

This seems to suggest that individuals with lower initial behaviour scores change more in their behaviour over time. Therefore, again, I would think running the model with the slopes makes the most sense.
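
For reference, the same estimates can be pulled out more transparently with ranef(); a minimal sketch, assuming fm1 as fitted above (note these are shrunken predictions, so the cor.test() p-value should be read with caution):

re <- ranef(fm1)                           # one row per ID: (Intercept), stage2
cor(re[, "(Intercept)"], re[, "stage2"])   # reproduces the point estimate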

Thus, to repeat my question: how can I determine the variance explained by random factors and slopes in a mixed model, and how do I know when to include random slopes?

Best Answer

To determine the structure of the random effects, I would suggest using restricted maximum likelihood (REML, setting method="REML") to obtain unbiased estimates of the variances of the random intercept and slope under a general mean model (such as the full model stage*group), and then comparing different variance models with likelihood ratio tests (anova()). Once you have determined the structure of the random effects, you can return to maximum likelihood (ML).
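
A minimal sketch of that workflow (the .reml object names are just for illustration):

fm1.reml <- lme(behaviour ~ stage * group, random = ~ stage | ID,
                data = data, method = "REML")
fm0.reml <- lme(behaviour ~ stage * group, random = ~ 1 | ID,
                data = data, method = "REML")
anova(fm0.reml, fm1.reml)  # LRT on the random-effects structure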

As to the percentage of variance explained, note that you were summing standard deviations rather than variances. For the random intercept and slope model, the total variance of the outcome is $$\mathrm{var}(\text{intercept})+\text{stage}^2*\mathrm{var}(\text{slope})+2*\text{stage}*\mathrm{cov}(\text{intercept},\text{slope})+\mathrm{var}(\text{residual}).$$ As your first output shows, the correlation between the random intercept and slope is negative, so it is not proper to apportion the explained variance to the random intercept or the slope alone. It is true that the random effects can explain about 84% of the total variance (when stage=1, say), but including the random slope makes little difference (the log-likelihood increased only slightly, from 46.00473 to 46.03747, in your output).

> (0.12646601^2+0.12662159^2-2*0.455*0.12646601*0.12662159)/(0.12646601^2+0.12662159^2-2*0.455*0.12646601*0.12662159+0.05714907^2)
[1] 0.842378
> 0.05714907^2/(0.12646601^2+0.12662159^2-2*0.455*0.12646601*0.12662159+0.05714907^2)
[1] 0.157622
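
For contrast, at the reference level (stage=0) the slope terms in the formula above drop out, so the share explained by the random effects reduces to the intercept variance against the total:

> 0.12646601^2/(0.12646601^2+0.05714907^2)
[1] 0.830422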

Edit:

  1. As Jake Westfall pointed out, the variance above depends on the value of the covariate stage, so the proportion of explained variance varies with stage as well.
  2. Note that the null hypothesis of the likelihood ratio test lies on the boundary of the parameter space, $$H_0: \mathrm{var}(\text{slope})=0,\ \mathrm{cov}(\text{intercept},\text{slope})=0,$$ so the test is conservative, i.e., it tends not to reject the null hypothesis; a rough correction is sketched below.
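
A common approximation for this boundary case (Self & Liang, 1987; Stram & Lee, 1994) treats the LRT statistic as a 50:50 mixture of $\chi^2_1$ and $\chi^2_2$, since one variance and one covariance are constrained under $H_0$; a quick sketch using the statistic from the anova() output above:

lrt <- 0.06547599   # L.Ratio from anova(fm0, fm1)
0.5 * pchisq(lrt, df = 1, lower.tail = FALSE) +
  0.5 * pchisq(lrt, df = 2, lower.tail = FALSE)   # boundary-corrected p-value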