I would like to determine the variance explained by the random intercepts and slopes in a mixed model, but I am unsure whether the analysis I use and my interpretation are correct. Furthermore, comparing models and analysing a mixed model with random slopes seem to give opposite conclusions, so I would also like to know when to include random slopes. Below I give an overview of my analysis.
I tested three groups of 10 individuals twice in the same task. As I expect individuals to differ in their response across the two test stages, I also include random slopes. The analysis I run is:
fm2 <- lme(behaviour ~ stage * group, random = ~ stage|ID, data=data, method="ML")
Part of the output I get is the following:
Linear mixed-effects model fit by maximum likelihood
Data: data
AIC BIC logLik
-72.07494 -48.50785 46.03747
Random effects:
Formula: ~stage | ID
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.12646601 (Intr)
stage2 0.12662159 -0.455
Residual 0.05714907
To calculate the variance components I extract the standard deviations of the random intercept (ID), the random slope, and the residual as follows:
SD.ID <- (fm2$sigma * attr(corMatrix(fm2$modelStruct[[1]])[[1]],"stdDev"))[[1]]  # SD of random intercept
SD.slope <- (fm2$sigma * attr(corMatrix(fm2$modelStruct[[1]])[[1]],"stdDev"))[[2]]  # SD of random slope
SD.residual <- fm2$sigma  # residual SD
And then calculate the percentage of variance explained:
(SD.ID/(SD.ID+SD.slope+SD.residual))*100
(SD.slope/(SD.ID+SD.slope+SD.residual))*100
In this case this seems to suggest: "individual ID and random slopes explained 40.8% and 40.8%, respectively, of the variance of behaviour".
Although this seems to suggest that the random slopes explain a large part of the variance, a simpler model without random slopes perhaps seems more appropriate:
fm1 <- lme(behaviour ~ stage * group, random = ~ stage|ID, data=data, method="ML")
fm0 <- lme(behaviour ~ stage * group, random = ~ 1|ID, data=data, method="ML")
anova(fm0,fm1)
since I get the following output:
Model df AIC BIC logLik Test L.Ratio p-value
fm0 1 8 -76.00947 -57.15580 46.00473
fm1 2 10 -72.07494 -48.50785 46.03747 1 vs 2 0.06547599 0.9678
This seems to suggest that the model with the random slope does not fit the data significantly better, which contrasts with the 40% of the variance the slope appears to explain in the calculation above.
Furthermore, if I correlate the coefficients from model fm1, thus the intercept with the slope:
cor.test(fm1$coefficients[[2]][[1]][,1],fm1$coefficients[[2]][[1]][,2])
I get the following output:
Pearson's product-moment correlation
data: fm1$coefficients[[2]][[1]][, 1] and fm1$coefficients[[2]][[1]][, 2]
t = -2.6802, df = 37, p-value = 0.01092
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.6376166 -0.1004855
sample estimates:
cor
-0.4032186
This seems to suggest that individuals with lower initial behaviour scores change more in their behaviour over time. So again I would think that running the model with random slopes makes the most sense.
Thus, to repeat my question: how can I determine the variance explained by the random intercepts and slopes in a mixed model, and when should I include random slopes?
Best Answer
To determine the structure of the random effects, I would suggest using restricted maximum likelihood (REML, setting method="REML") to obtain unbiased estimates of the variances of the random intercept and slope under a general mean model (such as the full model stage * group), and then comparing the different variance structures using the likelihood ratio test (anova()). After you have determined the structure of the random effects, you can return to maximum likelihood (ML).

As to the percentage of variance explained, it seems that you were using the standard deviations rather than the variances. For the random intercept and slope model, the total variance of the outcome is $$\mathrm{var}(\tt{intercept})+\tt{stage}^2*\mathrm{var}(\tt{slope})+2*\tt{stage}*\mathrm{cov}(\tt{intercept},\tt{slope})+\mathrm{var}(\tt{residual}).$$ As stated in your first output, the correlation between the random intercept and slope is negative, therefore it is not proper to calculate the percentage of variance explained solely by the random intercept or the random slope. It is true that the random effects together can explain about 84% of the total variance, say when stage = 1, but including the random slope does not make a real difference (the log-likelihood increased only slightly, from 46.00473 to 46.03747, in your output).

Edit: note that the total variance depends on the value of stage, and the proportion of explained variance therefore also varies.
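Both steps can be sketched in R as follows. This is only a sketch: it assumes your data frame data with columns behaviour, stage, group, and ID, and the second part simply plugs in the variance-component estimates reported in your own output above.

```r
library(nlme)

## Step 1: compare random-effects structures under REML
fm1.reml <- lme(behaviour ~ stage * group, random = ~ stage | ID,
                data = data, method = "REML")
fm0.reml <- lme(behaviour ~ stage * group, random = ~ 1 | ID,
                data = data, method = "REML")
anova(fm0.reml, fm1.reml)  # likelihood ratio test of the random slope

## Step 2: proportion of total variance due to the random effects,
## using the estimates from the output shown in the question
sd.int   <- 0.12646601   # SD of random intercept
sd.slope <- 0.12662159   # SD of random slope (stage2)
r        <- -0.455       # correlation between intercept and slope
sd.res   <- 0.05714907   # residual SD
stage    <- 1

re.var    <- sd.int^2 + stage^2 * sd.slope^2 +
             2 * stage * r * sd.int * sd.slope
total.var <- re.var + sd.res^2
100 * re.var / total.var   # about 84%
```

Note that the variances (and the covariance) are combined, not the standard deviations; summing standard deviations is what produced the misleading 40.8% figures in the question.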