Solved – How to simulate a random slope model

mixed model, r, simulation

I would like to create a linear mixed model for an unbalanced dataset (different numbers of events per subject and a few missing values at some time points). I am using R version 3.2.1 (2015-06-18) with package nlme_3.1-120.

Here is some simulated data:

library(nlme)
set.seed(1)
subject    <- factor(rep(c(1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9, 9, 10, 
                           11, 11, 11, 12, 13), 10))
event      <- factor(rep(1:20, 10))
timepoint  <- rep(1:10, each = 20)
measure    <- rnorm(length(timepoint)) + timepoint*0.3
timepoint  <- factor(timepoint)
measure[sample(1:length(measure), rpois(5,4))] <- NA
data       <- data.frame(subject=subject, event=event, timepoint=timepoint, 
                         measure=measure)
str(data)

The model should predict the variable “measure” with time point as a fixed effect and with subject and event as random effects.

base      <- lme(measure ~ 1,         data=data, random= ~ 1|subject, 
                 na.action=na.exclude, method="ML")
intercept <- lme(measure ~ timepoint, data=data, random= ~ 1|subject, 
                 na.action=na.exclude, method="ML")
nested    <- lme(measure ~ timepoint, data=data, random= ~ 1|subject/event, 
                 na.action=na.exclude, method="ML")
anova(base, intercept, nested)

I would like to fit a random intercept and slope, because both can vary among subjects and events. However, when I add the random slope effect, the model does not converge. It does not throw any error message; it just runs forever. What can I do to create a model with a random slope that converges?

Caution: this model runs endlessly.

slope <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject, 
             na.action=na.exclude, method="ML")

I also tried this:

Caution: this model runs endlessly.

slope2 <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject, 
              na.action=na.exclude, method="ML", control=list(opt="optim"))

Caution: some of these models may run endlessly.

slope3      <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject/event, 
                   na.action=na.exclude, method="ML", control = list(opt="optim"))
covariance  <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject, 
                   correlation=corAR1(),na.action = na.exclude, method="ML")
covariance2 <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject, 
                   correlation=corAR1(0), na.action=na.exclude, method="ML", 
                   control=list(opt="optim"))
covariance3 <- lme(measure ~ timepoint, data=data, random= ~ timepoint|subject, 
                   correlation=corAR1(0), na.action=na.exclude, method="ML", 
                   control=list(maxIter=1000))
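
A side note on how large these random-slope models are, as a rough sketch in base R. Because timepoint is converted to a factor before the data frame is built, `random = ~ timepoint | subject` requests one random effect per factor level (an intercept plus nine contrasts) rather than a single slope. With an unstructured covariance matrix that means 55 variance-covariance parameters to estimate from 13 subjects, which may well be related to the fits never finishing. The helper `count_cov_params` is a hypothetical name introduced only for this illustration.

```r
# Illustration only: count the columns of the random-effects design and the
# number of parameters in an unstructured covariance matrix of that size.
timepoint_factor  <- factor(rep(1:10, each = 20))  # as in the question
timepoint_numeric <- rep(1:10, each = 20)          # time kept numeric

count_cov_params <- function(x) {
  q <- ncol(model.matrix(~ x))        # intercept + slope/contrast columns
  c(columns = q, cov_params = q * (q + 1) / 2)
}

as_factor  <- count_cov_params(timepoint_factor)
as_numeric <- count_cov_params(timepoint_numeric)
print(rbind(as_factor, as_numeric))
```

With timepoint kept numeric, the same random-effects formula describes a single slope per subject and only three covariance parameters.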

Best Answer

@AdamO has done a good job identifying the specific error in your code. Let me address the question more generally. Here is how I simulate a linear mixed effects model:

Mixed effects models assume each unit has random effects drawn from a multivariate normal distribution. (When a model is estimated, it is the variances and covariances of that multivariate normal that are being estimated for the random effects.) I start by specifying this distribution and generating (pseudo-)random values to serve as the random effects. It is often convenient to specify the variances as $1$, so that the covariance is the correlation between slopes and intercepts (which is easier for me to conceptualize).

library(MASS)
ni = 13                                                 # number of subjects
RE = mvrnorm(ni, mu=c(0,0), Sigma=rbind(c(1.0, 0.3),
                                        c(0.3, 1.0) ))
colnames(RE) = c("ints","slopes");  t(round(RE,2))
#         [,1]  [,2]  [,3] [,4]  [,5]  [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
# ints    0.81 -0.52 -0.65 1.30 -0.29 -1.15 0.04 0.05 0.00 -0.29  2.40 -0.05 -0.47
# slopes -1.82  0.81 -0.70 1.28  0.82 -0.18 0.74 1.14 0.93 -0.20  0.04  0.68 -0.53
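
If unit variances are not what you want, the covariance matrix can be built from standard deviations and a correlation. The following is a minimal sketch; the values of `sd_int` and `sd_slope` are arbitrary assumptions chosen for illustration, and `RE2` is just a name picked to avoid overwriting `RE`.

```r
# Sketch: build Sigma from SDs and a correlation, then draw random effects.
library(MASS)

sd_int   <- 2.0   # assumed SD of the random intercepts (illustrative)
sd_slope <- 1.5   # assumed SD of the random slopes (illustrative)
rho      <- 0.3   # correlation between intercepts and slopes

R_mat <- rbind(c(1,   rho),
               c(rho, 1  ))
D     <- diag(c(sd_int, sd_slope))
Sigma <- D %*% R_mat %*% D     # off-diagonal = rho * sd_int * sd_slope

RE2 <- mvrnorm(13, mu = c(0, 0), Sigma = Sigma)
colnames(RE2) <- c("ints", "slopes")

print(Sigma)
print(cov2cor(Sigma))          # recovers the correlation matrix R_mat
```

Here the covariance is no longer equal to the correlation, which is why the unit-variance version is easier to reason about.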

Next, I would generate my $X$ variables. I can't really follow the logic of your example, so I will use time as my only regressor.

nj   = 10                              # number of timepoints
data = data.frame(ID   = rep(1:ni,   each=nj), 
                  time = rep(1:nj,   times=ni),
                  RE.i = rep(RE[,1], each=nj),
                  RE.s = rep(RE[,2], each=nj),
                  y    = NA                    )
head(data, 14)
#    ID time       RE.i       RE.s  y
# 1   1    1  0.8051709 -1.8152973 NA
# 2   1    2  0.8051709 -1.8152973 NA
# 3   1    3  0.8051709 -1.8152973 NA
# 4   1    4  0.8051709 -1.8152973 NA
# 5   1    5  0.8051709 -1.8152973 NA
# 6   1    6  0.8051709 -1.8152973 NA
# 7   1    7  0.8051709 -1.8152973 NA
# 8   1    8  0.8051709 -1.8152973 NA
# 9   1    9  0.8051709 -1.8152973 NA
# 10  1   10  0.8051709 -1.8152973 NA
# 11  2    1 -0.5174601  0.8135761 NA
# 12  2    2 -0.5174601  0.8135761 NA
# 13  2    3 -0.5174601  0.8135761 NA
# 14  2    4 -0.5174601  0.8135761 NA

Having generated your random effects and your regressors, you can specify the data generating process. Since you want some randomly missed timepoints, there is a level of additional complexity here. (Note that these data are missing completely at random; for more on simulating missing data, see: How to simulate the different types of missing data.)

y       = with(data, (0 + RE.i) + (.3 + RE.s)*time + rnorm(n=ni*nj, mean=0, sd=1))
m       = rbinom(n=ni*nj, size=1, prob=.1)  
y[m==1] = NA
data$y  = y
head(data, 14)
#    ID time       RE.i       RE.s           y
# 1   1    1  0.8051709 -1.8152973  -0.8659219
# 2   1    2  0.8051709 -1.8152973  -3.6961761
# 3   1    3  0.8051709 -1.8152973  -4.2188711
# 4   1    4  0.8051709 -1.8152973  -4.8380769
# 5   1    5  0.8051709 -1.8152973  -5.4126362
# 6   1    6  0.8051709 -1.8152973  -8.3894008
# 7   1    7  0.8051709 -1.8152973          NA
# 8   1    8  0.8051709 -1.8152973 -11.3710128
# 9   1    9  0.8051709 -1.8152973 -14.2095646
# 10  1   10  0.8051709 -1.8152973 -14.7627970
# 11  2    1 -0.5174601  0.8135761   0.2018260
# 12  2    2 -0.5174601  0.8135761          NA
# 13  2    3 -0.5174601  0.8135761   3.9232935
# 14  2    4 -0.5174601  0.8135761          NA
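
Written out, the data generating process used above is the following (the symbols are my own notation, with $i$ indexing the 13 subjects and $j$ the 10 timepoints):

$$
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & 0.3 \\ 0.3 & 1 \end{pmatrix}
\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, 1)
$$

with $\beta_0 = 0$, $\beta_1 = 0.3$ and $t_{ij} = j$. Each $y_{ij}$ is then set to missing independently with probability $0.1$.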

At this point, you can fit your model. I typically use the lme4 package.

library(lme4)
summary(lmer(y~time+(time|ID), data))
# Linear mixed model fit by REML ['lmerMod']
# Formula: y ~ time + (time | ID)
#    Data: data
# 
# REML criterion at convergence: 378.3
# 
# Scaled residuals: 
#      Min       1Q   Median       3Q      Max 
# -2.48530 -0.61824 -0.08551  0.59285  2.70687 
# 
# Random effects:
#   Groups   Name        Variance Std.Dev. Corr 
#   ID       (Intercept) 0.9970   0.9985        
#            time        0.8300   0.9110   -0.05
#   Residual             0.7594   0.8715        
# Number of obs: 112, groups:  ID, 13
# 
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)  0.03499    0.33247   0.105
# time         0.53454    0.25442   2.101
# 
# Correlation of Fixed Effects:
#      (Intr)
# time -0.100
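
If you want to repeat the simulation many times (for example, to see how well the variance components are recovered), the steps above can be wrapped in a function. This is only a sketch: `simulate_lmm` and its argument names are hypothetical, and the defaults simply mirror the values used above.

```r
library(MASS)

# Hypothetical helper (not part of any package): wraps the steps above.
simulate_lmm <- function(ni = 13, nj = 10, beta = c(0, 0.3),
                         Sigma = rbind(c(1.0, 0.3),
                                       c(0.3, 1.0)),
                         sd_resid = 1, p_miss = 0.1) {
  # one (intercept, slope) pair of random effects per subject
  RE <- matrix(mvrnorm(ni, mu = c(0, 0), Sigma = Sigma), nrow = ni)
  d  <- data.frame(ID   = rep(1:ni, each  = nj),
                   time = rep(1:nj, times = ni))
  b0 <- rep(RE[, 1], each = nj)
  b1 <- rep(RE[, 2], each = nj)
  # subject-specific line plus residual noise
  d$y <- (beta[1] + b0) + (beta[2] + b1) * d$time +
         rnorm(ni * nj, mean = 0, sd = sd_resid)
  # values missing completely at random
  d$y[rbinom(ni * nj, size = 1, prob = p_miss) == 1] <- NA
  d
}

set.seed(1)
sim <- simulate_lmm()
str(sim)
```

The result can then be fit with `lmer(y ~ time + (time | ID), data = sim)` or, with nlme, `lme(y ~ time, random = ~ time | ID, data = sim, na.action = na.omit)`.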

Related Solutions

Solved – How to specify partially crossed random effect in lme

This is not a direct answer for lme's syntax.

I would argue that, while in theory a specific examiner is part of the greater examiner population and it does make sense to treat examiner as a random effect, you have only 2 (and occasionally 3) replicates. It will most probably be more sensible to use it as a fixed effect (possibly as an interaction).

Moreover I would not be too fast to jump on ANOVA for model selection. Assuming you do not want to consider issues of cross-validation etc., maybe an information criterion like AIC will be equally easy to apply and probably slightly more coherent.

I think your first model is quite reasonable as it stands. Maybe try a (0+time|subject)+(1|subject) random structure, so that the slope and the intercept are fitted independently (that is, without a correlation between them)...

I scavenged this thread from R-sig-mixed (I had also seen your question there about a fortnight ago) and these two older ones from R-help (thread1, thread2) that might be helpful as they contain some expert opinions on the matter.

Solved – Coding of categorical random effects in R: int vs factor

The difference is because you're separating the intercept and the slope in the random effect. That's an odd thing to do; the usual way to fit this model would be

OK ~ multi + (multi | item) + (1 | subject)

with multi being a factor.

What happens is that in the first model you get what you expect; the 0+multi|item term gives one parameter and the 1|item term gives one parameter, but in the second model the 0 + multi | item term results in two parameters, which are simply the estimate for each condition. If you take the 1|item term out of that model you should get a result that is equivalent to both your first model and the one I give above, except for differences in parameterization.

Note also the correlation of exactly one in your second model; this is a clue that you've overparameterized it and that one of those parameters is not necessary.
