I am trying to analyze repeated measures data and am struggling to make it work in R. My data is essentially the following: I have two treatment groups, and every subject in each group is tested every day and given a score (the percentage correct on a test). The data is in the long format:
Time Percent Subject Group
1 0 GK11 Ethanol
2 0 GK11 Ethanol
3 0 GK11 Ethanol
4 0 GK11 Ethanol
5 0 GK11 Ethanol
6 0 GK11 Ethanol
The data resembles a logistic curve: subjects do very poorly for a few days, followed by rapid improvement, followed by a plateau. I'd like to know if the treatment has an effect on the test performance curve. My thought was to use nlmer() from the lme4 package in R. I can fit a curve for each group using the following:
print(nm1 <- nlmer(Percent ~ SSlogis(Time, Asym, xmid, scal) ~ Asym | Subject,
                   salinedata, start = c(Asym = .60, xmid = 23, scal = 5)), corr = FALSE)
I can then compare groups by looking at the estimates of the different parameters and the standard deviations of the estimated curves, but I'm not sure this is the proper way to do it. Any help would be greatly appreciated.
Best Answer
You can use normal likelihood ratio tests. Here’s a simple example. First, let’s create observations from 10 individuals based on your parameters:
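(A sketch only: the seed, number of days, amount of between-subject variation and noise level below are all made up, and I'm treating your starting values Asym = .6, xmid = 23, scal = 5 as the "true" population parameters.)

set.seed(1)
n.subj <- 10      # number of individuals
days   <- 1:40    # test days

# "True" population parameters (your starting values)
Asym <- 0.6
xmid <- 23
scal <- 5

# Subject-specific asymptotes (a random effect on Asym, as in your model)
subj.asym <- Asym + rnorm(n.subj, sd = 0.05)
subj.xmid <- rep(xmid, n.subj)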
Now let half of them have different asymptotes and midpoint parameters:
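(The group labels and the sizes of the differences are again just illustrative; here group "b" gets an asymptote that is 0.1 higher and a midpoint that comes 5 days later.)

# Subjects 1-5 form group "a", subjects 6-10 form group "b"
group <- rep(c("a", "b"), each = n.subj / 2)

# Group "b" gets a higher asymptote and a later midpoint
subj.asym <- subj.asym + ifelse(group == "b", 0.1, 0)
subj.xmid <- subj.xmid + ifelse(group == "b", 5, 0)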
We can simulate response values for all the individuals, based on the model:
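(One row per subject and day: the logistic mean curve plus a little Gaussian noise, with an arbitrary residual SD of 0.02.)

simdata <- expand.grid(Subject = factor(1:n.subj), Time = days)
simdata$Group <- group[simdata$Subject]

# Logistic mean curve for each observation, using the subject-specific parameters
mu <- with(simdata,
           subj.asym[Subject] / (1 + exp((subj.xmid[Subject] - Time) / scal)))
simdata$Percent <- mu + rnorm(nrow(simdata), sd = 0.02)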
We can see clear differences between the two groups, differences that the models should be able to pick up. Now let’s first try to fit a simple model, ignoring groups:
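(This is essentially your model, just fitted to the simulated data and variable names from the sketch above.)

library(lme4)

fm1 <- nlmer(Percent ~ SSlogis(Time, Asym, xmid, scal) ~ Asym | Subject,
             data = simdata,
             start = c(Asym = 0.6, xmid = 23, scal = 5))
summary(fm1)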
Perhaps as expected, the estimates for Asym and xmid are somewhere between the real parameter values for the two groups. (That this would be the case isn't obvious, though, since the scale parameter is also changed, to adjust for the model misspecification.) Now let's fit a full model, with different parameters for the two groups. Since the two models are nested, we can then do a likelihood ratio test:
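(There is more than one way to code group-specific parameters in nlmer(). One possibility is the sketch below, where each parameter gets a group offset, added for group "b" via the indicator g; the names A.g, x.g, s.g, nlmod.full and the starting values are my own, and the gradient is built with deriv() since nlmer() needs one.)

## Indicator for group "b"
simdata$g <- as.numeric(simdata$Group == "b")

## Logistic model where each parameter has a group-"b" offset
nlmod.full <- deriv(
  ~ (Asym + A.g * g) / (1 + exp(((xmid + x.g * g) - Time) / (scal + s.g * g))),
  namevec = c("Asym", "A.g", "xmid", "x.g", "scal", "s.g"),
  function.arg = c("Time", "g", "Asym", "A.g", "xmid", "x.g", "scal", "s.g"))

fm2 <- nlmer(Percent ~ nlmod.full(Time, g, Asym, A.g, xmid, x.g, scal, s.g) ~ Asym | Subject,
             data = simdata,
             start = c(Asym = 0.6, A.g = 0.1, xmid = 23, x.g = 5, scal = 5, s.g = 0))

## fm1 is the special case of fm2 with A.g = x.g = s.g = 0, so the models are nested
anova(fm1, fm2)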
The extremely small p-value clearly shows that the simple model was too simple; the two groups do differ in their parameters.
However, the two scale parameter estimates are almost identical, with a difference of just .1. Perhaps we only need one scale parameter? (Of course we know the answer is yes, since we have simulated the data.)
(The difference between the two asymptote parameters is also just .1, but that's a large difference when we take the standard errors into account – see summary(fm2).) So we fit a new model, with a common scale parameter for the two groups, but different Asym and xmid parameters, as before. And since the reduced model is nested in the full model, we can again do a likelihood ratio test:
The large p-value indicates that the reduced model fits as well as the full model, as expected.
We can of course do similar tests to check if different parameter values are needed for just Asym, just xmid, or both. That said, I would not recommend doing stepwise regression like this to eliminate parameters. Instead, just test the full model (fm2) against the simple model (fm1), and be happy with the results. To quantify any differences, plots will be helpful.
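For example, something like this plots the simulated observations together with the estimated population curves from the full model (fixed effects only; the variable and parameter names are the made-up ones from the sketches above):

## Observations, coloured by group
plot(Percent ~ Time, data = simdata,
     col = ifelse(simdata$Group == "b", "red", "blue"), pch = 16, cex = 0.5)

## Estimated population curves from fm2
fe <- fixef(fm2)
curve(fe["Asym"] / (1 + exp((fe["xmid"] - x) / fe["scal"])),
      add = TRUE, col = "blue", lwd = 2)   # group "a"
curve((fe["Asym"] + fe["A.g"]) /
        (1 + exp(((fe["xmid"] + fe["x.g"]) - x) / (fe["scal"] + fe["s.g"]))),
      add = TRUE, col = "red", lwd = 2)    # group "b"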