Solved – Mixed-effect model in R using lme for data count data with two fixed effects and repeated measures

lme4-nlmemixed modelrrepeated measures

I have no idea how to analyze this dataset.

I am asking if two genotypes, T and M, respond differently to a treatment, E2 (I also have a control, CON). All 36 animals were given both E2 and CON in a counterbalanced order and then their behavior was measured. The behavior was scored once every 30 seconds for 10 minutes as "yes" or "no" (coded "1" or "0").

I am interested in knowing if the treatment affects the genotypes differently and whether this effect changes over time.

I have tried running the ANOVA (see below) as a non-parametric test on my count data, but the data are not normally distributed and the results do not make sense.

model<-aov(behavior~genotype*treatment*time+Error(animal/(time*treatment)), data=dataset)

Therefore, I think that a mixed effects model is right. In this analysis, I think my fixed effects should be genotype and treatment. The random effects should be animal, treatment, and time. I also noted sex and age, but I'm not sure that matters for this analysis. The datafile is set up such that each line is a separate observation for each animal, every 30 seconds. See below:

>animal genotype sex age    treatment   time    behavior                                                                                
>1403   T   F   AHY CON 0   1                                                                               
>1404   T   F   AHY CON 0   1                                                                               
>1406   T   F   HY  CON 0   1                                                                               
>1407   T   F   AHY CON 0   1                                                                               
>1423   T   F   AHY CON 0   1                                                                               
>1425   T   F   HY  CON 0   1                                                                               
>1428   T   F   AHY CON 0   1                                                                               
>1431   T   F   AHY CON 0   1

I have tried modeling the data using lme in R but I am not sure that I am nesting the random factors properly because the df for "genotype" is 28, but I only have 2 genotypes (so it should be df=1). This is my model:

mixed.model1 <- lme(fixed=behavior~genotype * treatment * time, 
random= ~ 1|animal/time/treatment, data=dataset)
summary(mixed.model1)

Linear mixed-effects model fit by REML
 Data: dataset 
       AIC      BIC    logLik
  2727.064 2779.666 -1351.532

Random effects:
 Formula: ~1 | animal
        (Intercept)
StdDev:    1.345537

 Formula: ~1 | time %in% animal
        (Intercept)
StdDev:   0.5004338

 Formula: ~1 | treatment %in% time %in% animal
        (Intercept)  Residual
StdDev:    2.030743 0.5704242

Fixed effects: behavior ~ genotype * treatment * time 
                           Value Std.Error  DF   t-value p-value
(Intercept)            2.1647059 0.4852943 296  4.460604  0.0000
genotypeT              0.3737557 0.7372150  28  0.506983  0.6161
treatmentE2           -0.3098039 0.4942422 296 -0.626826  0.5313
time                  -0.1465241 0.0578876 268 -2.531184  0.0119
genotypeT:treatmentE2  0.7969834 0.7508078 296  1.061501  0.2893
genotypeT:time        -0.1541752 0.0879375 268 -1.753236  0.0807
treatmentE2:time       0.0124777 0.0796543 296  0.156648  0.8756
genotypeT:treatmentE2:time -0.0367201 0.1210036 296 -0.303463  0.7618
 Correlation: 
                      (Intr) genoT treatE2 time gnW:E2 genoW: trtE2:
genoT                 -0.658                                          
treatE2               -0.509  0.335                                   
time                  -0.656  0.432  0.610                            
genoT:treatE2          0.335 -0.509 -0.658 -0.401                     
genoT:time             0.432 -0.656 -0.401 -0.658  0.610              
treatE2:time           0.451 -0.297 -0.886 -0.688  0.584  0.453       
genoT:treatE2:time    -0.297  0.451  0.584  0.453 -0.886 -0.688 -0.658

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-0.53883607 -0.11769447 -0.03528952  0.04858721  1.83383169 

Number of Observations: 600
Number of Groups: 
 animal        time %in% animal treat %in% time %in% animal
 30                         300                         600

I am also concerned that the count data are not normally distributed. Should I use a GLM instead? If so, does that change the random effects? Please help. I have not seen any examples like this in any of the R books I have seen or on this blog.

Thanks in advance!!!

Best Answer

I will quickly address the general use of aov. When using aov in R, type I sum of squares are used. These are sequential, which means the order of variables will affect the results if the design is unbalanced (see here: http://goanna.cs.rmit.edu.au/~fscholer/anova.php). Type III sum of squares are sometimes preferred when there is an interaction and type II when there is not a significant interaction. This can be done in the car package with the function Anova (notice the capital A). This may be why your anova results did not make sense.

Now to address the question about mixed effect models. I would first recommend lme4, as I think the formula specification is easier to understand. For instance, the random effect would be + (1|animal/time/treatment). In regards to the degrees of freedom, it is not necessarily the case that your model is wrong. Douglas Bates, the author of lme4, has wrote extensively about the difficulties in calculating degrees of freedom in mixed models (https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html). This has also been discussed on this site (getting degrees of freedom from lmer). Because of this, the lme4 package does not provide p-values and, in order to calculate a p-value, extra steps are necessary such as sampling from the posterior. I am not sure if nlme is still being maintained, but it wouldn't hurt to email the authors.

In the event that the model is right, the tricky part will be interpreting the estimates (Interpreting the regression output from a mixed model when interactions between categorical variables are included). The reference category (i.e, the intercept) is going to be the first level of each factor. From what you have provided, this would be the first time point (I assume time is categorical because random effects are always factors), treatment = CON, and genotype = M. The p-value that is significant, for instance, is comparing time to this reference category. The question is whether this is a meaningful comparison? Using a package for Bayesian multilevel models, for instance brms or rstanarm (http://www.r-bloggers.com/r-users-will-now-inevitably-become-bayesians/), you could add posterior estimates together and use simple subtraction to obtain contrasts at each level of the factors.

This might not have been much help towards your initial question, but specification of random effects will generally change the estimate little unless there is great variation between levels of a random effect. Additionally random effects are not always straight forward (Minimum number of levels for a random effects factor?) or easy to define (What is the difference between fixed effect, random effect and mixed effect models?). If you still cannot get an answer to your question about random effects, you can try a sensitivity analysis. For instance, animal ID should be included as a random effect but the others are open to debate. You could check whether the estimates (eg, coefficients and confidence intervals) change drastically by only nesting some of the variables. If they do not, this would provide confidence in your model and you could mention the potential problem with the random effects in the discussion of your paper. For a more rigorous approach, you could use a likelihood ratio test comparing models that differ in regards to random effects (Likelihood ratio tests on linear mixed effect models). You can even use this test to determine whether time is significant. For instance, compare models that differ only in the inclusion of time.

Another option would be to use a gee, generalized estimating equation (r packages: gee & geepack), which might be appropriate here because the correlations between outcomes do not need to be correctly specified. The method is robust to "unknown" correlations. This is also ideal when samples are small (see here: http://epm.sagepub.com/content/76/1/64.short; https://en.wikipedia.org/wiki/Generalized_estimating_equation).

In regards to using different distributions, you could try glmer in the lme4 package with a negative binomial or Poisson distribution. The assumptions of a Poisson distribution are often violated (variance and mean must be close to equal). When there is over dispersion (variance is larger than the mean), the negative binomial distribution is preferred. Since you have 20 potential yes/no's, you should include the number of times possible as an offset which would model the counts as rates.

I hope this information can be of use for the manuscript!

Related Solutions

Solved – Mixed model with repeated measures and both of two treatments with lme or lmer in R

Many years late, but for completeness' sake, I will attempt to answer this.

As noted in the question, the main complication is that there is variance in the data explainable by the fact that there are multiple measurements on each individual. You can account for this with a random effect. The simplest model that you might use to address this above question (in R) is:

library(lme4)
lmer(distance_estimate ~ condition + measurement_occasion + actual_distance +
         (1|participant))

Things to note:

1) I have assumed no interactions here but several are plausible. For example, condition and actual_distance might interact, as might condition and measurement_occasion. Which ones are included depends on your hypotheses.

2) I've also used just a random intercept here, but you might well include a random slope, or both random intercepts and slopes. Again, this will depend on your hypotheses and domain knowledge.

Solved – How to fit a nonlinear mixed effects model for repeated measures data using nlmer()

You can use normal likelihood ratio tests. Here’s a simple example. First, let’s create observations from 10 individuals based on your parameters:

Asym = .6
xmid = 23
scal = 5

n = 10
time = seq(1,60,5)

d = data.frame(time=rep(time,10),
               Asym, xmid, scal, group=0)
d$subj = factor(rep(1:n, each=length(time)))

Now let half of them have different asymptotes and midpoint parameters:

ind = (nrow(d)/2):nrow(d)
d$Asym[ind] = d$Asym[ind] + .1
d$xmid[ind] = d$xmid[ind] + 10
d$group[ind] = 1
d$group=factor(d$group)

We can simulate response values for all the individuals, based on the model:

set.seed(1)
d = transform(d, y = Asym/(1+exp((xmid-time)/scal)) +
                     rnorm(nrow(d), sd=.04))
library(lattice)
xyplot(y~time | group, group=subj,
       data=d, type=c("g","l"), col="black")

Spaghetti plots of the data

We can see clear differences between the two groups, differences that the models should be able to pick up. Now let’s first try to fit a simple model, ignoring groups:

> fm1 = nls(y ~ SSlogis(time, Asym, xmid, scal), data=d)
> coef(fm1)
      Asym       xmid       scal 
 0.6633042 28.5219166  5.8286082

Perhaps as expected, the estimates for Asym and xmid are somewhere between the real parameter values for the two groups. (That this would be the case isn’t obvious, though, since the scale parameter is also changed, to adjust for the model misspecification.) Now let’s fit a full model, with different parameters for the two groups:

> fm2 = nls(y ~ SSlogis(time, Asym[group], xmid[group], scal[group]),
          data=d,
          start=list(Asym=rep(.6,2), xmid=rep(23,2), scal=rep(5,2)))
> coef(fm2)
    Asym1     Asym2     xmid1     xmid2     scal1     scal2 
 0.602768  0.714199 22.769315 33.331976  4.629332  4.749555

Since the two models are nested, we can do a likelihood ratio test:

> anova(fm1, fm2)
Analysis of Variance Table

Model 1: y ~ SSlogis(time, Asym, xmid, scal)
Model 2: y ~ SSlogis(time, Asym[group], xmid[group], scal[group])
  Res.Df Res.Sum Sq Df  Sum Sq F value    Pr(>F)    
1    117    0.70968                                 
2    114    0.13934  3 0.57034  155.54 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The extremely small p-value clearly shows that the simple model was too simple; the two groups do differ in their parameters.

However, the two scale parameters estimates are almost identical, with a difference of just .1. Perhaps we need only need one scale parameter? (Of course we know the answer is yes, since we have simulated data.)

(The difference between the two asymptote parameters is also just .1, but that’s a large difference when we take the standard errors into account – see summary(fm2).)

So we fit a new model, with a common scale parameter for the two groups, but different Asym and xmid parameters, as before:

> fm3 = nls(y ~ SSlogis(time, Asym[group], xmid[group], scal),
          data=d,
          start=list(Asym=rep(.6,2), xmid=rep(23,2), scal=5))
> coef(fm3)
     Asym1      Asym2      xmid1      xmid2       scal 
 0.6035251  0.7129002 22.7821155 33.3080264  4.6928316

And since the reduced model is nested in the full model, we can again do a likelihood ratio test:

> anova(fm3, fm2)
Analysis of Variance Table

Model 1: y ~ SSlogis(time, Asym[group], xmid[group], scal)
Model 2: y ~ SSlogis(time, Asym[group], xmid[group], scal[group])
  Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
1    115    0.13945                             
2    114    0.13934  1 0.00010637   0.087 0.7685

The large p-value indicates that the reduced model fits as well as the full model, as expected.

We can of course do similar tests to check if different parameter values are needed for just Asym, just xmid or both. That said, I would not recommend doing stepwise regression like this to eliminate parameters. Instead, just test the full model (fm2) against the simple model (fm1), and be happy with the results. To quantify any differences, plots will be helpful.

Best Answer

Related Solutions

Solved – Mixed model with repeated measures and both of two treatments with lme or lmer in R

Solved – How to fit a nonlinear mixed effects model for repeated measures data using nlmer()

Related Question