Solved – How to define a mixed model in R. Repeated measures or time-series ? Which effect should be random

mixed modelrandom-effects-modelrepeated measures

Hello to the CrossValidated community. I could use some help.

I am struggling with the analysis of an experiment, specifically with deciding and defining in code the structure of a mixed model for analysing it.

The experimental design is as follows:

We wanted to see if there are any differences on the growth of lesions caused by different combinations of pathogens (Treatment in the data).

To investigate this we used detached leaves and inoculated them with said Treatments. Each leaf was inoculated in two spots which were averaged to give us the Area of each lesion. Measurements were done per day for all leaves in the experiment, for days 4-9 after the inoculation (Day in the data below).

Two difference cultivars of potato plants were used in the experiment (Cultivar in the data).

Leaves were arranged in boxes (Box in the data) for a complete randomized block design. Each box contained 7 leaves of each cultivar, each leaf inoculated with one of the Treatment, so a total of 14 leaves in each box.

LeafID is a unique identifier for each of leaves in the experiment.

My understanding is that Box, Cultivar, Treatment and Day are crossed factors. LeafID is crossed with Day and Box and nested within Treatment and Cultivar. I am mentioning this to check my understanding of the experimental design and furthermore to understand if it's important for defining the model. I seem to remember reading that lmer accounts for nested or crossed designs internally but can't quote a source unfortunately.

Here is a pick at my data structure in R, Day and Area are coded as double, all other variables are coded as factors

Box Cultivar Treatment  Day  LeafID Area
1   Arsenal  Control    4    121    0.0000000
1   Arsenal  Control    5    121    0.0000000
1   Arsenal  Control    6    121    0.0000000
1   Arsenal  Control    7    121    0.0000000
1   Arsenal  Control    8    121    0.0000000
1   Arsenal  Control    9    121    0.0000000
1   Arsenal  A.solani   4    21     0.1745344
1   Arsenal  A.solani   5    21     0.3062256
1   Arsenal  A.solani   6    21     0.3721086
1   Arsenal  A.solani   7    21     0.4217482
....

My understanding is that this can be analysed as either a repeated measures or a time-series design, and the choice really depends on the research question. I chose to look at this as a repeated measures design, mostly because it's easier for me to grasp the concepts and statistics involved. I am more than happy to hear differing opinions of course!

I tried to fit my model the same way Crawley does in the R Book (p,695, "Mixed-effects models with temporal pseudoreplication")

Initially we came up with the following model:

m0.full <- lme(data=DLA2, Area ~ Treatment * Cultivar * Day  ,
random = ~Day | LeafID, na.action = na.exclude ,method = "ML")

The difference from what Crawley does is that our model has extra fixed effects and their interactions Treatment * Cultivar * Day and we are using Day and its interaction as we care about the progress over time and how this affects our growth.

What I am struggling with is deciding if Cultivar should be a fixed or random effect and if including Day in the fixed effects also gives the wrong interpretation.

I've spent countless hours going through mixed model tutorials and I know that going for a very complex model is not the way to go. I guess the most confusing part is deciding what to have as fixed and random. I've seen some people suggesting you can use a factor in both fixed and random parts of the model but i have trouble grasping the concept.

I would be happy to share the entire dataset if that helps, any suggestions of the best way to do this would be welcome.

I am looking forward for some suggestions on how to handle my analysis and what could be improved.

I would be happy using either lme or lmer.

Thank you all for your time!

EDIT: I added a plot of my data for each different Treatment*Cultivar combination. Each panel shows the full replicates corresponding to the combination

UPDATE

I came up with these two models, mod.rep takes into account the repeated measures of the Day variable, while mod.time uses a time-series approach as used by Crawley(The R Book). I also accounted for heteroscedasticity using the varIdent function.

Are those models correct ?

mod.rep <- lme(data=DLA2, Radius ~ Treatment * Cultivar, random = ~ 
Day|LeafID, na.action = na.omit ,method = "ML", weights = 
varIdent(form=~1|Treatment))

mod.time <- lme(data=DLA2, Radius ~  Day * Treatment * Cultivar, random = 
~1|LeafID, na.action = na.exclude ,method = "ML", weights = 
varIdent(form=~1|Treatment))

I still have some trouble translating the random effects formulas into words…

Best Answer

In section 1.5 of Pinheiro and Bates (2000) Mixed Effects Models in S and S-Plus, you can find the reference for analyzing nested factors with the nlme package, which is related to lmer. The syntax for defining the nested factors can also be found in the same passage. The book is written about S, but these functions mostly work in R without problems.

For example to introduce a free intercept for a nested factor you could write random = ~1|Box/LeafID (This is just an example, I don't say that this is relevant to the specific experiment.) You can then gradually introduce slopes in the random effects, like update(your_model, random = ~Treatment|Box/LeafID), so that to get random effects for the slope of Treatment as well, and compare model fit. (You can similarly build up your way to the triple interaction term in the fixed effects.)

Related Solutions

Solved – Repeated measures, mixed model, ANOVA or…

Don't forget the SUBJECT random effect. That's the one that makes it into a repeated measures/mixed model design. You are correct that TYPE and TREATMENT are fixed, but how to treat TIME will depend on what assumptions you want to make about time. The simplest thing to do would just be to leave it out and treat the five measures as independent subsamples on each individual, but more generally, you could treat TIME as a fixed effect/repeated measure and model the correlation between each time point.

The preferred terminology in these models can differ depending on your field; as the same model can often be described several ways, so it's not really a matter of deciding whether it's a repeated measure/mixed model/ANOVA; you could probably use any of those terms to describe the model you end up with. What's more important is to define what terms you want to include in the model and how you want them to be able to vary.

Solved – Using lmer for repeated-measures linear mixed-effect model

I think that your approach is correct. Model m1 specifies a separate intercept for each subject. Model m2 adds a separate slope for each subject. Your slope is across days as subjects only participate in one treatment group. If you write model m2 as follows it's more obvious that you model a separate intercept and slope for each subject

m2 <- lmer(Obs ~ Treatment * Day + (1+Day|Subject), mydata)

This is equivalent to:

m2 <- lmer(Obs ~ Treatment + Day + Treatment:Day + (1+Day|Subject), mydata)

I.e. the main effects of treatment, day and the interaction between the two.

I think that you don't need to worry about nesting as long as you don't repeat subject ID's within treatment groups. Which model is correct, really depends on your research question. Is there reason to believe that subjects' slopes vary in addition to the treatment effect? You could run both models and compare them with anova(m1,m2) to see if the data supports either one.

I'm not sure what you want to express with model m3? The nesting syntax uses a /, e.g. (1|group/subgroup).

I don't think that you need to worry about autocorrelation with such a small number of time points.

Best Answer

Related Solutions

Solved – Repeated measures, mixed model, ANOVA or…

Solved – Using lmer for repeated-measures linear mixed-effect model

Related Question