Solved – How to build a linear mixed-effects model in R

lme4-nlme, mixed model, r

I asked a question earlier but it was a bit long and confusing, so I will attempt to keep this one shorter and simpler (original post: https://stats.stackexchange.com/questions/24971/mixed-effects-model-equations).

I am basically looking to see which factors (e.g. A, B, C, such as habitat type, site, disturbance rate) most affect the response value (y). I am finally understanding the concept of models and how to compare AIC values to see which combination of factors best explains y (i.e. has the greatest effect on it), but I am new to R, so I wonder whether my basic coding is correct.

First, I have all my data in a spreadsheet saved as a .csv file, so I read this file into R. I also load the package lme4.
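i.e. something along these lines (the file and object names here are just placeholders):

mydata <- read.csv("mydata.csv")
library(lme4)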

Then I was thinking of using the following code (although the letters would be replaced by the column headings for each set of values):

m1<-lmer(y ~ A + (1|E), REML=FALSE)
m2<-lmer(y ~ B + (1|E), REML=FALSE)
m3<-lmer(y ~ A + B + (1|E), REML=FALSE)
m4<-lmer(y ~ A + C + (1|E), REML=FALSE)
m5<-lmer(y ~ B + C + (1|E), REML=FALSE)
m6<-lmer(y ~ A + B + C + (1|E), REML=FALSE)
m7<-lmer(y ~ A + D + (1|E), REML=FALSE)
m8<-lmer(y ~ B + D + (1|E), REML=FALSE)
m9<-lmer(y ~ A + B + D + (1|E), REML=FALSE)
m10<-lmer(y ~ C + (1|E), REML=FALSE)
m11<-lmer(y ~ D + (1|E), REML=FALSE)
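
Then, to compare the models, I was thinking of something along these lines (just the comparison step, assuming all of the models above have been fitted):

AIC(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11)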

My basic questions are:

  1. Let us say that C and D are similar factors derived from the same data, and I do not think it useful to see whether they have additive effects. Is it okay not to put them in the same model, i.e. to mix and match as I see fit, or should all combinations be incorporated?

  2. I chose ML (REML=FALSE) because the design is not balanced (not all sets of values are mixed and matched). Is this correct, or should I use REML?

  3. Some sets of values are not continuous, e.g. brood number or habitat type (either 1 or 2). Should I be letting R know this? Someone mentioned coding each factor, but I have no clue how to do this. Or is the .csv file enough, with perhaps some extra code to tell R about these particular columns?

  4. Lastly, is there also a way to get p-values to see whether a factor has a significant effect, rather than just being the best fit to explain y?

I hope this is simpler than my first post and easier to answer all at once. I really want to run these models today, but I don't want to have to redo everything after finding I made a simple mistake.

Best Answer

  1. It's entirely up to you whether you include both factors in the same model or not. But why not try it and see whether you get a significantly better fit with both included than with just one?

  2. REML works with unbalanced and incomplete designs too. I'd go with REML to reduce the bias in the variance estimates and eliminate the bias in the covariance parameters.

  3. x <- as.factor(z) turns z into a factor. You can of course do DF$x <- as.factor(DF$x).

  4. anova(m1, m3) will test the significance of the terms that are in the larger model but not in the smaller one. The models have to be nested for this to work (see the short lmer sketch after this list).
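For points 2 and 4, a rough lmer sketch with your variable names might look like this (A, B and E stand in for your column names and dat for your data frame; this is just to show the syntax):

m.red  <- lmer(y ~ A + (1|E), data = dat, REML = TRUE)      # REML for the variance estimates
m.full <- lmer(y ~ A + B + (1|E), data = dat, REML = TRUE)
anova(m.red, m.full)  # refits both with ML and gives a likelihood-ratio test for B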

Edit: The comments have made me realize my answer was way too terse, so I'm adding some sample code.

The code does not fit the full model that you are fitting; it's just to illustrate the syntax and what happens with ANOVA:

# Construct sample data; E(y) is a function only of x1
x1 <- c("A","A","A","B","B","C","D","D","D","D")
x2 <- c("A","B","C","A","B","C","A","B","C","A")
y <- rnorm(10, mean=c(0,0,0,1,1,2,3,3,3,3))  # Values for E(y) match w/ x1

# Construct data frame
df <- data.frame(y=y, x1=x1, x2=x2)

# Convert x1, x2 to factors
df$x1 <- as.factor(df$x1)
df$x2 <- as.factor(df$x2)

# Run regressions and perform ANOVA to evaluate effect of factor x2
m1 <- lm(y~x1, data=df)
m2 <- lm(y~x1+x2, data=df)

> anova(m1,m2)
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2
  Res.Df     RSS Df Sum of Sq     F Pr(>F)
1      6 12.9004                          
2      4  5.3241  2    7.5763 2.846 0.1703

The "PR(>F)" column gives the p-value associated with the F-test of whether factor x2 is significant.
