Solved – How to build a linear mixed-effects model in R

lme4-nlme, mixed model, r

I asked a question earlier but it was a bit long and confusing, so I will attempt to keep this one shorter and simpler (original post: https://stats.stackexchange.com/questions/24971/mixed-effects-model-equations).

I am basically looking to see which factors (e.g. A, B, C, such as habitat type, site, disturbance rate) most affect the response value (y). I am finally understanding the concept of models and how to compare AIC values to see which combination of factors best explains y (i.e. has the greatest effect on it), but I am new to R, so I wonder whether my basic coding is correct.

First, I have all my data in a spreadsheet saved as a .csv file, so I read this file into R. I also load the package lme4.
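i.e. something along these lines (the file and object names here are just placeholders):

mydata <- read.csv("mydata.csv")
library(lme4)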

Then I was thinking of using the following code (although the letters would be replaced by the column headings for each set of values):

m1<-lmer(y ~ A + (1|E), REML=FALSE)
m2<-lmer(y ~ B + (1|E), REML=FALSE)
m3<-lmer(y ~ A + B + (1|E), REML=FALSE)
m4<-lmer(y ~ A + C + (1|E), REML=FALSE)
m5<-lmer(y ~ B + C + (1|E), REML=FALSE)
m6<-lmer(y ~ A + B + C + (1|E), REML=FALSE)
m7<-lmer(y ~ A + D + (1|E), REML=FALSE)
m8<-lmer(y ~ B + D + (1|E), REML=FALSE)
m9<-lmer(y ~ A + B + D + (1|E), REML=FALSE)
m10<-lmer(y ~ C + (1|E), REML=FALSE)
m11<-lmer(y ~ D + (1|E), REML=FALSE)
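
Then, to compare the models, I was thinking of something along these lines (just the comparison step, assuming all of the models above have been fitted):

AIC(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11)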

My basic questions are:

  1. Let us say that C and D are similar factors derived from the same data, and I do not think it useful to see whether they have additive effects. Is it okay not to put them in the same model, i.e. to mix and match as I see fit, or should all combinations be incorporated?

  2. I chose ML (REML=FALSE) because the design is not balanced (not all sets of values are mixed and matched). Is this correct, or should I use REML?

  3. Some sets of values are not continuous, e.g. brood number or habitat type (either 1 or 2). Should I be letting R know this? Someone mentioned coding each factor, but I have no clue how to do this. Or is the .csv file enough, with perhaps some extra code to tell R about these particular columns?

  4. Lastly, is there also a way to get p-values to see whether a factor has a significant effect, rather than just being the best fit to explain y?

I hope this is simpler than my first post and easier to answer all at once. I really want to run these models today, but I don't want to have to redo everything after finding I made a simple mistake.

Best Answer

  1. It's entirely up to you whether you include both factors in the same model or not. But why not try it and see whether you get a significantly better fit with both included than with just one?

  2. REML works with unbalanced and incomplete designs too. I'd go with REML to reduce the bias in the variance estimates and eliminate the bias in the covariance parameters.

  3. x <- as.factor(z) turns z into a factor. You can of course do DF$x <- as.factor(DF$x).

  4. anova(m1, m3) will test the significance of the terms that are in the larger model but not in the smaller one. The models have to be nested for this to work (see the short lmer sketch after this list).
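For points 2 and 4, a rough lmer sketch with your variable names might look like this (A, B and E stand in for your column names and dat for your data frame; this is just to show the syntax):

m.red  <- lmer(y ~ A + (1|E), data = dat, REML = TRUE)      # REML for the variance estimates
m.full <- lmer(y ~ A + B + (1|E), data = dat, REML = TRUE)
anova(m.red, m.full)  # refits both with ML and gives a likelihood-ratio test for B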

Edit: The comments have made me realize my answer was way too terse, so I'm adding some sample code.

The code does not fit the full model that you are fitting; it's just to illustrate the syntax and what happens with ANOVA:

# Construct sample data; E(y) is a function only of x1
x1 <- c("A","A","A","B","B","C","D","D","D","D")
x2 <- c("A","B","C","A","B","C","A","B","C","A")
y <- rnorm(10, mean=c(0,0,0,1,1,2,3,3,3,3))  # Values for E(y) match w/ x1

# Construct data frame
df <- data.frame(y=y, x1=x1, x2=x2)

# Convert x1, x2 to factors
df$x1 <- as.factor(df$x1)
df$x2 <- as.factor(df$x2)

# Run regressions and perform ANOVA to evaluate effect of factor x2
m1 <- lm(y~x1, data=df)
m2 <- lm(y~x1+x2, data=df)

> anova(m1,m2)
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2
  Res.Df     RSS Df Sum of Sq     F Pr(>F)
1      6 12.9004                          
2      4  5.3241  2    7.5763 2.846 0.1703

The "PR(>F)" column gives the p-value associated with the F-test of whether factor x2 is significant.
