Solved – Interpreting output from lmer

generalized linear modelinterpretationlme4-nlmer

This probably has been asked many a times, but I cannot find the answer. I'm trying to interpret the output that I get from lmer.
My code is as follows:

Model1<- lmer(DV ~ Eduy_IMP + Gender + Age + APOEdich_IMP + 
     predictor + Time:Factor2:Factor1*predictor + 
     (1 + Time|Study), Data)
summary(Model1, ddf="Kenward-Roger")

DV = Dependant variable; Factor2 has levels 1,2,3; Factor1 has levels 0, 1.

Here is my Output:

My Question: I need to be able to report a coefficient for the predictor in the following groups: Predictor in each of the individual levels of Factor1 and Factor2. How do I get this? I even tried having different data sets and running the above code, but I cannot get a coefficient for these.

My aim is to see the influence of the predictor on the DV in Factor1 and Factor2.

Best Answer

I think in order to get there you would have to include the factors as main effects themselves. Currently, you only use factors in an interaction term. I am not even sure if this is sound (including an interaction without the corresponding main effects), but I am no expert. Nevertheless, this is why you only get coefficients for the interaction or more specifically for the different contrasts.

Model1 <- lmer(DV ~ Eduy_IMP + Gender + Age  + Factor1 + 
            Factor2 + APOEdich_IMP + predictor + 
            Time:Factor2:Factor1*predictor + 
            (1 + Time|Study), Data)
summary(Model1, ddf="Kenward-Roger")

Related Solutions

Solved – Factors missing in summary and Anova output from glm in R

There are three common causes for the zero degrees of freedom:

Saturated model. These are usually spotted from NAs is summary(model))
Factor level combinations with zero observations.These are spotted as zero dfs in Anova(model) for the interaction effects, but the main effect would still have, e.g., one df.
Perfectly collinear explanatory variables. These are spotted from the zero dfs for the interaction effects in presense of the main effects in Anova(model), or if only main effects are included the model, some of the main effect could have zero dfs.

Your case falls under the third cause. Judging from the specification of the models 1 and 2, some of your explanatory variables you have added to model 2 (ethnicity, gender, age, enrollment, Pell, TANF, GPA, onlineExp) are collinear with level and type.

For finding out the problematic variables, consult the answer on Stackoverflow. Briefly, command model.matrix(model) gives out, well, the model matrix for your model. Check the output! Following the advice George Dontas gave on the linked thread, you can then find out which ones are the problematic predictors. Small eigen values from the following command indicate problems:

eigen(cov(model.matrix(model)[,-1]))$values

For a very small R example, try the following:

set.seed(1)
y<-rnorm(9)
x1<-c(0,0,0,1,1,1,2,2,2)
x2<-c(0,1,2,0,1,2,0,1,2)
x3<-c(1,1,1,2,2,2,3,3,3)
fit1<-lm(y~x1+x2)
fit2<-lm(y~x1+x2+x3)
fit3<-lm(y~x1+x2+x3+x1*x3)
library(car)
Anova(fit1)
Anova(fit2) # main effects only
Anova(fit3) # interactions, also
e<-eigen(cov(model.matrix(fit2)[,-1]))$values
names(e)<-colnames(model.matrix(fit2)[,-1]) # x3 creates the problem, the value is very low

Another possibility is to add the new variables one by one, and see when the problem shows itself. You can then try to rectify the situation, e.g., by not adding the problematic variables to the model. Sometimes combining the variables in a suitable way helps, also.

Solved – Interpreting the “coefficient” output of the lm function in R

Let's say we want to predict median value of house expressed in thousands of $ (medv) based on its age, number of rooms (rm) and the crime rate (crim) in its neighborhood. The dataset for this is called Boston and it is in the MASS package

> library(MASS)

Original regression model

> print(coef(lm(medv~crim+rm+age, data=Boston)))

 (Intercept)         crim           rm          age 
-23.60556128  -0.21102311   8.03283820  -0.05224283

So we see that the coefficients for those three predictors are -0.211, 8.03 and -0.05. If we want to calculate the price for any house we just use formula:

medv = -0.21102311*crim + 8.03283820*rm - 0.05224283*age

It means that for example, that for increase in one room the medv price rises for 8.03 thousand dollars.

Now, let's say we measure price in the dollars instead of thousands of dollars. The regression coefficients are totally different.

> Boston.2 <- Boston
> Boston.2$medv <- Boston.2$medv*1000

> print(coef(lm(medv~crim+rm+age, data=Boston.2)))
 (Intercept)         crim           rm          age 
-23605.56128   -211.02311   8032.83820    -52.24283

They are all scaled versions from the ones in the original model, but what if we measure for example, the crime rate per 1000?

Then there is a big change for the rm and age coefficients.

> Boston.3 <- Boston
> Boston.3$crim <- Boston.3$crim/1000

> print(coef(lm(medv~crim+rm+age, data=Boston.3)))
  (Intercept)          crim            rm           age 
 -23.60556128 -211.02311213    8.03283820   -0.05224283

So compare this model and the original one. If we compare the variables based on those coefficients it will seem that the crime rate is much much more important (-211.02 vs only 8.03) than the number of rooms(which is not realistic).

However, in the original model there is quite a different story. Number of rooms seems way more important as we're measuring variables differently.

To asses the relative importance of variables we first need to perform scaling which will make all the variables comparable. It won't matter anymore what unit is used (crim vs crim/1000) for the data, the coefficients will end up the same. We can then compare their values to see which ones are the most important for prediction.

We now apply the same model three times using the differently scaled data sets (Boston, Boston.2, Boston.3) - note the identical coefficient results:

Original regression model

> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston)))
  (Intercept)   scale(crim)     scale(rm)    scale(age) 
-3.076923e-16 -1.973583e-01  6.136725e-01 -1.598956e-01

Regression model with prices in dollars instead of thousands

> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston.2)))
  (Intercept)   scale(crim)     scale(rm)    scale(age) 
-3.076923e-16 -1.973583e-01  6.136725e-01 -1.598956e-01

Regression model with crime rate per 1000 people

> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston.3)))
  (Intercept)   scale(crim)     scale(rm)    scale(age) 
-3.076923e-16 -1.973583e-01  6.136725e-01 -1.598956e-01

From this it seems that number of rooms is indeed more important than the crime rate, which again is a common sense :).

Best Answer

Related Solutions

Solved – Factors missing in summary and Anova output from glm in R

Solved – Interpreting the “coefficient” output of the lm function in R

Related Question