Solved – Level-2 predictions with lme4/glmer model

glmm, lme4-nlme, mixed model, multilevel-analysis

Let's say I've fitted a 2 level model with glmer like this:

data.model <- glmer(y ~ 1 + level1.var11 + level2.var21 + (1 | ID), family = binomial(link =  "logit"), data = dataset)

where the level-2 grouping is done by ID, level1.var11 is a level-1 predictor, and level2.var21 is a level-2 predictor.

For example, let's say that the level-2 units are schools, and the level-1 units are students in these schools. (I use the notation used by Raudenbush and Bryk in the book Hierarchical Linear Models, second edition.)
Let's say the level-1 predictor is student GPA and the level-2 predictor is SECTOR, that is, whether the school is public or private. The response variable is 1 if a student repeats a class and 0 if the student does not repeat the class.
The combined model in this case is:

$\eta_{ij} = \gamma_{00} + \gamma_{10}Student\_GPA_{ij} + \gamma_{01}SECTOR_{j} + u_{0j}$

I have a fixed intercept, $\gamma_{00}$, fixed slopes, $\gamma_{10}$ and $\gamma_{01}$, and a random effect (the random intercept) for each school, $u_{0j}$.
$\eta_{ij}$ is the log odds for student $i$ in school $j$ to repeat a class.

Using this model, I can predict the probability, $p_{ij}$, for each student repeating a class. (I can decide to use the random effects or not. Let's say I don't want to use the random effects.)
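
For reference, the step from the log odds $\eta_{ij}$ to the probability $p_{ij}$ is the inverse logit, $p_{ij} = 1 / (1 + e^{-\eta_{ij}})$. A minimal sketch in base R, assuming made-up coefficient values and a 0/1 coding of SECTOR (neither comes from a fitted model):

```r
# Hypothetical fixed-effect estimates (illustrative values, not from a fitted model)
gamma00 <- 2.0    # intercept
gamma10 <- -1.0   # slope for Student_GPA
gamma01 <- 0.5    # slope for SECTOR (assumed coding: 1 = private, 0 = public)

# Log odds for a student with GPA 3.0 in a private school,
# leaving out the school random effect u_0j
eta <- gamma00 + gamma10 * 3.0 + gamma01 * 1

# Inverse logit: p = 1 / (1 + exp(-eta)); plogis() is the base-R implementation
p <- plogis(eta)
p
```

Adding a school's $u_{0j}$ to `eta` before calling `plogis()` would give the prediction conditional on that school instead.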

Now I want to know the probability that a student belonging to school $j$ will repeat the class, that is, a prediction at the school level rather than for an individual student. My idea is to predict the probabilities $p_{ij}$ for each student based on the model I created and then calculate the average probability for each school.

$\overline{p}_{.j} = \frac{\sum_{i = 1}^{n_{j}}p_{ij}}{n_j}$

I am not sure if this is the right approach. Am I missing something important?

I know that I can use the method predict from the package lme4 for prediction at level-1 like this:

predict(data.model, newdata = dataset, REform = NA, type = "response", allow.new.levels = TRUE)

I want to know how I can make predictions at level-2 using the model that I fitted with level-1 and level-2 predictors. Should I just average the level-1 predictions for each group, or is there a better approach?

Best Answer

I'm not 100% sure I know what you mean by the levels: according to the usual way I've seen this terminology used, level 1 would be "above" level 2, meaning the level of the whole population, so I'm not sure how we can have a "level-1 predictor". Anyway, I'm not sure I need to know, since you can set the fixed effects however you like within newdata. I think the answer to your question is in the help:

ReForm: formula for random effects to condition on. If ‘NULL’, include all random effects; if ‘NA’ or ‘~0’, include no random effects

so ReForm=NA gives population-level predictions (i.e. predictions based on not knowing what ID is being predicted); since you have only a single random effect, using either ReForm=~ID or ReForm=NULL will give predictions conditional on specified IDs. (I see you have set allow.new.levels=TRUE; I'm not sure how that will work with predicting at the ID level ...)

With the development version of lme4:

d <- data.frame(f=factor(rep(LETTERS[1:20],each=30)))
library(lme4)
d$y <- simulate(~1+(1|f),family="gaussian",newdata=d,
    newparams=list(beta=0,theta=1,sigma=0.1),seed=102)[[1]]
m <- lmer(y~1+(1|f),data=d)
newdata <- data.frame(f=factor(LETTERS[1:20]))
predict(m,newdata=newdata,ReForm=NA)  ## all identical
predict(m,newdata=newdata,ReForm=NULL)  ## different by f

(I'm not sure, but the capitalization of ReForm may have changed in the development version -- be careful.)

Update: OK, you want to know the average probability of a student at school $j$ repeating a class. I think your approach is reasonable (the answer should be similar to the observed value, although in general it should be a shrinkage estimator, i.e. closer to the overall average). You might also want to consider calculating the probability that an average student at school $j$ would repeat, in which case you would first average the predictors ...
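
A minimal sketch of both options, assuming simulated data (the variable names `gpa` and `sector`, the sample sizes, and the true coefficients are all made up for illustration) and a current lme4 release, where the argument is spelled `re.form`:

```r
library(lme4)

set.seed(1)
n_school  <- 30   # hypothetical number of schools
n_student <- 40   # hypothetical number of students per school

# Simulated long-format data: one row per student
dataset <- data.frame(
  ID     = factor(rep(seq_len(n_school), each = n_student)),
  sector = rep(rbinom(n_school, 1, 0.5), each = n_student),  # level-2 predictor
  gpa    = rnorm(n_school * n_student, mean = 3, sd = 0.5)   # level-1 predictor
)
u   <- rnorm(n_school, sd = 0.8)                             # school random intercepts
eta <- 2 - 1 * dataset$gpa + 0.5 * dataset$sector + u[as.integer(dataset$ID)]
dataset$y <- rbinom(nrow(dataset), 1, plogis(eta))

data.model <- glmer(y ~ 1 + gpa + sector + (1 | ID),
                    family = binomial(link = "logit"), data = dataset)

# Student-level probabilities, ignoring (NA) or including (NULL) the school effects
p_pop  <- predict(data.model, re.form = NA,   type = "response")
p_cond <- predict(data.model, re.form = NULL, type = "response")

# Option 1: average the student-level probabilities within each school
school_pop  <- tapply(p_pop,  dataset$ID, mean)
school_cond <- tapply(p_cond, dataset$ID, mean)
observed    <- tapply(dataset$y, dataset$ID, mean)

# Option 2: average the predictors first, then predict for that "average student"
avg_student   <- aggregate(cbind(gpa, sector) ~ ID, data = dataset, FUN = mean)
p_avg_student <- predict(data.model, newdata = avg_student,
                         re.form = NULL, type = "response")

head(round(cbind(observed, school_cond, school_pop, p_avg_student), 3))
```

Because the inverse logit is nonlinear, the two options generally give somewhat different numbers. With `re.form = NA` the school averages differ only through the predictors, so they will not track the observed proportions the way the conditional averages do.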

Related Solutions

Solved – How to account for repeated measures in glmer

tl;dr: Your model already accounts for the fact that you have repeated measures. Nonetheless, if it fits, you would do best to use:

glmer(y ~ x1*x2 + (x1:x2|subject), family=binomial)

but if that isn't tractable, you could try:

glmer(y ~ x1*x2 + (1|subject) + (0+x1|subject) + (0+x2|subject), family=binomial)

(For an explanation of the syntax here, see: R's lmer cheat-sheet.)

Full version: You don't need to "tell" R that $x_1$ and $x_2$ are repeated measures variables. (This is really just a small semantic distinction, but) I wouldn't say that variables can be "repeated measures variables" vs. "non-repeated measures variables". Variables are just variables. I would say that, e.g., 'variable 1 is measured within patients, and variable 2 is measured between patients' or something like that. Of course, your phrasing is fine, you just don't want it to lead to some confusion where you think of repeated measures-ness as some ontological status intrinsic to the variable.

At any rate, instead of telling R that a variable is measured within people, you simply need to formulate a model using random and/or fixed effects to account for the non-independence of the data that come from the same person. (Yes, you can use a fixed effect to account for this: every person would be a level of a categorical variable that is included. However, this will answer a slightly different question, almost certainly not the one you are interested in, and unless you have many measurements on the same person in every combination of conditions, the model will not be tractable.) In practice, you will use random effects to account for this. Specifically, you will have a random effect for each subject.

Next you need to specify what you want random effects for. The syntax you used, (1|subject), will cause R to include a random intercept for each person. This will shift someone's line of best fit up or down relative to the mean. You should think about whether people are also likely to differ in their slopes, i.e., how strongly they respond to changes in your variables. You should also think about whether the random effects are correlated with each other, e.g., maybe people who start off higher when $x_1=0$ tend to also respond more strongly to increases in $x_1$. Common advice is to include all possible random effects and intercorrelations (Barr et al., 2013, "Keep it maximal"). However, bear in mind that GLMMs are more difficult computationally than LMMs, so such a model may not be tractable.
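
The three random-effects structures discussed above can be compared in a small sketch. It swaps in a linear mixed model on the `sleepstudy` data that ships with lme4 (`Reaction`, `Days`, `Subject`) rather than a binomial `glmer` fit, purely to keep the example fast; the formula syntax is the same:

```r
library(lme4)

# Random intercept only: each subject's line is shifted up or down
m_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Random intercept and random slope for Days, allowed to correlate
m_corr <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Random intercept and random slope, forced to be uncorrelated
m_unc <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
              data = sleepstudy)

# Estimated standard deviations and, for m_corr, the intercept-slope correlation
VarCorr(m_corr)
```

Each added variance or correlation term is one more parameter to estimate, which is where tractability becomes a concern for GLMMs.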

Solved – Adding 2nd level variable into Multi-level Modelling in Stata

I found the solution here:

http://www.iub.edu/~statmath/stat/all/hlm/hlm.pdf

The answer is:

whereas HLM requires two separate data files (one corresponding to each level), SPSS, Stata, SAS, and R rely on only a single file. The level-2 observations are common to each case within the same macro-unit, so that if there are 50 students in one school the corresponding school-level score appears 50 times. Each program also requires an id variable identifying the group membership of each individual.

So, adding a second level variable to the model, like the school sector (Sector), can be easily done via the xtmixed command. For instance:

xtmixed MatGrade Intelligence Sector || SchoolId:

Here "MatGrade" is the outcome variable, "Intelligence" is a level-1 variable reflecting the intelligence of the students, and "Sector" is the level-2 variable. Students are nested within schools.
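
The single-file layout described in the quoted passage can be sketched as follows. The example is in R rather than Stata, to match the other code on this page, and the two small tables and their values are hypothetical:

```r
# Hypothetical student-level (level-1) table
students <- data.frame(
  SchoolId     = c(1, 1, 1, 2, 2),
  Intelligence = c(98, 105, 110, 92, 101),
  MatGrade     = c(6.5, 7.0, 8.2, 5.9, 7.4)
)

# Hypothetical school-level (level-2) table
schools <- data.frame(
  SchoolId = c(1, 2),
  Sector   = c("public", "private")
)

# Merge into one long file: each school's Sector value is repeated
# on the row of every student in that school
long <- merge(students, schools, by = "SchoolId")
long
```

The merged `long` table has one row per student, with `SchoolId` serving as the id variable that identifies group membership.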
